Key Takeaways
- Private connectivity is the first gate, not an afterthought. Bedrock and Azure OpenAI both support private network access so inference traffic never traverses the public internet.
- Identity is the real security boundary for GenAI, especially for agents. Traditional IAM was not built for autonomous agents that chain actions across systems in seconds.
- Egress control is what stops data exfiltration through the model. Control which destinations the workload can reach using VPC endpoint policies and a data perimeter.
- GenAI cost and behavior are invisible without purpose-built observability. Token usage, not request count, drives both cost and failure modes.
Why GenAI Projects Stall on the Way to Production
A GenAI proof of concept is straightforward: point an SDK at a public model endpoint and demonstrate the use case. Production is hard, because the proof of concept skipped every piece of infrastructure that a security, network, or finance review will demand. The gap between a working demo and a deployable workload is almost entirely infrastructure, and teams discover it late, after the use case has been promised and the timeline set.
The four gaps that block production deployment
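Inference traffic goes to a public model endpoint, because the proof of concept never needed private connectivity. Identity is often an afterthought, so apps and agents run on credentials that are broader or longer-lived than they should be. There is no egress control, so prompts, retrieved documents, and tool outputs can reach destinations nobody approved, turning the model into a potential data-exfiltration path. Cost and behavior are opaque: spend shows up as an aggregated service charge with no per-application attribution, and when an agent misbehaves or a bill spikes, there is no trace data to explain why.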
Five Readiness Areas to Put in Place Before You Deploy
Treat these five areas as gates. Each applies across Bedrock, Azure OpenAI, and self-hosted models, and each should be designed before the workload goes to production, not retrofitted after a failed review.
Private Connectivity and Network Design
Keep inference traffic off the public internet. Use AWS PrivateLink interface endpoints for Bedrock or Azure private endpoints for Azure OpenAI, and design the VPC or VNet topology, including DNS resolution, before deployment. Centralize endpoints in a hub-and-spoke pattern when you have more than a couple of AI workload VPCs.
Plan for the endpoint count: a single Bedrock plus Knowledge Bases workload typically needs four to six interface endpoints, which adds up across availability zones and argues for a hub-and-spoke design. As of February 2026, Bedrock extended PrivateLink to its OpenAI-compatible endpoint, which closes a real migration gap.
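To make the endpoint arithmetic concrete, here is a minimal sketch comparing a per-VPC layout with a shared hub. It assumes each interface endpoint places one network interface in every availability zone it is deployed to; the figures (5 endpoints, 3 zones, 4 VPCs) and the names `EndpointPlan`, `per_vpc_enis`, and `hub_enis` are illustrative, not taken from any provider's tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointPlan:
    services: int  # interface endpoints one workload needs (the text cites four to six)
    azs: int       # availability zones each endpoint is deployed into
    vpcs: int      # number of AI workload VPCs


def per_vpc_enis(plan: EndpointPlan) -> int:
    """Every VPC gets its own copy of every endpoint in every zone."""
    return plan.services * plan.azs * plan.vpcs


def hub_enis(plan: EndpointPlan) -> int:
    """One shared set of endpoints lives in a hub VPC and serves all spokes."""
    return plan.services * plan.azs


# Illustrative numbers only.
plan = EndpointPlan(services=5, azs=3, vpcs=4)
print(per_vpc_enis(plan), hub_enis(plan))  # 60 15
```

The gap between the two numbers grows linearly with the number of VPCs, which is why the hub design starts to pay off once there are more than a couple of them.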
Identity and Least-Privilege Access
Make identity the boundary. Issue short-lived, scoped credentials to apps and agents, enforce just-in-time and least-privilege access, maintain an inventory of every agent and machine identity, and require human approval for high-risk actions. Integrate with your existing identity provider rather than building a parallel auth system.
Agents chain actions across systems in seconds, a pattern traditional IAM was not built for. In practice, identity rather than the network becomes the real boundary, and over-privileged or leaked agent credentials are the failure mode to design against.
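A short-lived, scoped credential with a human-approval gate could look roughly like the sketch below. Everything here is hypothetical: `AgentCredential`, `issue`, `authorize`, the 15-minute lifetime, and the `HIGH_RISK_ACTIONS` set are illustrations of the idea, and a real deployment would delegate issuance and validation to the existing identity provider.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical list of actions that always need a human approver.
HIGH_RISK_ACTIONS = frozenset({"delete_records", "transfer_funds"})


@dataclass(frozen=True)
class AgentCredential:
    agent_id: str
    scopes: frozenset
    expires_at: datetime


def issue(agent_id, scopes, ttl_minutes=15, now=None):
    """Issue a credential limited to the given scopes and a short lifetime."""
    now = now or datetime.now(timezone.utc)
    return AgentCredential(agent_id, frozenset(scopes), now + timedelta(minutes=ttl_minutes))


def authorize(cred, action, approved_by=None, now=None):
    """Allow an action only if the credential is live, in scope, and approved when high risk."""
    now = now or datetime.now(timezone.utc)
    if now >= cred.expires_at:
        return False  # expired: the agent must request a fresh credential
    if action not in cred.scopes:
        return False  # least privilege: anything not granted is denied
    if action in HIGH_RISK_ACTIONS and approved_by is None:
        return False  # high-risk actions need a named human approver
    return True
```

The useful property is that every denial path is explicit, so a leaked credential is bounded by its scope list and its expiry rather than by network position.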
Data-Path Security and Egress Control
Control where data can go, not just who can get in. Apply VPC endpoint policies and a data perimeter so the workload can only reach approved destinations, encrypt prompts and context with managed keys, and define what retrieved data the model is allowed to see. Egress control is the defense against exfiltration through the model.
The risk is not only inbound. Even when access to the model is locked down, prompts and retrieved context can still leave to an unapproved endpoint unless the workload's outbound destinations are restricted as well.
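As one possible shape for a VPC endpoint policy on a Bedrock runtime endpoint, the sketch below builds a policy document that allows model invocation only for principals in one AWS organization and only against listed model ARNs. The helper name `bedrock_endpoint_policy`, the organization ID, and the model ARN are placeholders; treat the statement as a starting point to review against your own data-perimeter requirements, not a complete policy.

```python
import json


def bedrock_endpoint_policy(org_id: str, model_arns: list) -> dict:
    """Build an endpoint policy limited to approved models and one organization."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowApprovedModelsFromOwnOrg",
                "Effect": "Allow",
                "Principal": "*",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": model_arns,
                # Requests from principals outside this organization are not allowed.
                "Condition": {"StringEquals": {"aws:PrincipalOrgID": org_id}},
            }
        ],
    }


# Placeholder values; substitute your own organization ID and model ARNs.
policy = bedrock_endpoint_policy(
    "o-exampleorgid",
    ["arn:aws:bedrock:us-east-1::foundation-model/EXAMPLE_MODEL_ID"],
)
print(json.dumps(policy, indent=2))
```

Because an endpoint policy has no implicit allow, anything not matched by the statement is denied at the endpoint, which is what makes it usable as an egress control.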
Observability for GenAI
Instrument for how GenAI actually fails. Adopt the OpenTelemetry GenAI semantic conventions to capture model, tool, and agent spans plus latency and token-usage metrics. Trace agent tool calls end to end, because a well-formed but wrong answer or a redundant tool loop will not show up as an error in traditional monitoring.
GenAI fails differently than traditional services. A traditional service returns a success code or throws an error, but an agent can produce a well-formed answer that is semantically wrong, or loop through redundant tool calls, and neither shows up as a failure in conventional monitoring.
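A redundant tool loop is one of the few agent failures that can be detected mechanically from trace data. The sketch below scans the spans of a single trace, represented here as plain dictionaries, and reports tool calls repeated with identical arguments. The attribute names `gen_ai.operation.name` and `gen_ai.tool.name` and the `execute_tool` value follow the OpenTelemetry GenAI semantic conventions as currently published, which are still marked as in development, so verify them against the version you adopt. The `arguments` field and the `max_repeats` threshold are assumptions made for illustration.

```python
import json
from collections import Counter


def redundant_tool_calls(spans, max_repeats=2):
    """Return (tool, arguments) pairs called more than max_repeats times in one trace."""
    counts = Counter()
    for span in spans:
        attrs = span.get("attributes", {})
        if attrs.get("gen_ai.operation.name") != "execute_tool":
            continue  # skip model calls and other non-tool spans
        # Serialize arguments with sorted keys so equal calls compare equal.
        args = json.dumps(span.get("arguments", {}), sort_keys=True)
        counts[(attrs.get("gen_ai.tool.name"), args)] += 1
    return {key: n for key, n in counts.items() if n > max_repeats}
```

Every span in such a loop can report success on its own, which is why the check has to run across the trace rather than per request.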
Cost Visibility and Attribution
Make spend measurable before it scales. Token usage drives cost, and output tokens are usually the larger and less predictable share. Tag and separate workloads by application and team for per-workload attribution, set anomaly alerts, and connect usage telemetry to spend so a bill spike has an explanation, not just a number.
Request count hides this: one request can consume many times the tokens of another, so attribute spend per application and per team from token usage rather than from call volume.
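Per-workload attribution reduces to multiplying token counts by a price and grouping by tag. The sketch below assumes usage records already carry `team` and `app` tags; the record shape, the function name `attribute_cost`, and the prices in `PRICE_PER_1K` are hypothetical, since real prices vary by model and provider.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens; output is priced higher than input here.
PRICE_PER_1K = {"input": 0.003, "output": 0.015}


def attribute_cost(usage_records, prices=PRICE_PER_1K):
    """Sum token cost per (team, app) pair from tagged usage records."""
    totals = defaultdict(float)
    for record in usage_records:
        cost = (
            record["input_tokens"] / 1000 * prices["input"]
            + record["output_tokens"] / 1000 * prices["output"]
        )
        totals[(record["team"], record["app"])] += cost
    return dict(totals)


records = [
    {"team": "support", "app": "chatbot", "input_tokens": 2000, "output_tokens": 1000},
    {"team": "support", "app": "chatbot", "input_tokens": 1000, "output_tokens": 4000},
    {"team": "legal", "app": "summarizer", "input_tokens": 8000, "output_tokens": 500},
]
for key, cost in attribute_cost(records).items():
    print(key, round(cost, 4))
```

With these illustrative prices, the second chatbot request costs roughly three times the first despite counting as one request each, which is the pattern request-based reporting misses.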
How to Sequence the Readiness Work
Work the areas in dependency order. Connectivity and identity are foundational; security, observability, and cost visibility layer on top. Each step is platform-neutral, with platform-specific choices made inside it.
Design connectivity and identity first
Settle the network topology (private endpoints, DNS, hub-and-spoke) and the identity model (scoped tokens, least privilege, agent identity inventory) before any workload moves toward production. These two decisions constrain everything downstream and are the most expensive to retrofit.
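One way to check the DNS half of the topology is to confirm, from inside the workload's network, that the service hostname resolves only to private addresses. The sketch below uses only the Python standard library; `resolves_privately` is a hypothetical helper, and the injectable `resolver` argument exists so the check can be exercised without network access. Note that `is_private` also accepts loopback and link-local ranges, so a stricter check may compare against your own VPC or VNet CIDR blocks.

```python
import ipaddress
import socket


def resolves_privately(hostname, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves and every address is in a private range."""
    infos = resolver(hostname, 443, proto=socket.IPPROTO_TCP)
    # Each getaddrinfo entry ends with a sockaddr tuple whose first item is the address.
    addresses = {info[4][0] for info in infos}
    return bool(addresses) and all(
        ipaddress.ip_address(address).is_private for address in addresses
    )


# Example call, to be run from inside the workload's VPC or VNet:
# resolves_privately("bedrock-runtime.us-east-1.amazonaws.com")
```

A result of `False` from inside the network suggests that private DNS for the endpoint is not in effect and traffic may still be routed to the public endpoint.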
Layer in data-path security and egress control
With connectivity and identity in place, define the data perimeter: endpoint policies, encryption keys, and approved egress destinations. Validate that prompts, retrieved context, and tool outputs cannot reach an unapproved endpoint, and that sensitive data the model sees is governed.
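The validation step can be partly automated by comparing the destinations a workload is configured to call, or was observed calling, against the approved list. This is a minimal sketch with hypothetical names (`APPROVED_HOSTS`, `unapproved_destinations`) and an example allowlist; it complements network-level enforcement and does not replace it.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; in practice this comes from the egress policy.
APPROVED_HOSTS = frozenset({
    "bedrock-runtime.us-east-1.amazonaws.com",
    "docs.internal.example.com",
})


def unapproved_destinations(urls, approved=APPROVED_HOSTS):
    """Return every URL whose host is not on the approved list."""
    flagged = []
    for url in urls:
        # urlsplit lowercases the hostname; unparseable input yields None.
        host = urlsplit(url).hostname or ""
        if host not in approved:
            flagged.append(url)  # fail closed: unknown or malformed is unapproved
    return flagged
```

Running this over tool configurations and recorded tool-call URLs gives a concrete list to review before the workload is promoted.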
Instrument observability and cost before scale
Stand up GenAI tracing on the OpenTelemetry GenAI conventions and wire token-usage metrics to per-application cost attribution while the workload is still small. Establish baselines and anomaly alerts now, so the data exists the first time an agent misbehaves or spend jumps.
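A baseline and an anomaly alert can start very simply. The sketch below flags a day whose token usage sits more than a chosen number of standard deviations above the recent mean. The function name `is_anomalous`, the threshold of 3, and the sample history are assumptions; managed anomaly-detection services use more robust methods, but the idea of comparing against an established baseline is the same.

```python
from statistics import mean, pstdev


def is_anomalous(history, today, z_threshold=3.0):
    """Flag today's usage if it is far above the baseline built from history."""
    if len(history) < 2:
        return False  # not enough data to form a baseline yet
    baseline = mean(history)
    spread = pstdev(history)
    if spread == 0:
        return today > baseline  # flat history: any increase stands out
    return (today - baseline) / spread > z_threshold


# Illustrative daily token totals (in thousands) for one application.
history = [100, 110, 90, 105, 95]
print(is_anomalous(history, 180), is_anomalous(history, 108))  # True False
```

The first guard is the practical argument for instrumenting early: without history, there is nothing to alert against.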
What a Readiness Review Produces
A readiness review produces four deliverables covering the five readiness areas, with observability and cost combined into a single baseline. Each deliverable provides the documentation and policies needed to pass security, compliance, and financial reviews.
Network and Connectivity Design
A documented private-connectivity topology with endpoint inventory, DNS resolution plan, and a hub-and-spoke decision for multi-VPC environments. This includes the specific PrivateLink interface endpoints for Bedrock or private endpoints for Azure OpenAI, with DNS configuration and routing policies.
Identity and Access Model
A least-privilege access model for apps and agents, including scoped-token issuance, just-in-time policy, agent identity inventory, and human-in-the-loop rules for high-risk actions. This model integrates with existing identity providers and defines the boundaries for autonomous agent operation.
Data Perimeter and Egress Policy
Endpoint policies, encryption-key plan, and an approved-destination egress policy that closes the exfiltration path through the model. This includes VPC endpoint policies that restrict which destinations the GenAI workload can reach and encryption strategies for prompts and retrieved context.
Observability and Cost Baseline
GenAI tracing instrumented on OpenTelemetry conventions, token-usage metrics tied to per-application cost attribution, and anomaly alerting with established baselines. This foundation enables teams to understand agent behavior, attribute costs accurately, and detect anomalies before they become expensive problems.