The DR benefit that wasn't in the business case: how AIM made cloud DR an easy add

A regional health system running multiple acute care facilities brought us into an infrastructure modernization engagement with two explicit goals: exit VMware licensing ahead of a renewal that had become financially untenable, and modernize compute and storage to a supportable, high-performance foundation. Cloud migration wasn't on the agenda. The health system had significant on-premises requirements that aren't going anywhere: EHR systems with latency constraints to clinical endpoints, PACS storage at scale, and a compliance posture their legal and privacy teams weren't prepared to revisit for a cloud-first conversation.

What nobody put in the business case was what the network overlay would unlock. The Arista CloudEOS infrastructure we built to connect the AIM environment to AWS turned out to be the single most valuable component of the engagement from a DR standpoint. Once that overlay existed, AWS Elastic Disaster Recovery required almost no incremental networking work. The secondary data center conversation started shortly after.

Traditional DR Creates Operational Overhead

The health system's existing DR posture followed the traditional playbook: a secondary colocation site with a partially synchronized replica of the primary environment. Different hardware generation, different storage platform, two separate operational runbooks, annual failover tests that consistently surfaced gaps, and the carrying costs of two complete infrastructure stacks.

The deeper issue was coupling. Because the secondary site mirrored the three-tier architecture of the primary, every infrastructure decision at the primary had a shadow decision at DR. When the VMware licensing problem forced a serious look at compute modernization, the immediate question was whether the secondary site needed to be modernized in parallel. The answer they were anticipating was yes, doubling the project scope and cost. What they found was different.

Purpose-Built Network Overlay Changes the DR Equation

The AIM (Aegis Infrastructure Modernization) architecture we deployed combines Nutanix AHV running on Cisco UCS X-Series compute with Pure Storage FlashArray as the external storage layer. The design follows the jointly validated FlashStack with Nutanix reference architecture, which lets compute and storage scale independently. For workloads that couldn't immediately migrate off VMware, VMware Cloud on AWS provides vSphere running on EC2 bare metal, connected to the same AWS VPC, preserving the VMware operational model while eliminating on-premises hardware.

The key component for DR was the network overlay. The health system's AIM environment connects to AWS via Arista CloudEOS, running the same EOS operating system as the on-premises Arista fabric. CloudEOS instances in AWS speak the same routing protocols, carry the same policy constructs, and are managed through the same CloudVision plane as the physical switches in the data center. The result is a single, coherent network fabric that spans on-premises and AWS with consistent segmentation, observability, and routing.

With that fabric in place, activating AWS Elastic Disaster Recovery was a software operation. AWS DRS works by installing a lightweight agent on the servers to be protected. The agent continuously replicates block-level data to a staging area in AWS. When a failover is needed, recovery instances spin up in the VPC, using the network configuration CloudEOS has already established. There's no DR-specific networking to build, no secondary fabric to configure, no routing to reconcile. The VPC already knows how to reach the on-premises EHR systems, PACS endpoints, and internal services because CloudEOS already handles that routing.
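Because recovery depends on the agent's replication staying current, it helps to check replication state before relying on it. The following is a minimal sketch, not the health system's actual tooling: it assumes a boto3 `drs` client is passed in, and the function name `servers_not_ready` is hypothetical. It lists source servers whose replication state is anything other than `CONTINUOUS`.

```python
from typing import Any, Dict, List


def servers_not_ready(drs_client: Any) -> List[Dict[str, str]]:
    """Return source servers whose replication is not in the CONTINUOUS state.

    drs_client is assumed to be a boto3 "drs" client (or any object that
    exposes a compatible describe_source_servers method).
    """
    not_ready: List[Dict[str, str]] = []
    kwargs: Dict[str, Any] = {"filters": {}}
    while True:
        page = drs_client.describe_source_servers(**kwargs)
        for server in page.get("items", []):
            info = server.get("dataReplicationInfo", {})
            state = info.get("dataReplicationState", "UNKNOWN")
            if state != "CONTINUOUS":
                not_ready.append(
                    {"sourceServerID": server["sourceServerID"], "state": state}
                )
        # Follow pagination until the service stops returning a token.
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token
    return not_ready
```

In practice this might be called as `servers_not_ready(boto3.client("drs"))` from a scheduled job, with any non-empty result raised to the operations team. Passing the client in keeps the function easy to exercise against a stub.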

For VMware workloads running on VMware Cloud on AWS, the DR story is even simpler: those environments are already in AWS, so the recovery target and the production environment share the same network fabric from day one.

Secondary Site Decommissioned, Operational Overhead Eliminated

The health system moved forward with decommissioning its secondary colocation site. The carrying costs of that site (hardware, colocation fees, licensing, and the staff time to maintain two parallel environments) were significant. AWS DRS running against the AIM environment replaced those costs with a continuous replication model that runs against actual production systems. Annual failover tests were replaced with scheduled, non-disruptive recovery drills that can be run against isolated recovery instances without impacting production.

Pure Evergreen on the FlashArray changed the storage equation further: controller refreshes are included in the subscription, which eliminates the forced hardware refresh cycles that used to drive the secondary site refresh schedule. The health system no longer has to plan a DR hardware refresh alongside every primary refresh.

When This Applies

Any health system currently operating a secondary DR data center on aging hardware that is also evaluating infrastructure modernization should run the numbers on this sequence. The network overlay that makes the AIM environment cloud-connected is not a large incremental cost relative to the overall engagement. But it fundamentally changes the DR math: instead of maintaining two infrastructure stacks, you maintain one modern stack and leverage AWS for DR capacity that scales with your RPO and RTO requirements.

Where this doesn't apply: health systems with regulatory requirements prohibiting PHI replication to public cloud, or those with data residency constraints that preclude AWS as a DR target. Those environments have a different conversation about private cloud DR, and the AIM architecture still applies for the on-premises modernization piece.

FAQ

Do we need to fully migrate off VMware before AIM can work?

No. VMware Cloud on AWS is the bridge. Workloads that aren't ready to migrate to AHV run on vSphere in VMware Cloud on AWS within the same AWS environment, managed through the same operational model your VMware team already knows. AIM lets you modernize at the pace your workloads support.

What does AWS Elastic Disaster Recovery cost relative to a secondary data center?

AWS DRS charges per replicated server per hour, plus the cost of the staging area resources that hold the replicated data (replication servers, EBS volumes, and snapshots). Recovery instances only incur EC2 costs while a drill or actual failover is in progress. For most health systems we've modeled, the total DRS cost is materially lower than the carrying cost of a secondary colocation site, even before accounting for the hardware refresh cycles that the secondary site required.
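The cost structure described above can be sketched as a small model. This is an illustration only: the class and function names are hypothetical, every rate is a placeholder to be replaced with current AWS pricing for your region, and staging compute is folded into the storage term for simplicity.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # average number of hours in a month


@dataclass
class DrsCostInputs:
    servers: int                              # replicated source servers
    rate_per_server_hour: float               # placeholder DRS hourly rate
    staging_storage_gb: float                 # staging area footprint
    storage_rate_per_gb_month: float          # placeholder storage rate
    drill_hours_per_month: float = 0.0        # hours recovery instances run
    drill_instance_rate_per_hour: float = 0.0  # placeholder EC2 hourly rate


def monthly_drs_cost(c: DrsCostInputs) -> float:
    """Estimate monthly cost: replication + staging + drill compute."""
    replication = c.servers * c.rate_per_server_hour * HOURS_PER_MONTH
    staging = c.staging_storage_gb * c.storage_rate_per_gb_month
    # Recovery instances are only billed while a drill or failover runs.
    drills = c.servers * c.drill_hours_per_month * c.drill_instance_rate_per_hour
    return replication + staging + drills


def monthly_savings(colo_monthly_cost: float, c: DrsCostInputs) -> float:
    """Positive result means DRS is cheaper than the secondary site."""
    return colo_monthly_cost - monthly_drs_cost(c)
```

Feeding in the secondary site's real carrying cost (hardware amortization, colocation fees, licensing, and staff time) alongside quoted AWS rates gives a first-pass comparison before a detailed model is built.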

Can we run failover drills without impacting production?

Yes. AWS DRS supports recovery drills that launch recovery instances without interrupting replication or touching the source servers. With launch settings that place those instances in an isolated subnet, they are fully functional for testing and have no route to production systems. Drills can be run on demand, not just annually, which means your actual recovery confidence is substantially higher than with a once-a-year test against a secondary site.
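As a rough sketch of what an on-demand drill looks like through the API, the function below starts a drill for a set of source servers and returns the job ID. It assumes a boto3 `drs` client is passed in; the function name `start_drill` is hypothetical. Where the drill instances land (subnet, security groups) comes from each server's launch settings, not from this call, so isolation has to be configured there.

```python
from typing import Any, List


def start_drill(drs_client: Any, source_server_ids: List[str]) -> str:
    """Launch drill recovery instances and return the DRS job ID.

    drs_client is assumed to be a boto3 "drs" client (or a compatible stub).
    """
    if not source_server_ids:
        raise ValueError("at least one source server ID is required")
    response = drs_client.start_recovery(
        isDrill=True,  # mark the launch as a drill rather than a real recovery
        sourceServers=[{"sourceServerID": sid} for sid in source_server_ids],
    )
    return response["job"]["jobID"]
```

The returned job ID can be logged with the drill record so that each test has an auditable trail.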