Key Takeaways
- Nutanix environments without active management accumulate drift — AOS versions fall behind supported branches, NCC checks generate unaddressed warnings, and CVMs develop health issues that aren't critical today but will be.
- Aegis Managed Nutanix provides co-managed operations where IVI owns hypervisor and storage platform operations while your team retains control over VM placement, resource allocation, and application-layer decisions.
- Under Aegis, you make one call to IVI instead of determining whether issues exist in AHV, AOS, guest OS, or underlying hardware and opening separate support cases with multiple vendors.
- The operational model requires an active Nutanix support contract with IVI acting as the operator on that contract, providing managed operations and managing vendor TAC relationships for software defects.
The operational reality of Nutanix environments
Nutanix environments without active management accumulate drift. AOS versions fall behind supported branches. NCC checks generate warnings that go unaddressed. CVMs develop health issues that aren't critical today but will be. Protection domain replication lags undetected until a DR test fails.
This drift happens because Nutanix operational expertise differs significantly from traditional virtualization knowledge. Your team may have spent years building VMware expertise, but AHV hypervisor operations, CVM health management, and AOS lifecycle processes require different skills and different operational patterns.
The challenge isn't that Nutanix is complex — it's that running any hyperconverged infrastructure well requires continuous attention to components your team may not have managed before: distributed storage controllers, metadata rings, replication factor compliance, and integrated lifecycle management across multiple software layers.
Operational ownership without operational overhead
Aegis Managed Nutanix provides co-managed operations for your AHV and AOS environment. IVI's engineers own the hypervisor and storage platform operations — monitoring, incident response, and lifecycle management — while your team retains control over VM placement, resource allocation, and application-layer decisions.
This isn't traditional outsourcing. Your team maintains operational visibility and approval authority over changes. IVI provides the Nutanix-specific engineering depth and 24x7 coverage your environment requires.
The co-managed model recognizes that your team understands your applications, workload requirements, and business priorities better than any external provider. IVI brings deep Nutanix platform expertise and the operational capacity to monitor, maintain, and respond to platform-layer issues around the clock.
What IVI manages in your Nutanix environment
Aegis Managed Nutanix covers three operational domains: monitoring (Aegis PM), incident response (Aegis IR), and lifecycle management (Aegis LM). Each domain addresses specific operational challenges that Nutanix environments face without dedicated platform expertise.
The scope is deliberately focused on the AHV hypervisor and AOS storage platform layers. IVI doesn't manage guest operating systems, applications running within VMs, or business-layer decisions about resource allocation. This boundary ensures clear operational responsibility while preserving your team's control over the components that directly impact your applications and users.
Monitoring (Aegis PM)
Aegis PM monitors Nutanix cluster health, CVM operations, AHV host performance, storage operations, and replication status. It goes beyond the basic Prism Central dashboards, adding proactive alerting and trend analysis intended to catch issues before they impact workloads.
Nutanix Cluster Health
Cluster health score and alert aggregation via Prism Central provide the foundation for all other monitoring. IVI compares total node count against healthy node count and alerts immediately on any node degradation, because Nutanix clusters can tolerate node failures but require immediate attention when the failure count approaches replication factor limits.
Replication Factor compliance monitoring checks that RF2 clusters keep at least 3 healthy nodes and RF3 clusters keep at least 5. Metadata ring health monitoring via Prism alerts catches distributed storage issues before they impact VM performance. A custom LogicMonitor module pulls automated NCC check results from the Prism Central API, so cluster validation runs continuously, not just during maintenance windows.
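The replication factor minimums above reduce to a simple comparison. The following is a minimal sketch, assuming the healthy node count has already been retrieved by monitoring; the function name and return shape are illustrative and are not part of any Nutanix or LogicMonitor API.

```python
# Minimum healthy node counts per replication factor, as stated in the text:
# RF2 needs at least 3 healthy nodes, RF3 at least 5.
MIN_HEALTHY_NODES = {2: 3, 3: 5}


def rf_compliance(replication_factor: int, healthy_nodes: int) -> dict:
    """Compare a cluster's healthy node count with the minimum for its RF.

    Hypothetical helper: the inputs are assumed to come from whatever
    monitoring source reports cluster state.
    """
    if replication_factor not in MIN_HEALTHY_NODES:
        raise ValueError(f"unsupported replication factor: {replication_factor}")
    required = MIN_HEALTHY_NODES[replication_factor]
    return {
        "required": required,
        "healthy": healthy_nodes,
        "compliant": healthy_nodes >= required,
        # Nodes the cluster can still lose before dropping below the minimum.
        "headroom": healthy_nodes - required,
    }


print(rf_compliance(2, 4))  # four-node RF2 cluster
print(rf_compliance(3, 4))  # four-node cluster configured for RF3
```

A headroom of zero means the cluster sits exactly at its minimum, so one more node loss would break compliance; an alert tied to this value would need to fire before that point.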
CVM Health (per node)
Controller VM health monitoring covers CVM power state and service health for critical services: stargate (storage I/O), curator (data management), cassandra (metadata), and chronos (job and task scheduling). CVM CPU and memory utilization monitoring includes threshold alerting before performance impact occurs, because CVM resource exhaustion affects all VMs on that node.
CVM disk health monitoring includes SMART passthrough and SSD wear indicators, providing early warning of storage device failures that could impact cluster performance or data protection.
AHV Host Health (per node)
AHV host monitoring covers CPU utilization, memory pressure, and run-queue depth to detect performance issues before they impact VM workloads. AHV service health monitoring includes libvirtd, acropolis agent, and Open vSwitch — the core services that enable VM operations and network connectivity.
Live migration event monitoring tracks in-flight migrations and migration failures, providing visibility into cluster rebalancing operations and potential performance bottlenecks. vCPU:pCPU overcommit ratio monitoring provides proactive alerting when CPU oversubscription approaches levels that could impact VM performance.
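The vCPU:pCPU overcommit ratio is the number of vCPUs allocated to VMs on a host divided by the host's physical CPU count. The sketch below shows the arithmetic only. Whether "physical CPU" means cores or hardware threads is a choice left to the operator, and the default warning ratio of 4.0 is a placeholder assumption, not a value from this document or a Nutanix recommendation.

```python
def overcommit_ratio(allocated_vcpus: int, physical_cpus: int) -> float:
    """Return the number of vCPUs allocated per physical CPU on a host."""
    if physical_cpus <= 0:
        raise ValueError("physical_cpus must be positive")
    return allocated_vcpus / physical_cpus


def overcommit_alert(allocated_vcpus: int, physical_cpus: int,
                     warn_ratio: float = 4.0) -> bool:
    """Return True when the ratio reaches the warning level.

    warn_ratio is a hypothetical threshold; set it per workload profile.
    """
    return overcommit_ratio(allocated_vcpus, physical_cpus) >= warn_ratio


# Example: 96 vCPUs allocated on a host with 32 physical CPUs.
print(overcommit_ratio(96, 32))
print(overcommit_alert(96, 32))
```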
Storage Operations
Storage pool utilization monitoring with alert thresholds at 70%, 80%, and 90% provides graduated warnings as cluster capacity approaches limits. Storage container utilization per container enables capacity planning at the workload level. Individual drive failure detection per node ensures immediate response to hardware failures.
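The graduated 70%, 80%, and 90% thresholds can be expressed as a small lookup. This is a sketch under assumptions: the severity names are invented for illustration, and the byte counts are assumed to come from whatever source reports storage pool usage.

```python
from typing import Optional

# Alert thresholds from the text, in percent used, ordered highest first.
# The severity labels are assumed names, not taken from the source.
THRESHOLDS = [(90.0, "critical"), (80.0, "major"), (70.0, "warning")]


def utilization_severity(used_bytes: int, capacity_bytes: int) -> Optional[str]:
    """Return the highest severity whose threshold is met, or None below 70%."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    percent_used = 100.0 * used_bytes / capacity_bytes
    for threshold, severity in THRESHOLDS:
        if percent_used >= threshold:
            return severity
    return None


# Example: a pool with 75 of 100 units used crosses only the first threshold.
print(utilization_severity(75, 100))
```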
Deduplication and compression savings ratio tracking provides visibility into storage efficiency and helps identify workloads that may benefit from different storage policies or container configurations.
Replication and DR
Protection Domain health and last replication timestamp monitoring ensures DR configurations remain functional. RPO breach detection via custom monitoring provides immediate alerting when replication falls behind defined recovery point objectives. Replication bandwidth utilization tracking helps identify network bottlenecks that could impact DR performance.
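RPO breach detection comes down to comparing the age of the last successful replication with the recovery point objective. The sketch below assumes the last replication timestamp is already available as a timezone-aware datetime; the function names are hypothetical and do not represent IVI's monitoring implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def replication_lag(last_replication: datetime,
                    now: Optional[datetime] = None) -> timedelta:
    """Return how long ago the last successful replication completed."""
    now = now or datetime.now(timezone.utc)
    return now - last_replication


def rpo_breached(last_replication: datetime, rpo: timedelta,
                 now: Optional[datetime] = None) -> bool:
    """Return True when the replication lag exceeds the RPO."""
    return replication_lag(last_replication, now) > rpo


# Example: last replication two hours ago against a one-hour RPO.
check_time = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(rpo_breached(check_time - timedelta(hours=2), timedelta(hours=1), check_time))
```

Passing the current time explicitly keeps the check deterministic in tests; both timestamps must be timezone-aware for the subtraction to work.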
Incident Response (Aegis IR)
Aegis IR provides 24x7 incident response for platform-layer issues, with clear escalation paths for issues requiring vendor support. IVI engineers resolve issues within Aegis scope directly and manage vendor escalation when required.
IVI First Call — Direct Resolution
- CVM failures: service restart, node isolation, cluster rebalancing procedures that restore cluster health without impacting running VMs.
- AHV service failures: restart procedures, host evacuation, maintenance mode activation to isolate problematic nodes while maintaining workload availability.
- Storage performance degradation: isolation to cluster, host, or storage path level to identify root cause and implement targeted remediation.
- VM live migration failures: root cause diagnosis and resolution, including network path validation and resource constraint identification.
- Protection Domain replication failures: diagnosis, reconnection, reseeding procedures to restore DR capability.
- NCC check failures: immediate remediation for non-critical issues that could impact cluster health or upgrade eligibility.
IVI Manages Vendor Escalation
- AOS software defects requiring Nutanix TAC involvement: IVI opens and manages TAC cases, provides technical details, and coordinates resolution.
- CVM failures that don't recover through standard procedures: escalation to Nutanix engineering with detailed failure analysis.
- Metadata ring corruption or cluster-wide storage issues: immediate escalation with cluster state preservation for Nutanix analysis.
- AHV hypervisor defects requiring engineering support: coordination with Nutanix hypervisor team including log collection and reproduction steps.
- Prism Central platform issues: management of Nutanix support relationship for centralized management platform problems.
Explicitly Not Owned
Guest OS failures within VMs remain your team's responsibility. Application failures within VMs are outside Aegis scope. Issues above the AHV hypervisor layer — including VM configuration, application performance, and guest OS management — remain under your team's operational control.
Lifecycle Management (Aegis LM)
Aegis LM manages AOS version currency, platform updates, and security patching to keep your Nutanix environment current and supported. This includes both proactive lifecycle planning and reactive security response.
AOS Version Management
Track LTS (Long Term Support) and STS (Short Term Support) release trains to ensure appropriate version selection for your environment's stability requirements. Provide guidance on appropriate track selection for production versus development environments based on feature requirements and change tolerance.
Manage LCM catalog updates and upgrade execution. Execute pre-upgrade NCC validation and post-upgrade verification to confirm cluster health throughout the upgrade process. Manage AOS-AHV version dependency chains to prevent compatibility issues that could impact cluster stability.
Platform Currency
Maintain current Foundation imaging tool versions to ensure compatibility with latest hardware and software releases. Keep NCC versions current with mandatory pre-upgrade execution to catch configuration issues before they impact upgrades. Resolve all WARN or FAIL NCC results before upgrade execution to ensure clean upgrade paths.
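The rule that every WARN or FAIL result must be resolved before an upgrade can be treated as a gate. The sketch below assumes NCC results have already been collected into a list of dictionaries with "check" and "status" keys; that shape and the check names are hypothetical, not the actual NCC or Prism Central output format.

```python
from typing import Dict, List

# Statuses that block an upgrade, per the text above.
BLOCKING_STATUSES = {"WARN", "FAIL"}


def upgrade_blockers(ncc_results: List[Dict[str, str]]) -> List[str]:
    """Return the names of checks whose status blocks an upgrade."""
    return [
        result["check"]
        for result in ncc_results
        if result["status"].upper() in BLOCKING_STATUSES
    ]


def upgrade_allowed(ncc_results: List[Dict[str, str]]) -> bool:
    """Return True only when no check reports WARN or FAIL."""
    return not upgrade_blockers(ncc_results)


# Example with made-up check names.
results = [
    {"check": "example_check_a", "status": "PASS"},
    {"check": "example_check_b", "status": "WARN"},
]
print(upgrade_blockers(results))
print(upgrade_allowed(results))
```

Returning the blocking check names, instead of only a yes or no, gives the change record a concrete list of what was remediated before the upgrade.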
Security and CVE Management
Monitor Nutanix Security Advisory (NXSA) notifications for security updates affecting your environment. Track AOS and AHV CVEs with patching timelines based on CVSS scores and your organization's security policies. Maintain documented change management for compliance requirements including change records, approval workflows, and rollback procedures.
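Patching timelines based on CVSS scores can be derived by mapping each score to its qualitative severity band and then to a patch window. In the sketch below, the bands follow the CVSS v3.x qualitative severity rating scale; the patch windows in days are placeholder assumptions, since the actual timelines depend on your organization's security policy.

```python
from typing import Optional


def cvss_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS score must be between 0.0 and 10.0")
    if score == 0.0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"


# Hypothetical patch windows in days; replace with your policy's values.
PATCH_WINDOW_DAYS = {"critical": 7, "high": 30, "medium": 90, "low": 180}


def patch_window_days(score: float) -> Optional[int]:
    """Return the patch window for a score, or None when no patch is due."""
    return PATCH_WINDOW_DAYS.get(cvss_severity(score))


# Example: a CVE scored 9.8 falls in the shortest window.
print(cvss_severity(9.8), patch_window_days(9.8))
```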
One call, not three
Traditional Nutanix operations require your team to determine whether issues exist in AHV, AOS, guest OS, or underlying hardware, then open the correct support case with the appropriate vendor. Under Aegis, you make one call to IVI.
IVI's engineers determine the issue layer, execute resolution within Aegis scope, and open and manage vendor TAC tickets when escalation is required. You don't interact with Nutanix TAC directly unless you choose to.
Aegis Managed Nutanix requires an active Nutanix support contract. IVI acts as the operator on that contract, not as a replacement for vendor support: it provides the managed operations layer, resolves operational issues such as service failures, cluster degradation, and configuration drift, and manages vendor TAC relationships for software defects.
Who this serves
Aegis Managed Nutanix is purpose-built for organizations with deployed Nutanix environments requiring ongoing operational management beyond internal team capacity. IT teams with strong strategic and application expertise but limited hypervisor operations bandwidth benefit from the co-managed model that preserves their application-layer control while providing platform expertise.
Organizations transitioning from VMware whose teams lack AHV-specific operational knowledge can leverage Aegis to bridge the expertise gap while their teams develop Nutanix skills. Organizations with compliance requirements for firmware currency, documented change management, and incident records benefit from IVI's structured operational processes.
Aegis Managed Nutanix is not appropriate for organizations with fully staffed Nutanix operations teams that don't require augmentation, or for organizations expecting IVI to manage guest OS or application layers within VMs.
Key decision criteria
The appropriate Aegis engagement model depends on your team's current Nutanix expertise, coverage requirements, platform currency status, and DR configuration maturity.
1. Nutanix operational expertise assessment
Deep expertise: Consider monitoring-only Aegis PM engagement if your team has strong Nutanix operational skills but needs enhanced monitoring and alerting capabilities.
Partial expertise: Co-managed model where IVI handles gaps such as overnight coverage, upgrade execution, or specialized troubleshooting while your team maintains day-to-day operations.
Limited expertise: Full Aegis managed engagement with IVI owning day-to-day AOS/AHV operations while your team focuses on application and business requirements.
2. Coverage requirements
24x7 requirements: IVI NOC provides continuous monitoring and incident response for organizations that cannot tolerate extended outages or have global operations requiring around-the-clock coverage.
Business hours only: Business-hours co-managed model available for organizations with tolerance for after-hours response delays and primarily regional operations.
3. Platform currency assessment
Current on AOS versions: Aegis LM maintains currency going forward with regular upgrade planning and execution to keep pace with Nutanix release cycles.
Behind on versions: Aegis LM begins with version assessment and upgrade plan to current LTS, addressing any configuration drift or compatibility issues that may have accumulated.
4. DR configuration validation
Tested protection domains: Aegis PM monitors and maintains existing configurations with regular replication health checks and RPO monitoring.
Untested configurations: Aegis begins with DR configuration audit and remediation to ensure protection domains are properly configured and replication is functioning as designed.