Storage Networking

NVMe over Fabrics extends the NVMe command set across a network, and the transport you pick decides your latency, cost, and operational model.

Storage teams modernizing their data center networking keep hitting the same question: if NVMe is the standard for local flash, what carries it across the network without throwing away the performance? NVMe over Fabrics (NVMe-oF) is the answer, but "NVMe-oF" is not one thing. It is a family of transports - TCP, RoCE, and Fibre Channel - with very different latency, cost, and day-2 operational profiles.

This guide defines NVMe-oF, walks through the transport options, and compares NVMe-oF against iSCSI and Fibre Channel so you can pick the fabric that matches the constraint you actually hit first.

Key Takeaways

  • NVMe over Fabrics preserves the NVMe command set end to end across the network, avoiding the SCSI translation overhead that iSCSI and traditional Fibre Channel add to every I/O operation.
  • NVMe/TCP runs on standard Ethernet with no special hardware and is the lowest barrier to entry, while NVMe/RoCE delivers the lowest Ethernet latency but requires lossless fabric tuning.
  • FC-NVMe reuses existing Fibre Channel infrastructure and operational practices, making it the lowest-risk path for established FC environments to adopt NVMe semantics.
  • Transport selection should match workload latency budgets and operational capabilities rather than peak benchmark numbers - most environments need far fewer microsecond-class fabrics than architecture diagrams suggest.

The challenge with legacy storage protocols

Local NVMe SSDs deliver microsecond-class latency, but that performance gets trapped inside a single server when it rides PCIe. The moment storage and compute need to scale independently, teams reach for a network protocol, and the legacy options were built for spinning disk and the SCSI command set.

iSCSI and traditional Fibre Channel (FCP) both carry SCSI commands, so every I/O bound for NVMe flash passes through the host's SCSI stack and is translated back to NVMe at the target. That translation adds protocol overhead and CPU cost that erode the reason you bought flash in the first place.

Common misconceptions about NVMe-oF

"NVMe-oF" gets treated as a single decision when it is really three transports with different cost and tuning profiles. Lossless RDMA fabrics promise the lowest latency but demand network configuration most teams underestimate. Existing SAN investment, team skills, and the application's actual latency budget rarely point at the same transport.

Three NVMe-oF transports, three operational models

NVMe-oF preserves the NVMe command set end to end, so the host talks NVMe natively instead of translating to SCSI. What changes between transports is how those commands move across the wire: standard TCP, RDMA over Ethernet, or native Fibre Channel frames. Each choice sets a different floor for latency and a different ceiling for operational simplicity.

NVMe/TCP: Standard Ethernet, no special hardware

NVMe/TCP runs over standard Ethernet and the existing TCP/IP stack with no special NICs or switches. This makes it the easiest to deploy and scale, troubleshoot with familiar IP tools, and route over distance. The tradeoff is higher latency and more host CPU per I/O than RDMA, though hardware offload narrows the gap.

For most environments, NVMe/TCP delivers the NVMe efficiency gains without the operational complexity of lossless fabrics. It is routable, works with existing network monitoring tools, and scales on commodity Ethernet infrastructure.
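
As a rough illustration of how little host-side setup NVMe/TCP needs, the sketch below builds an `nvme connect` command line for nvme-cli on Linux. The helper name `build_connect_cmd`, the target address, and the subsystem NQN are hypothetical placeholders, not values from any real environment. Port 4420 is the IANA-registered NVMe-oF port and is used here as an assumed default.

```python
import shlex


def build_connect_cmd(transport: str, traddr: str, nqn: str,
                      trsvcid: str = "4420") -> list[str]:
    """Build an nvme-cli 'connect' argv for an IP-based NVMe-oF transport.

    Hypothetical helper: it only assembles the arguments and does not
    run anything on the host.
    """
    if transport not in ("tcp", "rdma"):
        raise ValueError("this sketch only covers the IP-based transports")
    return [
        "nvme", "connect",
        "--transport", transport,  # tcp for NVMe/TCP, rdma for NVMe/RoCE
        "--traddr", traddr,        # target IP address (placeholder below)
        "--trsvcid", trsvcid,      # target port
        "--nqn", nqn,              # subsystem NQN (placeholder below)
    ]


# Placeholder address from the documentation range and a made-up NQN.
cmd = build_connect_cmd("tcp", "192.0.2.10", "nqn.2014-08.org.example:subsys1")
print(shlex.join(cmd))
```

The same helper with `transport="rdma"` produces the RoCE form of the command, which is a reminder that the host-side difference is small and the real cost of RoCE sits in the NICs and the fabric.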

NVMe/RoCE: Lowest Ethernet latency with operational complexity

NVMe/RoCE uses RDMA over Converged Ethernet (RoCEv2) to move data directly between host and target memory, bypassing much of the kernel. This delivers the lowest Ethernet latency and lowest host CPU, but it requires RDMA-capable NICs and a lossless fabric tuned with Priority Flow Control (PFC) and Explicit Congestion Notification (ECN).

The operational burden is real. Congestion control design decides whether the fabric stays stable under load, and misconfigured PFC can create head-of-line blocking that degrades performance across the entire fabric. Teams considering RoCE need to commit to the lossless tuning and monitoring long term.

FC-NVMe: Native to your existing SAN

FC-NVMe encapsulates NVMe in Fibre Channel frames and reuses existing FC zoning, multipathing, and operational practice. It is deterministic, lossless by design through credit-based flow control, and provides a clean upgrade path if you already run FC.

The cost is a dedicated storage network, FC-specific HBAs and skills, and a bandwidth roadmap that trails Ethernet. But for established FC environments, FC-NVMe offers NVMe semantics with minimal operational risk.

How to choose a transport

Work the decision in order. Most teams over-rotate on peak benchmark numbers and under-weight the operational model they will live with for years.

Define the latency budget per workload tier

Separate the workloads that genuinely need microsecond-class tail latency (high-transaction databases, certain AI training pipelines) from the larger set that will be fine on NVMe/TCP. Most environments have far fewer of the former than the architecture diagrams suggest.

Measure your current latency distribution under realistic load, not synthetic benchmarks. A workload running comfortably at 500 microseconds p99 today does not need a fabric optimized for 50 microseconds.
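
One way to make the latency budget concrete is to compute p99 from measured I/O latencies and compare it with what the workload actually requires. The following is a minimal sketch using the nearest-rank percentile method; the function names, the sample values, and the 500 microsecond budget are illustrative assumptions, not measurements.

```python
import math


def percentile(samples_us: list[float], pct: int) -> float:
    """Nearest-rank percentile of latency samples given in microseconds."""
    if not samples_us:
        raise ValueError("no samples")
    ordered = sorted(samples_us)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def exceeds_budget(samples_us: list[float], budget_p99_us: float) -> bool:
    """True when measured p99 is above the workload's p99 budget."""
    return percentile(samples_us, 99) > budget_p99_us


# Made-up samples: mostly 200 us with a slow tail.
samples = [200.0] * 98 + [480.0, 900.0]
p99 = percentile(samples, 99)
print(f"p99 = {p99} us, exceeds 500 us budget: {exceeds_budget(samples, 500.0)}")
```

A workload whose measured p99 sits inside its budget on the current fabric is a candidate for NVMe/TCP rather than for a microsecond-class tier.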

Inventory the fabric and skills you already own

An established FC shop with zoning discipline and FC-NVMe-capable arrays has a low-risk path through FC-NVMe. An Ethernet-first team running spine-leaf with no RDMA experience should think hard before committing to a lossless RoCE fabric.

Skills matter more than speeds and feeds. A transport that requires new operational disciplines will fail if the team cannot sustain those practices under pressure.

Pilot under realistic contention, not a clean lab

A transport that looks great single-stream can fall apart when multiple tenants share the fabric. Test p99 latency under the I/O mix and host CPU headroom you will actually run, and watch for retransmits and queue buildup.

Pay particular attention to how each transport behaves under congestion. RoCE fabrics can deliver excellent performance when properly tuned but degrade rapidly when PFC triggers or ECN marking fails to control congestion effectively.
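
For an NVMe/TCP pilot on Linux hosts, one simple retransmit check is to snapshot the host-wide TCP counters in `/proc/net/snmp` before and after a test run and compare the deltas. The sketch below assumes that file's usual two-line `Tcp:` layout (a header line of names followed by a line of values); the snapshot strings are trimmed, made-up examples, and the counters cover all TCP traffic on the host rather than storage traffic alone. RoCE fabrics would need NIC and switch counters instead, which this sketch does not cover.

```python
def parse_tcp_counters(snmp_text: str) -> dict[str, int]:
    """Parse the 'Tcp:' name and value lines from /proc/net/snmp text."""
    rows = [line.split()[1:] for line in snmp_text.splitlines()
            if line.startswith("Tcp:")]
    if len(rows) != 2:
        raise ValueError("expected one header line and one value line for Tcp")
    names, values = rows
    return dict(zip(names, (int(v) for v in values)))


def retransmit_ratio(before: dict[str, int], after: dict[str, int]) -> float:
    """Retransmitted segments as a fraction of segments sent between snapshots."""
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return retrans / sent if sent > 0 else 0.0


# Trimmed, made-up snapshots; a real file carries more columns per line.
before_text = "Tcp: InSegs OutSegs RetransSegs\nTcp: 1000 2000 4\n"
after_text = "Tcp: InSegs OutSegs RetransSegs\nTcp: 9000 12000 54\n"

ratio = retransmit_ratio(parse_tcp_counters(before_text),
                         parse_tcp_counters(after_text))
print(f"retransmit ratio during run: {ratio:.4f}")
```

On a real host the two snapshots would come from reading `/proc/net/snmp` at the start and end of the contention test.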

Price the total cost per I/O, not the port

Factor in NICs or HBAs, switches, optics, and the engineering time to tune and operate the fabric. NVMe/TCP on existing Ethernet often wins on cost per usable I/O even when RoCE wins on raw latency.

Include the operational cost of specialized skills, monitoring tools, and the time to troubleshoot fabric-specific issues. A transport that requires dedicated expertise becomes expensive quickly when that expertise is scarce.
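
The cost comparison can be reduced to simple arithmetic: add hardware and engineering time, then divide by the IOPS the fabric sustains under realistic load. The sketch below shows the calculation only; every figure in it is invented for illustration and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FabricCost:
    hardware: float           # NICs or HBAs, switches, optics
    engineering_hours: float  # tuning and operating the fabric
    hourly_rate: float        # loaded cost of the engineers involved
    sustained_iops: float     # measured under realistic load, not peak

    def total(self) -> float:
        return self.hardware + self.engineering_hours * self.hourly_rate

    def cost_per_kiops(self) -> float:
        """Total cost per 1,000 sustained IOPS."""
        return self.total() / (self.sustained_iops / 1000)


# Invented numbers, chosen only to show the shape of the comparison.
tcp = FabricCost(hardware=40_000, engineering_hours=80,
                 hourly_rate=150, sustained_iops=800_000)
roce = FabricCost(hardware=90_000, engineering_hours=400,
                  hourly_rate=150, sustained_iops=1_000_000)

for name, fabric in (("NVMe/TCP", tcp), ("NVMe/RoCE", roce)):
    print(f"{name}: {fabric.cost_per_kiops():.2f} per 1,000 IOPS")
```

With these made-up inputs the RoCE fabric sustains more IOPS yet costs more per usable I/O, which is the pattern the pricing step is meant to surface. Real inputs may of course point the other way.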

Transport comparison framework

Each NVMe-oF transport optimizes for different constraints. Match the transport to your environment's actual bottlenecks, not theoretical performance ceilings.

NVMe/TCP: Ethernet, no special hardware

NVMe carried over standard TCP/IP on existing Ethernet. No RDMA NICs, no lossless tuning, routable over distance, troubleshot with familiar IP tooling.

Best fit: Ethernet-first teams, Kubernetes and virtualized estates, and the majority of workloads that need NVMe efficiency but not the absolute lowest tail latency.

Tradeoffs: Higher latency and more host CPU per I/O than RDMA. NIC offloads (TCP segmentation, data digest CRC, and similar) narrow the gap over time but add NIC requirements.

NVMe/RoCE: Lowest Ethernet latency

NVMe over RDMA (RoCEv2) for direct memory-to-memory transfer, the lowest latency and host CPU available on Ethernet. Published testing shows NVMe over 25G RoCE roughly matching SCSI-FCP over 32GFC on IOPS and latency.

Best fit: Latency-critical tiers (high-transaction databases, certain AI and HPC pipelines) where teams can enforce a lossless, well-tuned fabric.

Tradeoffs: Requires RDMA-capable NICs and a lossless fabric with PFC and ECN. Misconfigured congestion control degrades under load, and the tuning burden is real.

FC-NVMe: Native to your SAN

NVMe encapsulated in Fibre Channel frames, reusing FC zoning, multipathing, and operational practice. Deterministic and lossless by design through credit-based flow control.

Best fit: Established FC environments running Tier-1 workloads that want NVMe semantics without abandoning a proven SAN operating model.

Tradeoffs: Dedicated storage network plus FC-specific HBAs, skills, and tooling. The FC speed roadmap (64GFC mainstream, 128GFC emerging) trails the pace of Ethernet.
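
The "best fit" rules above can be written down as a small decision function. This is one possible encoding of the rules of thumb in this guide, not a definitive selection algorithm; the function name, the parameter names, and the 100 microsecond threshold for "microsecond-class" are assumptions made for the example.

```python
def suggest_transport(required_p99_us: float,
                      has_fc_san: bool,
                      can_run_lossless_ethernet: bool,
                      microsecond_class_us: float = 100.0) -> str:
    """Suggest a starting-point transport for one workload tier.

    Hypothetical helper: the threshold and the ordering of the checks
    are illustrative assumptions.
    """
    # An established FC shop has the lowest-risk path through FC-NVMe.
    if has_fc_san:
        return "FC-NVMe"
    # RoCE only pays off when the tier is latency-critical AND the team
    # can enforce a lossless, well-tuned fabric.
    if required_p99_us < microsecond_class_us and can_run_lossless_ethernet:
        return "NVMe/RoCE"
    # Everything else starts on standard Ethernet.
    return "NVMe/TCP"


print(suggest_transport(500.0, has_fc_san=False, can_run_lossless_ethernet=False))
print(suggest_transport(50.0, has_fc_san=False, can_run_lossless_ethernet=True))
print(suggest_transport(50.0, has_fc_san=True, can_run_lossless_ethernet=True))
```

A latency-critical tier on a team that cannot sustain lossless Ethernet falls through to NVMe/TCP in this sketch, which mirrors the guide's point that the operational model outweighs peak benchmark numbers.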

Implementation considerations

Successful NVMe-oF deployments match transport selection to operational capabilities and scale incrementally rather than attempting fabric-wide migrations.

Migration and coexistence patterns

Most organizations run multiple transports simultaneously: NVMe/TCP for general workloads, RoCE islands for latency-critical tiers, and FC-NVMe for existing SAN environments. This hybrid approach lets teams optimize each workload tier without forcing a single fabric choice across the entire environment.

Plan for gradual migration rather than forklift upgrades. New workloads can adopt NVMe-oF while existing applications continue on iSCSI or SCSI-FCP until their next refresh cycle.

Monitoring and troubleshooting

Each transport requires different monitoring approaches. NVMe/TCP leverages existing IP network monitoring tools and practices. RoCE fabrics need specialized RDMA counters and PFC monitoring to detect congestion before it impacts performance. FC-NVMe extends existing FC monitoring to include NVMe-specific metrics.

Establish baseline performance metrics for each transport under normal load before issues arise. Troubleshooting NVMe-oF performance problems requires understanding both the storage and network layers.
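
A baseline is only useful if something compares against it. As a minimal sketch, the function below flags any transport whose current p99 latency has drifted more than a tolerance above its recorded baseline. The function name, the 20 percent tolerance, and all latency values are assumptions for illustration, not recommended thresholds.

```python
def flag_regressions(baseline_p99_us: dict[str, float],
                     current_p99_us: dict[str, float],
                     tolerance: float = 0.20) -> list[str]:
    """Return transports whose current p99 exceeds baseline by more than tolerance."""
    flagged = []
    for transport, base in baseline_p99_us.items():
        current = current_p99_us.get(transport)
        if current is not None and current > base * (1 + tolerance):
            flagged.append(transport)
    return flagged


# Made-up p99 values in microseconds.
baseline = {"NVMe/TCP": 300.0, "NVMe/RoCE": 60.0, "FC-NVMe": 90.0}
current = {"NVMe/TCP": 310.0, "NVMe/RoCE": 95.0, "FC-NVMe": 92.0}

print(flag_regressions(baseline, current))
```

In practice the flagged transport tells you which set of counters to open first: IP tooling for NVMe/TCP, RDMA and PFC counters for RoCE, or FC monitoring for FC-NVMe.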

Frequently Asked Questions

What is the difference between NVMe and NVMe-oF?

NVMe is the command set and interface for talking to flash over PCIe inside a single server. NVMe over Fabrics (NVMe-oF) extends that same command set across a network so hosts can reach remote NVMe storage, letting compute and storage scale independently while keeping NVMe native end to end rather than translating to SCSI.

Is NVMe-oF faster than iSCSI?

Generally yes, because iSCSI carries SCSI commands, so I/O to NVMe flash is translated between SCSI and NVMe, which adds protocol overhead and host CPU per I/O. Vendor testing consistently shows iSCSI delivering the lowest IOPS and highest CPU utilization of the common transports. NVMe/TCP runs on the same Ethernet as iSCSI but keeps NVMe native, so it typically delivers more performance per dollar.

Do I need special hardware for NVMe-oF?

It depends on the transport. NVMe/TCP runs on standard Ethernet NICs and switches with no special hardware. NVMe/RoCE needs RDMA-capable NICs and a lossless fabric tuned with PFC and ECN. FC-NVMe needs Fibre Channel HBAs and switches. NVMe/TCP is the lowest barrier to entry.

Should I replace my Fibre Channel SAN with NVMe-oF?

Not necessarily. If you already run FC well, FC-NVMe lets you adopt NVMe semantics on your existing fabric with minimal risk. The case for moving to Ethernet (NVMe/TCP or RoCE) is strongest for greenfield builds, cost-sensitive scaling, or teams standardizing on a single network domain. Match the transport to your skills and roadmap, not to a benchmark.

How does NVMe/RoCE compare to FC-NVMe on latency?

They are close. Published industry testing found NVMe over 25G RoCE delivering roughly equivalent CPU utilization, IOPS, and latency to SCSI-FCP over 32GFC, and FC-NVMe lowers FC overhead further. The practical difference is operational: RoCE demands lossless Ethernet tuning, while FC-NVMe inherits FC's deterministic, credit-based flow control.

Need help designing your NVMe-oF fabric?

IVI's data center networking team designs and implements NVMe-oF fabrics across all three transports, matching the architecture to your workload requirements and operational capabilities.
