The build-out of Artificial Intelligence / Machine Learning (AI/ML) data centers is increasing capital intensity and changing how all customers build data centers. Back-end and front-end networking plays a crucial role in getting the most out of these new data centers. Data center CAPEX will likely double as AI/ML becomes a mainstream technology, and most industries will ultimately change as they adopt it.
Remote Direct Memory Access (RDMA) is a technology that allows two servers to read and/or write to each other's memory without going through either server's processor, cache, or operating system. RDMA can also run over Ethernet networks, as RoCE (RDMA over Converged Ethernet). RDMA keeps GPUs at a high utilization rate and reduces Job Completion Time (JCT). By improving efficiency, RDMA lowers the total cost of ownership and shortens training time, a key metric in many of the foundational model training efforts at Microsoft, OpenAI (ChatGPT), Meta, and others.
Back-end networks are dedicated to the AI/ML cluster and connect each of its servers. For AI/ML, back-end networks are incremental to the existing infrastructure in traditional data centers. Across InfiniBand and Ethernet technologies, the RDMA switching opportunity should grow more than 20X between 2021 and 2028. The RDMA switching market, about $1B in 2021, is on track to exceed $18B in 2028, which shows the importance of the technology as a critical enabler of AI/ML.
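To put the switching forecast in annual terms, the short sketch below converts a start value, an end value, and a number of years into an implied compound annual growth rate (CAGR). It treats the rounded figures above ($1B in 2021, $18B in 2028) as exact endpoints, which is a simplifying assumption. The function name `implied_cagr` is illustrative only.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Return the compound annual growth rate that turns start_value into end_value."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Assumption: treat the rounded figures from the text as exact endpoints.
start_billions = 1.0   # RDMA switching market in 2021 (about $1B)
end_billions = 18.0    # RDMA switching market in 2028 (more than $18B)
years = 2028 - 2021    # 7 years of compounding

rate = implied_cagr(start_billions, end_billions, years)
print(f"Growth multiple: {end_billions / start_billions:.0f}x")
print(f"Implied CAGR: {rate:.1%}")
```

Under these assumptions, an 18X increase over seven years works out to roughly 51% growth per year; because the forecast says "more than" $18B, the real rate would be at least that high.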
At the same time, servers and NICs continue to change. The number of GPUs per server will likely grow from 8 today to 16-32 in future server designs. More GPUs and larger training models will increase the total memory used, and billion-parameter models will quickly become trillion-parameter models. NICs will need higher speeds and additional capabilities to keep pace, and the NIC market should add $4B in market size by 2028. Taken together, these trends push back-end bandwidth to grow at more than a 100% CAGR, compared with 30-40% for traditional data center bandwidth.
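The gap between these two growth rates widens quickly once it compounds. The sketch below compares total bandwidth growth at a 100% CAGR against 30% and 40% CAGRs. The five-year horizon is an assumption chosen for illustration, since the text does not fix one, and the function name `growth_multiple` is hypothetical.

```python
def growth_multiple(cagr: float, years: int) -> float:
    """Return the total growth factor after compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years


# Assumption: a five-year horizon for illustration; the text does not specify one.
YEARS = 5

scenarios = {
    "Back-end (AI/ML) network, 100% CAGR": 1.00,
    "Traditional data center, 30% CAGR": 0.30,
    "Traditional data center, 40% CAGR": 0.40,
}

for label, cagr in scenarios.items():
    print(f"{label}: {growth_multiple(cagr, YEARS):.1f}x bandwidth after {YEARS} years")
```

Over five years, doubling every year yields 32x the starting bandwidth, while 30-40% annual growth yields only about 3.7x to 5.4x, which is why the back-end network dominates bandwidth planning for AI/ML clusters.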
We are excited to go into further detail in our RDMA Networking and AI white paper, which you can read and download from the 650 Group website. The white paper includes an introduction to RDMA and RoCE, along with market sizing for this fast-growing segment of AI networking.