Arrcus to deliver lossless connectivity fabric for next-gen distributed AI workloads

Arrcus, the hyperscale networking software company and a leader in core, edge, and multi-cloud routing and switching infrastructure, announces enhancements to its ACE-AI platform to address the growing demand for a unified networking fabric for distributed AI workloads.

AI workloads are becoming increasingly distributed, driven by both economic considerations and application requirements. Arrcus designed ACE-AI to network these workloads seamlessly across locations and to deliver applications at the edge with high-speed, lossless connectivity. The emerging federated learning model lets multiple entities collaboratively train a model on decentralized data. Model training may run in hyperscale environments, while inference may run at the edge for a range of use cases. Arrcus recognizes the need for a unified networking fabric that interconnects these workloads wherever they reside. Modern data center applications demand high throughput (400-800 Gbps) and ultra-low latency (under 10 μs per hop), and Arrcus ACE-AI meets these demands while minimizing CPU overhead.

“The future of AI lies in its ubiquity, and Arrcus has built the industry’s most flexible and intelligent fabric that connects and orchestrates distributed AI workloads,” said Shekar Ayyar, Chairman and CEO of Arrcus. “With the enhanced ACE-AI platform, we are giving enterprises and service providers the power to unlock the full potential of AI, across clouds, data centers, and the edge.”

Emerging artificial intelligence, high-performance computing, and storage workloads pose new challenges for large-scale data center networking. Arrcus addresses these challenges with new features for building a lossless Ethernet fabric, including RoCEv2 (RDMA over Converged Ethernet version 2), PFC (Priority-based Flow Control), ECN (Explicit Congestion Notification), ETS (Enhanced Transmission Selection), AR (Adaptive Routing), Dynamic Load Balancing, and Global Load Balancing.
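The announcement lists these features without describing how they are configured. As a rough illustration of how ECN fits into a lossless fabric: in DCQCN-style congestion control for RoCEv2, a switch marks packets with ECN with a probability that rises as its queue grows between two thresholds, so senders can slow down before PFC pause frames are needed. The sketch below models that RED-like marking curve. The function name, threshold names, and example values are hypothetical and are not Arrcus defaults.

```python
def ecn_mark_probability(queue_kb: float, kmin_kb: float, kmax_kb: float, pmax: float) -> float:
    """Return the probability of ECN-marking a packet at the given queue depth.

    Illustrative RED-style curve as used by DCQCN-style schemes (hypothetical
    parameters, not a vendor configuration):
      - below kmin: never mark
      - between kmin and kmax: probability rises linearly from 0 to pmax
      - at or above kmax: always mark
    """
    if not (0 <= kmin_kb < kmax_kb):
        raise ValueError("require 0 <= kmin_kb < kmax_kb")
    if not (0.0 <= pmax <= 1.0):
        raise ValueError("pmax must be in [0, 1]")
    if queue_kb <= kmin_kb:
        return 0.0
    if queue_kb >= kmax_kb:
        return 1.0
    # Linear interpolation between the two thresholds
    return pmax * (queue_kb - kmin_kb) / (kmax_kb - kmin_kb)


if __name__ == "__main__":
    # Hypothetical thresholds: start marking at 100 KB, mark everything at 400 KB
    for depth in (50, 250, 500):
        print(depth, ecn_mark_probability(depth, kmin_kb=100, kmax_kb=400, pmax=0.2))
```

With these example thresholds, a 250 KB queue sits halfway between them and is marked with probability 0.1. Keeping the marking band below the PFC trigger point is the usual design intent, so that end-to-end rate reduction handles most congestion and hop-by-hop pausing remains a last resort.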

One significant challenge in achieving high-performance networking for AI workloads is that traditional TCP/IP stacks struggle at these speeds because of their high CPU overhead. Arrcus addresses this challenge with RDMA (Remote Direct Memory Access), which offloads transport processing from the CPU to hardware and gives applications direct access to remote memory without involving the CPU. The second version of RDMA over Converged Ethernet (RoCEv2) carries RDMA traffic inside UDP/IP headers, which makes it routable across Layer 3 networks.
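To make the RoCEv2 encapsulation concrete, here is a minimal sketch that packs the UDP header and the InfiniBand Base Transport Header (BTH) that sit between the IP header and the RDMA payload. It assumes UDP destination port 4791 (the IANA-assigned RoCEv2 port), a 12-byte BTH, and a 4-byte invariant CRC (ICRC) trailer. The function name and parameters are hypothetical, the ICRC itself is not computed, and the payload is assumed to be 4-byte aligned so the BTH pad count is zero.

```python
import struct

ROCEV2_UDP_DST_PORT = 4791  # IANA-assigned UDP destination port for RoCEv2
BTH_LEN = 12                # InfiniBand Base Transport Header length in bytes
ICRC_LEN = 4                # invariant CRC trailer length (not computed here)


def build_udp_bth(src_port: int, opcode: int, dest_qp: int, psn: int,
                  payload_len: int, pkey: int = 0xFFFF, ack_req: bool = False) -> bytes:
    """Pack a UDP header followed by a BTH for a RoCEv2 packet (illustrative only)."""
    if not (0 <= dest_qp < 1 << 24 and 0 <= psn < 1 << 24):
        raise ValueError("dest_qp and psn are 24-bit fields")
    if payload_len % 4 != 0:
        raise ValueError("sketch assumes a 4-byte aligned payload (PadCnt = 0)")

    # UDP length covers the UDP header, BTH, payload and ICRC trailer.
    udp_len = 8 + BTH_LEN + payload_len + ICRC_LEN
    # The source port varies per flow so that ECMP can spread RoCEv2 traffic;
    # the UDP checksum is left at 0 here, with integrity covered by the ICRC.
    udp = struct.pack("!HHHH", src_port, ROCEV2_UDP_DST_PORT, udp_len, 0)

    # BTH bytes 0-7: opcode, flags (SE/M/PadCnt/TVer all zero), P_Key,
    # then one reserved byte followed by the 24-bit destination QP.
    flags = 0
    bth = struct.pack("!BBHI", opcode, flags, pkey, dest_qp)
    # BTH bytes 8-11: AckReq bit, 7 reserved bits, 24-bit packet sequence number.
    bth += struct.pack("!I", (int(ack_req) << 31) | psn)
    return udp + bth


if __name__ == "__main__":
    # 0x04 is the RC "SEND Only" opcode; QP and PSN values are arbitrary examples.
    hdr = build_udp_bth(src_port=49152, opcode=0x04, dest_qp=0x12, psn=100, payload_len=1024)
    print(len(hdr), hdr.hex())
```

Adding a 14-byte Ethernet header, a 20-byte IPv4 header, and the 4-byte Ethernet FCS to these 20 bytes and the ICRC gives 62 bytes of headers and trailers per packet, before preamble and inter-frame gap. Because the outer headers are ordinary UDP/IP, standard IP routing and ECMP hashing apply to RoCEv2 traffic without RDMA-specific support in the forwarding path.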

In addition to these feature enhancements, Arrcus has announced support for Broadcom's latest 800G switching platforms, Tomahawk5, Jericho3, and Ramon3, in partnership with device manufacturers such as UfiSpace and Edgecore.

“Broadcom is very excited to collaborate with Arrcus to deliver industry-leading switching solutions that are optimized to meet the performance demands of next-generation AI workloads. Together, Arrcus and Broadcom are enabling customers to build high-performance, scalable, and intelligent data center networks,” said Ram Velaga, senior vice president and general manager, Core Switching Group, Broadcom.
