High-radix, high-bandwidth, low-latency switch fabrics are key enablers for high-end servers, Ethernet routers, multiprocessor NoCs, multimedia accelerators, and manycore computers in general. Conventional switch fabrics typically consist of a crossbar to route data and a separate arbiter to configure the crossbar. This separation poses two hurdles for scalability:
- Routing to and from the arbiter incurs significant overhead as the number of sources and destinations grows. Also, the rapidly growing complexity of centralized arbitration with increasing crossbar radix can dominate overall delay.
- Area utilization is poor since the arbiter is logic-dominated, whereas the crossbar is routing-intensive. The high cost of large-radix switches limits fabric scalability by requiring more stages in the data traversal path. This results in higher latency, reduced energy efficiency due to intermediate data storage, and complex routing protocols to handle inter-stage communication.
To overcome these hurdles, we propose SWIFT (SWizzle Interconnect Fabric Topology), featuring a novel distributed arbitration scheme that reuses the data transfer bit-lines as “priority lines” for conflict resolution and locally stores the connectivity status at crosspoints. This eliminates additional routing and logic overhead to produce a compact design. A 32×32 router with 64b data buses (2048 wires) requires just 0.35mm² in 65nm, including single-cycle arbitration. This corresponds to the area required to route its 2048 input/output wires at 2× minimum spacing (with no additional tracks). SWIFT achieves a bandwidth of 2.1Tb/s at 1.2V with an efficiency of 7.39Tb/s/W, and is fully functional down to 530mV with a peak efficiency of 36.8Tb/s/W.
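The headline figures above can be cross-checked with a few lines of arithmetic. The Python sketch below is illustrative only: the implied clock rate assumes that all 32 ports transfer one 64b word per cycle at peak bandwidth, which the abstract does not state, and the variable names are hypothetical.

```python
# Figures quoted in the abstract.
RADIX = 32                      # 32x32 fabric
BUS_WIDTH_BITS = 64             # 64b data bus per port
BANDWIDTH_TBPS = 2.1            # aggregate bandwidth at 1.2V
EFF_NOMINAL_TBPS_PER_W = 7.39   # efficiency at 1.2V
EFF_PEAK_TBPS_PER_W = 36.8      # peak efficiency near 530mV

# Total data wires: one 64b bus for each of the 32 ports.
wires = RADIX * BUS_WIDTH_BITS

# ASSUMPTION (not stated in the abstract): every port moves one full
# word per cycle at peak bandwidth, so clock = bandwidth / wires.
implied_clock_ghz = BANDWIDTH_TBPS * 1e12 / wires / 1e9

# Power at 1.2V follows directly from bandwidth / efficiency.
implied_power_w = BANDWIDTH_TBPS / EFF_NOMINAL_TBPS_PER_W

# 1 W per Tb/s equals 1 pJ per bit, so energy per bit is 1 / efficiency.
energy_nominal_pj_per_bit = 1.0 / EFF_NOMINAL_TBPS_PER_W
energy_peak_pj_per_bit = 1.0 / EFF_PEAK_TBPS_PER_W

print(f"data wires:            {wires}")
print(f"implied clock:         {implied_clock_ghz:.3f} GHz")
print(f"implied power at 1.2V: {implied_power_w:.3f} W")
print(f"energy/bit at 1.2V:    {energy_nominal_pj_per_bit:.4f} pJ")
print(f"energy/bit at peak:    {energy_peak_pj_per_bit:.4f} pJ")
```

Under that assumption the fabric would run at roughly 1GHz and draw a little under 0.3W at 1.2V; these are derived estimates, not measurements reported by the authors.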
SWIFT reuses concepts from SRAM design to make single-cycle arbitration and data transfer possible at high radices. It uses an area-efficient thyristor-based sense amplifier enabled latch (SAEL) for fast, robust single-ended bit-line evaluation. It supports four priorities for fairness during conflict resolution, as well as multicast. As a result, SWIFT is 1.9× more energy efficient, with 53% more bisection bandwidth and 80% lower latency, than the best previously reported single-stage fabrics. SWIFT’s 2.3× smaller area, 2.1× faster speed, and 69% higher energy efficiency (at iso-performance) over traditional switches can significantly improve NoC performance/latency when used in place of conventional 5×5 node routers.
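The abstract does not give circuit-level detail, so the following Python sketch is only a behavioral illustration of how shared "priority lines" could resolve a conflict for a single output. It assumes that each requesting input discharges the lines of the inputs it outranks and wins if its own line stays charged. The function names, the four-level priority encoding, and the lowest-index tie-break are all hypothetical, not taken from the paper.

```python
def build_outranks(levels):
    """Build a hypothetical pairwise priority table.

    levels[i] is a priority level in 0..3 (higher wins). Ties are broken
    by lower input index, which is an assumption made for this sketch.
    """
    n = len(levels)
    return [
        [levels[i] > levels[k] or (levels[i] == levels[k] and i < k)
         for k in range(n)]
        for i in range(n)
    ]


def arbitrate(requests, outranks):
    """Return the winning input index for one output, or None.

    requests[i] is True if input i wants this output.
    outranks[i][k] is True if input i has priority over input k.
    """
    n = len(requests)
    lines = [True] * n  # one precharged priority line per input
    for i in range(n):
        if requests[i]:
            for k in range(n):
                if outranks[i][k]:
                    lines[k] = False  # discharge lines of outranked inputs
    # A requester wins only if no other requester discharged its line.
    for i in range(n):
        if requests[i] and lines[i]:
            return i
    return None


# Example: four inputs with priority levels 0..3; inputs 0, 2, 3 request.
levels = [0, 3, 3, 1]
outranks = build_outranks(levels)
winner = arbitrate([True, False, True, True], outranks)
print("winner:", winner)
```

Because every requester acts on the shared lines in parallel, a model like this needs no central arbiter: each output column resolves its own conflicts. A multicast request could be modeled by running `arbitrate` independently for each requested output.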
SWIFT: A 2.1Tb/s 32×32 Self-Arbitrating Manycore Interconnect Fabric
Sudhir Satpathy, Ronald Dreslinski, Tai-Chuan Ou, Dennis Sylvester, Trevor Mudge, David Blaauw, “SWIFT: A 2.1Tb/s 32×32 Self-Arbitrating Manycore Interconnect Fabric”, IEEE Symposium on VLSI Circuits (VLSI-Symp), June 2011 ©IEEE