Intel gears up for AI wars with Habana-designed server chip
Intel has rolled out its latest server chip designed from the ground up to tackle deep learning, as the semiconductor giant steps up its efforts to outflank NVIDIA in a booming data center market.
The Santa Clara, Calif.-based company says Gaudi2 is twice as fast as its first-generation predecessor, with improved memory and networking features that set it apart from rival graphics processors from NVIDIA. Manufactured by TSMC on the 7nm node, down from 16nm for its predecessor, the server chip is specifically designed to train AI models to understand human speech and classify objects in images, among other capabilities.
Gaudi2 is the second-generation chip designed by Habana Labs, the startup Intel acquired for around $2 billion in 2019, aiming to expand its portfolio of data center-friendly AI chips.
Intel’s AI pursuits
The Habana-designed chips put more pressure on NVIDIA, which sells graphics processing units (GPUs) that are currently the gold standard for machine learning training in data centers. Dominance in the AI market has helped NVIDIA overtake Intel as the most valuable chip company in the US.
Habana’s Gaudi chips are just one pillar of Intel’s broader strategy to gain ground on NVIDIA in the AI chip market, said Sandra Rivera, senior vice president and general manager of Intel’s data center and AI group.
Intel dominates the global market for central processing units, the brains behind cloud servers and data centers, and it has upgraded its Xeon server processors with AI acceleration. But it is also betting on building a suite of chips, including its Altera FPGAs, Habana AI accelerators and Arc GPUs, that can be combined to boost performance and reduce power consumption across a wide range of AI workloads.
“AI is the engine of the data center,” said Eitan Medina, COO of Habana. “Where Habana fits is when the customer wants to use a server primarily for deep learning computing.”
Powering up tensor processing
Medina said that while Gaudi2 can tackle AI training faster and with better power efficiency than its predecessor, it also competes with NVIDIA’s data center GPUs on another metric: price.
Last year, Amazon Web Services (AWS) rolled out a cloud service based on the first-generation Gaudi, and it claimed customers would get up to 40% better value for money than with cloud instances based on NVIDIA GPUs. That, in turn, could give AWS the ability to lower prices for customers who lease its servers.
Intel said Gaudi2 brings a significant performance boost to its battle with NVIDIA for AI chip leadership. The Gaudi2 is based on the same heterogeneous architecture as its predecessor. But moving from the 16nm node to 7nm allowed Intel to integrate more compute engines, cache, and networking features. “We use it to upgrade all major subsystems inside the accelerator,” Medina said.
Habana’s Gaudi2 includes 24 computational engines called Tensor Processing Cores (TPCs) that are based on its Very Long Instruction Word (VLIW) architecture and are programmable in C and C++ using a compiler.
Other enhancements include the ability for the TPC cores and Gaudi2’s Matrix Multiplication Engine (MME) to run AI workloads using smaller data types, including the 8-bit floating-point format (FP8).
Intel plans to deploy Gaudi2, also known as HL-2080, on accelerator boards based on the standard OAM form factor.
Along with offering faster processing speeds, Intel also said it has upgraded Gaudi2’s networking features.
The Gaudi2 integrates 24 Ethernet ports directly on the chip, each running RoCE (RDMA over Converged Ethernet) at up to 100 Gb/s, up from ten 100-GbE ports in the first generation.
Integrating RoCE ports into the processor itself gives customers the ability to scale to systems with thousands of Gaudi2 chips without having to connect separate network cards, or NICs, to each server in the data center. Intel said it also opens the door for customers to choose Ethernet switches and other networking gear from a wide variety of vendors. This, in turn, helps reduce system-level costs.
Most of the Ethernet ports are used to communicate with the other Gaudi2 processors in the server. The rest provide 2.4 Tb/s of network throughput to other Gaudi2 servers in the data center or cluster.
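The port figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the article (24 ports at 100 Gb/s for Gaudi2, ten ports at 100 Gb/s for the first generation). The helper name `aggregate_tbps` is hypothetical, and the result is the raw line rate per chip: it ignores protocol overhead and does not model how ports are divided between in-server and scale-out links, which the article does not break down.

```python
def aggregate_tbps(ports: int, gbps_per_port: float) -> float:
    """Raw aggregate line rate in terabits per second (1 Tb = 1000 Gb)."""
    return ports * gbps_per_port / 1000


# Port counts and per-port speeds as quoted in the article.
gaudi2_tbps = aggregate_tbps(24, 100)
gaudi1_tbps = aggregate_tbps(10, 100)

# Bits to bytes (8 bits per byte), to compare against figures quoted in GB/s.
gaudi2_gigabytes_per_s = gaudi2_tbps * 1000 / 8

print(f"Gaudi2: {gaudi2_tbps:.1f} Tb/s (about {gaudi2_gigabytes_per_s:.0f} GB/s)")
print(f"First-generation Gaudi: {gaudi1_tbps:.1f} Tb/s")
```

Keeping bits and bytes apart matters here: 2.4 terabits per second is about 300 gigabytes per second, an eightfold difference from 2.4 terabytes per second.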
“It brings several benefits,” Medina said. “Reducing the number of components in the system reduces the TCO for the end customer. Another is the ability to use Ethernet as a scaling interface for clusters.”
He added, “This means end customers can avoid using a proprietary interface,” such as NVIDIA’s NVLink GPU Interconnect, “which will essentially lock them into a particular technology.”
Gaudi2’s memory subsystem was also augmented. According to Intel, the Gaudi2 packs 48MB of SRAM, twice the amount of on-chip memory of its first-generation Gaudi.
The chip uses TSMC’s advanced packaging technology to place 96GB of HBM2e directly in the chip package, keeping more data close to the processor, up from 32GB in the previous generation. Memory bandwidth has increased from 1 TB/s to 2.45 TB/s, Intel said.
Need for AI speed
Improvements to the various on-chip subsystems come at the expense of higher power consumption. The Gaudi2 accelerator board has a maximum thermal design power (TDP) of 600W, up from 350W previously.
Even though it has a higher power envelope, Intel said the Gaudi2 can still be passively cooled. This means the chips are likely to consume less power and generate less heat than offerings that require liquid cooling.
Power efficiency of a server chip is a key requirement for cloud service providers and tech giants trying to control their data center operating costs. Reducing their carbon footprint is also a top priority.
Habana said the Gaudi2 accelerator board will be able to process around 5,425 frames per second on ResNet-50, a popular AI image processing model. That is roughly 1.9 times the throughput of NVIDIA’s A100 GPU, which is built on the same process node with roughly the same die area and can process 2,930 frames per second on ResNet-50. Intel said Gaudi2 runs the model 3 times faster than its first-generation Gaudi.
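The ResNet-50 comparison can be reproduced directly from the two throughput figures quoted above. This is only a quick check of the arithmetic, using the vendor-reported numbers as given; the variable names are illustrative and nothing here is an independent measurement.

```python
# Vendor-reported ResNet-50 training throughput, frames per second,
# as quoted in the article.
gaudi2_fps = 5425
a100_fps = 2930

# Relative throughput of Gaudi2 over the A100.
speedup = gaudi2_fps / a100_fps

print(f"Gaudi2 vs. A100 on ResNet-50: {speedup:.2f}x")
```

The ratio comes to about 1.85, which rounds to the 1.9 times figure cited.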
Habana said Gaudi2 delivered double the throughput of the A100 with 80GB of HBM when training BERT, an advanced AI model that uses deep learning to classify words or predict strings of text.
Intel also launched its Habana-designed second-generation 7nm chip for AI inference known as Greco.
Habana said its Gaudi2 and Greco accelerator boards both use a single software stack, called SynapseAI, which translates models from TensorFlow and PyTorch to run on Habana’s fast and power-efficient AI architecture.
The SynapseAI software suite supports training models on Gaudi2, Intel said. It also supports running inference on any target, including Intel’s Xeon processors, Greco, and even Gaudi2 itself.
The uphill battle remains
Even as Habana racks up customers like AWS with Gaudi, closing the gap with NVIDIA will be a big challenge. Already far ahead of the competition, NVIDIA rolled out its next-generation H100 GPU last month, based on its new Hopper architecture and slated for the second half of 2022. NVIDIA said the H100 delivers 3 times the performance per watt of the A100, the chip Habana used as its point of comparison for Gaudi2.
Habana’s Gaudi2, NVIDIA’s H100 and other specialized AI-class processors are at the center of a fierce battle in the chip industry. Growing demand for faster and more efficient AI computation has spawned a wave of new chip startups in recent years, including Cerebras Systems, Graphcore and Tenstorrent.
But whether Intel can gain ground on NVIDIA and stay ahead of a crowded market increasingly depends on how customers react to the performance, power efficiency and cost savings of Habana’s Gaudi2.
“This deep learning acceleration architecture is fundamentally more efficient and is backed by a solid roadmap,” said Habana’s Medina. According to Intel, Gaudi2 should be in servers shipping by the end of the year.