Meta has presented MTIA v1, its first Meta Training and Inference Accelerator – a new chip for AI workloads. The company found that GPUs were not always optimal for running its recommendation workloads, so it designed a custom accelerator for them.
"In 2020, we designed the first-generation MTIA ASIC for Meta’s internal workloads. This inference accelerator is a part of a co-designed full-stack solution that includes silicon, PyTorch, and the recommendation models. The accelerator is fabricated in TSMC 7nm process and runs at 800 MHz, providing 102.4 TOPS at INT8 precision and 51.2 TFLOPS at FP16 precision. It has a thermal design power (TDP) of 25 W."
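The quoted throughput figures are internally consistent: the INT8 number is exactly twice the FP16 number, which is the usual pattern when the same multiply-accumulate (MAC) units process two INT8 operations per FP16 operation. A quick sanity check, assuming the common convention that one MAC counts as two operations (this breakdown is an assumption, not from Meta's announcement):

```python
# Sanity-check the quoted MTIA v1 figures (assumption: 1 MAC = 2 ops).
freq_hz = 800e6        # quoted clock: 800 MHz
fp16_tflops = 51.2     # quoted FP16 throughput
int8_tops = 102.4      # quoted INT8 throughput

# INT8 is exactly double FP16, as expected for shared MAC hardware.
assert int8_tops == 2 * fp16_tflops

# Implied FP16 MACs per clock cycle (hypothetical derivation).
macs_per_cycle = fp16_tflops * 1e12 / (2 * freq_hz)
print(macs_per_cycle)  # 32000.0
```

The 32,000 FP16 MACs per cycle is an implied aggregate across the whole chip, not a figure Meta published directly.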
MTIA has a dedicated control subsystem that runs the system’s firmware, which manages available compute and memory resources, communicates with the host through a dedicated host interface, and orchestrates job execution on the accelerator.
According to Meta, the MTIA software stack aims to provide developer efficiency and high performance. It integrates fully with PyTorch and thus benefits from its developer ecosystem and tooling. As part of the stack, the company developed a library of highly optimized kernels for performance-critical ML operators, such as fully connected and embedding-bag.
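For readers unfamiliar with the two operator types named above, here is a minimal sketch of what they compute, expressed with standard PyTorch modules (this is illustrative only, not Meta's MTIA kernel library; all sizes are made up):

```python
# Illustrative sketch of the two operators the article names, using
# stock PyTorch modules. Dimensions are arbitrary examples.
import torch
import torch.nn as nn

# Embedding-bag: look up sparse categorical IDs and pool them into one
# vector per "bag" - a core building block of recommendation models.
emb = nn.EmbeddingBag(num_embeddings=1000, embedding_dim=16, mode="sum")
ids = torch.tensor([1, 2, 4, 5, 4, 3, 2, 9])   # flat list of feature IDs
offsets = torch.tensor([0, 4])                 # two bags: ids[0:4], ids[4:8]
pooled = emb(ids, offsets)                     # shape: (2, 16)

# Fully connected: a dense matrix multiply plus bias.
fc = nn.Linear(in_features=16, out_features=8)
out = fc(pooled)                               # shape: (2, 8)
```

On MTIA, hand-optimized versions of such operators replace the generic implementations, which is where the stack gets its performance on recommendation models.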
Meta also shared benchmark results for the first accelerator. While MTIA handles low- and medium-complexity models more efficiently than a GPU, the advantage is modest. Moreover, Meta admits it has not yet optimized MTIA for high-complexity models.
But this is only the first version, and the company expects to show better results in the future. For now, it is focused on striking the right balance between compute power, memory bandwidth, and interconnect bandwidth.