Google Ironwood TPU 9216 chips
Google ended the Hot Chips 2025 machine learning session with a detailed look at its newest tensor processing unit, Ironwood. First revealed at Google Cloud Next 25 in April 2025, Ironwood is Google’s first TPU (Tensor Processing Unit) designed primarily for large-scale inference workloads – and it’s a whopper.

Each Ironwood chip delivers 4,614 TFLOPS of FP8 compute, and eight stacks of HBM3e provide 192GB of memory capacity per chip at 7.3TB/s of bandwidth. With 1.2TBps of I/O bandwidth, the system scales to 9,216 chips per pod without glue logic, reaching a combined 42.5 exaflops of FP8 performance – a massive generational leap over Google’s previous TPUs.
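The pod-level figure follows directly from the per-chip specs. A quick sanity check (the constant names are mine, the numbers are from the article):

```python
# Sanity-check Ironwood's pod-level numbers from the per-chip specs.
CHIPS_PER_POD = 9_216
FP8_TFLOPS_PER_CHIP = 4_614
HBM_GB_PER_CHIP = 192

pod_exaflops = CHIPS_PER_POD * FP8_TFLOPS_PER_CHIP / 1e6  # TFLOPS -> EFLOPS
pod_hbm_pb = CHIPS_PER_POD * HBM_GB_PER_CHIP / 1e6        # GB -> PB (decimal)

print(f"{pod_exaflops:.1f} EFLOPS of FP8")   # ≈ 42.5 EFLOPS, matching the article
print(f"{pod_hbm_pb:.2f} PB of HBM3e")       # ≈ 1.77 PB of pooled HBM per pod
```

So 9,216 × 4,614 TFLOPS does indeed land on the quoted 42.5 exaflops, and a full pod carries nearly 1.8 petabytes of HBM3e.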

Deployment is already underway at hyperscale in Google Cloud data centers, although the TPU remains an internal platform not available directly to customers.
Links: