Google Cloud, Intel, and Hugging Face published a benchmark that might change how you think about serving large open models. The promise? Better performance and lower cost by running GPT OSS, OpenAI's open-weight Mixture of Experts (MoE) model, on Google Cloud's new C4 instances with Intel Xeon 6 (Granite Rapids) processors. (huggingface.co)
What Intel and Hugging Face published
The article documents controlled tests comparing C4 VMs (Intel Xeon 6, Granite Rapids) against the previous-generation C3 VMs (4th Gen Xeon, Sapphire Rapids), using the unsloth/gpt-oss-120b-BF16 model for text generation in bfloat16 precision. The goal was to measure per-token decoding latency and throughput normalized by vCPU across different batch sizes. (huggingface.co)
Let me make it simple: GPT OSS is a Mixture of Experts (MoE) model that activates only some “experts” per token, which makes it much more CPU-efficient if the framework doesn’t duplicate work. Intel and Hugging Face added optimizations so each expert only processes the tokens assigned to it. (huggingface.co)
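To make the idea concrete, here is a minimal, hypothetical PyTorch sketch of sparse expert routing. It is not the actual GPT OSS or Transformers code, just an illustration of the pattern: each token is routed to its top-k experts, and each expert runs only on the tokens assigned to it, so idle experts cost nothing.

```python
import torch
import torch.nn as nn

class SparseMoELayer(nn.Module):
    """Toy MoE layer: each expert processes only the tokens routed to it."""

    def __init__(self, hidden: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(hidden, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, hidden) -- batch and sequence dims flattened beforehand
        scores = self.router(x).softmax(dim=-1)
        weights, chosen = scores.topk(self.top_k, dim=-1)   # (tokens, top_k)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Which tokens picked expert `e` in any of their top-k slots?
            token_idx, slot = (chosen == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue                                     # unused expert: no wasted compute
            # Run the expert ONLY on its tokens, then scatter results back
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])
        return out

layer = SparseMoELayer(hidden=64)
tokens = torch.randn(10, 64)     # 10 tokens, flattened (batch * sequence)
print(layer(tokens).shape)       # torch.Size([10, 64])
```

The key point is the gather/scatter pattern: a dense implementation would push every token through every expert, which is exactly the redundant work the optimization avoids.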
Key results and numbers
- TCO (Total Cost of Ownership) improvement of up to 1.7x in favor of C4 over C3. (huggingface.co)
- Throughput per vCPU 1.4x to 1.7x higher, depending on batch size. (huggingface.co)
- At batch size 64, C4 delivers 1.7x the throughput per vCPU and, with near-parity in price per vCPU, that translates into a 1.7x TCO advantage (see the quick arithmetic after this list). (huggingface.co)
- Reproducible setup: 1024-token input, 1024-token output, batch sizes from 1 to 64, a static KV cache, and the SDPA attention backend. (huggingface.co)
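To see how the throughput number turns into the TCO claim, here is a back-of-the-envelope calculation in Python. The per-vCPU prices are placeholders chosen purely to illustrate the price-parity assumption, not real Google Cloud list prices.

```python
# Hypothetical hourly prices per vCPU; the blog's TCO claim assumes near parity.
price_per_vcpu_hour = {"c3": 0.05, "c4": 0.05}

# Normalized decode throughput per vCPU (C3 = 1.0 baseline, C4 = 1.7x at batch 64).
tokens_per_sec_per_vcpu = {"c3": 1.0, "c4": 1.7}

def cost_per_million_tokens(instance: str) -> float:
    """Cost to generate 1M tokens on one vCPU, given price and throughput."""
    tokens_per_hour = tokens_per_sec_per_vcpu[instance] * 3600
    return price_per_vcpu_hour[instance] / tokens_per_hour * 1_000_000

advantage = cost_per_million_tokens("c3") / cost_per_million_tokens("c4")
print(f"TCO advantage of C4 over C3: {advantage:.1f}x")   # -> 1.7x
```

If your region or committed-use discounts break the price-parity assumption, rerun the same arithmetic with your own numbers.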
Why should this matter to you now?
Do you have a product serving generation at scale, or are you evaluating infra for open LLMs? This suggests that:
- Modern CPUs can be a viable production option for MoE, especially if the software avoids redundant computation. (huggingface.co)
- For startups and SMBs, relying less exclusively on GPU accelerators can mean cheaper, simpler deployment paths. Can you imagine cutting your monthly bill without rewriting your model? That is exactly the kind of win this hints at.
It's also useful if your team prefers public cloud infra with large, homogeneous instances, since the gains come from moving to a newer VM generation on Google Cloud. (huggingface.co)
How to reproduce the benchmark quickly
If you want to try it yourself, the blog includes clear steps. In short:
- Create a c4-standard-144 or c3-standard-176 VM, depending on which side of the comparison you want to run. (huggingface.co)
- Clone the repo and use the included Docker recipe:
```bash
git clone https://github.com/huggingface/transformers.git
cd transformers/
git checkout 26b65fb5168f324277b85c558ef8209bfceae1fe
cd docker/transformers-intel-cpu/
sudo docker build . -t <your_image>
```
Inside the container they install the specified versions of transformers and torch for CPU and run the published benchmark script. All commands and benchmark steps are documented in the blog. (huggingface.co)
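For orientation, the sketch below shows roughly what such a CPU decoding benchmark looks like with the Transformers generate API, using the settings the blog reports (bf16, SDPA, static KV cache, 1024 tokens in and out). It is not the published script; the prompt and timing loop are my own simplifications, and running it at batch 64 on a 120B-parameter model requires a machine with very large RAM, such as the VMs above.

```python
# Minimal decode-throughput sketch in the spirit of the published benchmark.
# Only the model name and generation settings come from the blog post; the
# prompt, batch handling, and timing are illustrative assumptions.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/gpt-oss-120b-BF16"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,      # bf16 weights, as in the benchmark
    attn_implementation="sdpa",      # SDPA attention backend
)
model.eval()

batch_size = 64
# Identical long filler prompts (so no padding is needed), truncated to 1024 tokens.
prompt = "The quick brown fox jumps over the lazy dog. " * 200
inputs = tokenizer(
    [prompt] * batch_size,
    return_tensors="pt",
    truncation=True,
    max_length=1024,
)

with torch.inference_mode():
    start = time.perf_counter()
    output = model.generate(
        **inputs,
        max_new_tokens=1024,
        min_new_tokens=1024,             # force a full 1024-token decode
        do_sample=False,
        cache_implementation="static",   # static KV cache, as in the blog
    )
    elapsed = time.perf_counter() - start

generated_tokens = output.shape[0] * 1024
print(f"{generated_tokens / elapsed:.1f} generated tokens/s at batch {batch_size}")
```

This gives a rough end-to-end tokens-per-second figure (prefill included); divide by the VM's vCPU count to compare throughput per vCPU across instance types.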
Limitations and open questions
Not everything is plug-and-play. Keep in mind:
- The improvements are shown for one specific case: the GPT OSS MoE model with a fixed attention backend (SDPA) and static KV-cache settings. Other models or configurations may behave differently. (huggingface.co)
- The TCO argument assumes price parity per vCPU between generations. If billing in your account or region differs, actual savings may vary. (huggingface.co)
- The test focuses on steady-state decoding and per-token throughput. Peak latency, cold starts, and mixed workloads may need extra testing. (huggingface.co)
Final thought
This post is a practical reminder: not every AI gain comes from newer models or more GPUs. Sometimes the easiest win is changing instance type, upgrading CPUs, and tuning software to avoid wasted operations. Is it worth trying C4 if you're already on C3? If you serve MoE models at scale, yes, it probably is.
More details and the step-by-step guide are on the Hugging Face blog. (huggingface.co)