7 Key Updates from the NVIDIA-Google Cloud Partnership for Next-Gen AI Infrastructure
<p>For over a decade, NVIDIA and Google Cloud have been co-engineering a full-stack AI platform that spans from optimized libraries to enterprise cloud services. Their latest collaboration, unveiled at Google Cloud Next in Las Vegas, marks a significant leap forward in bringing agentic and physical AI from the lab to production. From next-generation GPU instances to enhanced AI services, these innovations are designed to power everything from complex autonomous agents to factory-floor digital twins. Below are seven key announcements that highlight how this partnership is shaping the future of AI infrastructure.</p>
<h2 id="item1">1. Next-Gen Infrastructure with NVIDIA Vera Rubin A5X Instances</h2>
<p>Google Cloud introduced A5X bare-metal instances powered by <strong>NVIDIA Vera Rubin NVL72</strong> rack-scale systems. Through extreme co-design across chips, systems, and software, these instances deliver up to <strong>10x lower inference cost per token</strong> and <strong>10x higher token throughput per megawatt</strong> compared with the previous generation. The A5X pairs <em>NVIDIA ConnectX-9 SuperNICs</em> with next-generation <em>Google Virgo networking</em>. This setup can scale to 80,000 Rubin GPUs in a single-site cluster and up to 960,000 GPUs across a multi-site cluster, enabling enterprises to run massive AI workloads on NVIDIA-optimized infrastructure.</p><figure style="margin:20px 0"><img src="https://blogs.nvidia.com/wp-content/uploads/2026/04/google-cloud-nvidia.jpg" alt="7 Key Updates from the NVIDIA-Google Cloud Partnership for Next-Gen AI Infrastructure" style="width:100%;height:auto;border-radius:8px" loading="lazy"><figcaption style="font-size:12px;color:#666;margin-top:5px">Source: blogs.nvidia.com</figcaption></figure>
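<p>The two 10x figures are easiest to read as back-of-envelope arithmetic. A minimal sketch, where the baseline cost and throughput numbers are illustrative assumptions (not published benchmarks) and only the 10x ratios come from the announcement:</p>

```python
# Back-of-envelope view of the A5X claims.
# Baseline figures below are illustrative assumptions, not published benchmarks.
baseline_cost_per_million_tokens = 1.00  # USD, assumed previous-gen figure
baseline_tokens_per_sec_per_mw = 2.0e6   # assumed previous-gen figure

# "Up to 10x lower inference cost per token" and
# "10x higher token throughput per megawatt" (per the announcement):
a5x_cost_per_million_tokens = baseline_cost_per_million_tokens / 10
a5x_tokens_per_sec_per_mw = baseline_tokens_per_sec_per_mw * 10

print(f"A5X cost per 1M tokens: ${a5x_cost_per_million_tokens:.2f}")
print(f"A5X tokens/sec per MW:  {a5x_tokens_per_sec_per_mw:.1e}")
```

<p>Whatever the absolute baseline, the two ratios compound: a fixed inference budget buys roughly ten times the tokens, and a fixed power envelope serves roughly ten times the traffic.</p>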
<h2 id="item2">2. Google Gemini on Distributed Cloud with Blackwell and Blackwell Ultra GPUs</h2>
<p>Now in preview, <strong>Google Gemini</strong> models run on Google Distributed Cloud powered by <strong>NVIDIA Blackwell</strong> and <strong>Blackwell Ultra GPUs</strong>. This brings advanced AI capabilities closer to data sources, reducing latency for real-time applications. The integration taps the full performance of the Blackwell architecture, enabling faster training and inference for frontier models. Enterprises can deploy Gemini models on premises or at the edge while leveraging the security and scalability of Google Distributed Cloud.</p>
<h2 id="item3">3. Confidential VMs with NVIDIA Blackwell GPUs</h2>
<p>Google Cloud announced confidential virtual machines equipped with <strong>NVIDIA Blackwell GPUs</strong>, providing hardware-based encryption for sensitive AI workloads. These VMs ensure that data remains protected throughout training and inference, even against cloud provider access. This is critical for industries like healthcare, finance, and government where data privacy and compliance are paramount. By combining NVIDIA's GPU memory encryption with Google's confidential computing environment, customers can run AI models on sensitive data without exposing it.</p>
<h2 id="item4">4. Agentic AI on Gemini Enterprise with Nemotron and NeMo</h2>
<p>The partnership brings <strong>agentic AI</strong> to the Google Gemini Enterprise Agent Platform using <strong>NVIDIA Nemotron</strong> open models and the <strong>NVIDIA NeMo</strong> framework. This enables developers to build sophisticated agents that manage complex workflows, from customer service to supply chain optimization. The integrated stack simplifies orchestration, fine-tuning, and deployment of AI agents, allowing them to interact with enterprise data and systems in real time. This combination is designed to take agentic AI out of research and into production environments.</p>
<h2 id="item5">5. Broad NVIDIA Blackwell Portfolio: From A4 to Fractional G4 VMs</h2>
<p>Google Cloud's NVIDIA Blackwell portfolio now spans a wide range of instances to match diverse workload needs. It includes <strong>A4 VMs</strong> with <strong>HGX B200</strong> systems, <strong>A4X VMs</strong> with <strong>GB200 NVL72</strong> and <strong>GB300 NVL72</strong> rack-scale systems, and <strong>fractional G4 VMs</strong> with <strong>NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs</strong>. This flexibility lets customers right-size their acceleration: from one-eighth of a GPU for smaller workloads, to a single rack of 72 GPUs connected by fifth-generation NVLink and NVLink 5 Switch, up to tens of thousands of GPUs across interconnected NVL72 racks.</p><figure style="margin:20px 0"><img src="https://blogs.nvidia.com/wp-content/uploads/2026/04/google-cloud-nvidia-1280x720.jpg" alt="7 Key Updates from the NVIDIA-Google Cloud Partnership for Next-Gen AI Infrastructure" style="width:100%;height:auto;border-radius:8px" loading="lazy"><figcaption style="font-size:12px;color:#666;margin-top:5px">Source: blogs.nvidia.com</figcaption></figure>
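<p>Right-sizing can be pictured as a lookup from GPU demand to instance tier. The tier names below mirror the portfolio, but the selection thresholds and the helper itself are purely illustrative, not a Google Cloud API:</p>

```python
def pick_tier(gpus_needed: float) -> str:
    """Map a GPU requirement to a Blackwell instance tier.

    Illustrative logic only: tier names follow the announced portfolio,
    but the thresholds are assumptions for the sketch.
    """
    if gpus_needed <= 0.125:  # fractional G4 VMs go down to 1/8 of a GPU
        return "fractional G4 (RTX PRO 6000 Blackwell Server Edition)"
    if gpus_needed <= 8:      # assumed per-VM GPU count for an HGX B200 system
        return "A4 (HGX B200)"
    if gpus_needed <= 72:     # one NVL72 rack holds 72 GPUs
        return "A4X (GB200/GB300 NVL72, single rack)"
    return "A4X (multiple interconnected NVL72 racks)"

print(pick_tier(0.125))
print(pick_tier(64))
print(pick_tier(10_000))
```

<p>The point of the sketch is the shape of the decision, not the exact cutoffs: the same Blackwell generation covers everything from a slice of one GPU to rack-scale and multi-rack deployments.</p>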
<h2 id="item6">6. Full-Stack AI Platform Optimization</h2>
<p>The collaboration goes beyond hardware. <em>NVIDIA NeMo</em> and other optimized libraries are deeply integrated into Google Cloud's AI services, providing developers with pre-built tools for model training, tuning, and deployment. The full-stack platform spans every technology layer—from performance-optimized libraries to enterprise-grade cloud services. This integration ensures that AI workloads run efficiently on the underlying NVIDIA hardware, reducing development time and improving performance. The partnership supports both open-source and proprietary models, giving customers flexibility.</p>
<h2 id="item7">7. Sustainability and Performance at Scale</h2>
<p>With the A5X instances delivering up to <strong>10x higher token throughput per megawatt</strong>, this infrastructure is also designed with sustainability in mind. <em>Mark Lohmeyer</em>, VP of AI and Computing Infrastructure at Google Cloud, emphasized that customers can optimize for performance, cost, and sustainability simultaneously. The extreme co-design between NVIDIA's chips and Google's data center technologies minimizes energy consumption while maximizing throughput. This makes it feasible to run large-scale agentic and physical AI workloads—like factory-floor robots and digital twins—in an environmentally responsible manner.</p>
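<p>"Tokens per megawatt" translates directly into energy per token. A short sketch, using an assumed throughput figure (the absolute number is illustrative; only the 10x relationship is from the announcement):</p>

```python
# Energy per token from throughput per megawatt.
power_watts = 1.0e6            # one megawatt of data center power
tokens_per_sec_per_mw = 2.0e7  # assumed A5X-class figure, not a benchmark

# energy per token (joules) = power (watts) / throughput (tokens per second)
joules_per_token = power_watts / tokens_per_sec_per_mw
print(f"{joules_per_token} J per token")

# A previous generation at 10x fewer tokens per megawatt would spend
# 10x more energy on every token served:
baseline_joules_per_token = power_watts / (tokens_per_sec_per_mw / 10)
print(f"{baseline_joules_per_token} J per token (assumed baseline)")
```

<p>Holding traffic constant, a 10x gain in tokens per megawatt is a 10x cut in joules per token, which is why the same co-design claim shows up in both the cost and the sustainability framing.</p>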
<p>The NVIDIA and Google Cloud partnership continues to push boundaries by delivering cutting-edge infrastructure and AI services. From the groundbreaking A5X instances to confidential computing and agentic AI platforms, these innovations empower developers and enterprises to build the next generation of intelligent systems. Whether you're training frontier models or deploying autonomous agents, this integrated ecosystem provides the performance, flexibility, and scale needed to succeed in the era of agentic and physical AI.</p>