H4D VMs, now GA, deliver exceptional performance and scaling for HPC workloads

Today, we’re announcing the general availability of H4D VMs, our latest high performance computing (HPC)-optimized VM, powered by the 5th Generation AMD EPYC™ processors. H4D VMs deliver exceptional performance, scalability, and value for industries like manufacturing, health care and life sciences, weather forecasting, and electronic design automation (EDA). H4D supports orchestration via Cluster Toolkit with Slurm and via Google Kubernetes Engine (GKE). Each approach allows for near-instant deployment and scaling of demanding workloads.

For the first time, the Google Cloud CPU portfolio features a VM family with Cloud Remote Direct Memory Access (RDMA). H4D’s RDMA is on the Titanium network adapter and lets you scale single-node H4D performance to multiple nodes, accelerating large production workloads.

Faster time to solution across domains and scales

Powered by the high core density of the 5th Gen AMD EPYC CPU and Google’s innovative, low-latency Falcon hardware transport, H4D VMs enable you to iterate and discover faster than ever before.

We demonstrated H4D performance through a series of industry-standard benchmarks, showing its capabilities across diverse domains and problem sizes.

Healthcare and life sciences
For researchers in healthcare and life sciences (HCLS), H4D VMs accelerate complex molecular simulations critical to scientific discovery. Compared to our previous C2D VMs, H4D VMs deliver up to a 4.3X speedup running LAMMPS (LJ benchmark) at 96 VMs, with 95% parallel efficiency on 18k cores. For drug discovery, we demonstrated a 5.8X speedup using GROMACS (water_33m) at 32 VMs, with 72% parallel efficiency on 6k cores. H4D also scales further: running the LAMMPS LJ benchmark on 192 VMs (~37k cores) maintained 92% parallel efficiency (see Figure 3).

Manufacturing
For manufacturing, H4D VMs help engineers shorten design cycles, run larger simulations, and iterate faster by delivering a strong performance boost for mission-critical Computer-Aided Engineering (CAE) workflows. Compared to our previous C2D VMs when running complex Computational Fluid Dynamics (CFD) simulations, H4D VMs deliver a 4.1X speedup running Ansys Fluent (F1_RaceCar_140m benchmark) on 32 VMs with 85% parallel efficiency. When running open-source OpenFOAM (Motorbike_100m), we demonstrated a 5.2X speedup over C2D using 16 VMs and achieving superlinear parallel efficiency of 122%.
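To make the scaling figures above easier to interpret, here is a minimal Python sketch of how core counts and parallel efficiency relate. It assumes parallel efficiency is measured against a single-VM baseline (the post does not state the baseline used) and takes 192 cores per H4D VM from the benchmark configuration notes. The function names are illustrative and not part of any Google Cloud tooling.

```python
# Cores per H4D VM, taken from the benchmark notes (192 processes per node, SMT off).
CORES_PER_H4D_VM = 192


def total_cores(num_vms, cores_per_vm=CORES_PER_H4D_VM):
    """Total cores used by a run that fills every VM."""
    return num_vms * cores_per_vm


def parallel_efficiency(baseline_time, scaled_time, num_vms, baseline_vms=1):
    """Achieved speedup divided by ideal (linear) speedup.

    Assumption: the baseline is a run on `baseline_vms` VMs (default 1).
    Values above 1.0 indicate superlinear scaling.
    """
    achieved_speedup = baseline_time / scaled_time
    ideal_speedup = num_vms / baseline_vms
    return achieved_speedup / ideal_speedup


def implied_scaling_speedup(efficiency, num_vms, baseline_vms=1):
    """Speedup over the baseline implied by a reported efficiency."""
    return efficiency * num_vms / baseline_vms


if __name__ == "__main__":
    # (benchmark, VM count, reported parallel efficiency) from the text above.
    # Ansys Fluent is omitted from the core-count column because it ran at
    # 168 processes per node rather than 192.
    runs = [
        ("LAMMPS LJ", 96, 0.95),
        ("GROMACS water_33m", 32, 0.72),
        ("LAMMPS LJ", 192, 0.92),
        ("OpenFOAM Motorbike_100m", 16, 1.22),
    ]
    for name, vms, eff in runs:
        speedup = implied_scaling_speedup(eff, vms)
        print(f"{name}: {vms} VMs, {total_cores(vms)} cores, "
              f"~{speedup:.1f}x over one VM (assumed baseline)")
```

Under that assumption, an efficiency above 100%, as in the OpenFOAM result, means the multi-VM run was faster than linear scaling from the baseline would predict. Note that these scaling speedups are distinct from the generation-over-generation speedups quoted against C2D.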

A new standard for HPC price/performance

H4D VMs are designed to deliver the best price-performance for HPC workloads on Google Cloud by pairing superior performance with flexible consumption models. H4D supports Dynamic Workload Scheduler (DWS), which adapts to your workflow with Flex Start mode for just-in-time capacity and Calendar mode for guaranteed reservations. This allows you to access compute for as low as 3 cents per core-hour without long-term commitments. The resulting performance and cost efficiencies over previous generation VMs are detailed in Figures 6 and 7.
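As a rough illustration of what the per-core-hour figure means at the VM and job level, the following Python sketch multiplies it out. It assumes the "as low as" price of 3 cents per core-hour applies uniformly and that each VM has 192 cores (per the benchmark configuration notes). Actual pricing depends on region, machine type, and consumption model, so treat the result as a lower-bound estimate rather than a quote.

```python
# Assumption: the "as low as" figure from the text, in whole cents per core-hour.
PRICE_CENTS_PER_CORE_HOUR = 3
# Assumption: 192 cores per H4D VM, from the benchmark configuration notes.
CORES_PER_VM = 192


def vm_hour_cost_cents(cores_per_vm=CORES_PER_VM,
                       price_cents=PRICE_CENTS_PER_CORE_HOUR):
    """Estimated cost of one VM for one hour, in cents."""
    return cores_per_vm * price_cents


def job_cost_cents(num_vms, hours, cores_per_vm=CORES_PER_VM,
                   price_cents=PRICE_CENTS_PER_CORE_HOUR):
    """Estimated cost of a job occupying `num_vms` VMs for `hours` hours."""
    return num_vms * hours * vm_hour_cost_cents(cores_per_vm, price_cents)


if __name__ == "__main__":
    print(f"One VM-hour: ${vm_hour_cost_cents() / 100:.2f}")
    # Hypothetical job size, chosen only for illustration.
    print(f"16 VMs for 2 hours: ${job_cost_cents(16, 2) / 100:.2f}")
```

Working in integer cents avoids floating-point rounding in the multiplication; the conversion to dollars happens only when printing.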

Comprehensive HPC management

To manage and deploy large, dense clusters of H4D VMs, you can leverage Google Cloud’s Cluster Director, which offers advanced maintenance capabilities (you can sign up for the preview here) alongside the Cluster Toolkit for rapid cluster deployment via turnkey system blueprints. For job and workload management, H4D VMs integrate with Batch, Google Cloud’s fully managed, cloud-native service that handles queuing, scheduling, and resource provisioning. Additionally, there’s support for DWS, which can be used in both Calendar mode for future reservations and Flex Start mode for time-limited, on-demand usage.

What customers and partners are saying

“We were able to test the H4D platform in early access at Jump Trading, and were extremely impressed with the results. The successful testing process demonstrated that H4D offers the performance, stability, and efficiency we require for demanding, high-volume operations. We see up to 50% better price/performance compared to prior generation machines and are now accelerating integration with our critical grid workloads on Google Cloud.” – Alex Davies, Chief Technology Officer & Benjamin Stromski, HPC Linux Engineering, Jump Trading

“There lingers, especially in large-scale and compute-intensive domains, the idea that the fastest systems can only be built on premises and run on bare metal hardware. Terms such as ‘hypervisor tax’ are often thrown around as justification for operating with bare metal. Our testing paints a different picture. The Google H4D VM performs better on our financial risk benchmark than the bare metal top of stack AMD CPU of the same generation.” – Hamza Mian/CEO, HMxLabs

“As a leading provider of managed HPC solutions for the demanding CAE and manufacturing sectors, our evaluation of the H4D platform was focused heavily on its ability to handle our clients’ largest, most tightly-coupled simulation workloads. We are extremely impressed with the results. The testing confirmed that the underlying RDMA fabric exhibits the outstanding low-latency and high-bandwidth performance required for massive parallel processing. This level of interconnect efficiency is non-negotiable for speeding up critical manufacturing simulations like crash testing and CFD. H4D has proven itself to be a true accelerator for high-throughput engineering workloads, and we are excited about its potential to redefine the performance ceiling for HPC in the engineering world.” – Rodney Mach/President, TotalCAE

“The new H4D instances are a significant step forward for our demanding next-generation TPU simulation workloads. We’ve seen a 30% performance improvement across a variety of EDA benchmarks compared to C2D, demonstrating the strong single core performance of H4D. This directly translates to faster development cycles and allows our engineering teams to iterate more quickly.” – Trevor Switkowski, Technical Lead of Chip Design Methodology, Google Cloud

Experience H4D today

H4D is now available in us-central1-a (Iowa), europe-west4-b (Netherlands), and asia-southeast1-a (Singapore), with additional regions coming soon. Check regional availability on our Regions and Zones page and deploy your most demanding HPC workloads by leveraging Cloud RDMA.

_{The following configurations were run for the above benchmarks: LAMMPS version 20250722, GROMACS version 2023.1, OpenFOAM version 2312, Ansys Fluent version 2024R1. All runs used Intel MPI 2021.17.2. C2D/C3D/C4D used TCP; H4D used RDMA with RXM and SAR_LIMIT=2G. All runs used the full ppn (processes per node) available on each platform (56, 180, and 192 for C2D, C3D, and C4D/H4D, respectively). Ansys Fluent runs used 168 ppn on H4D and variable ppn for C4D. SMT was off for all runs. The cost comparison is across single nodes of h4d-highmem-192 at the DWS Flex Start price, and c3d-standard-360 and c2d-standard-112 at the on-demand (OD) price.}

_{Parallel efficiency and optimal node count depend on input size and communication patterns, and therefore vary across workloads.}