When Clarios, the world's largest automotive battery manufacturer, faced growing computational demands for battery simulation and engineering workloads, they turned to a groundbreaking cloud HPC solution combining Microsoft Azure's HB-series virtual machines and AMD EPYC processors. This strategic shift not only accelerated their R&D cycles but also demonstrated the transformative potential of cloud-based high-performance computing for industrial applications.
The Computational Challenge in Energy Storage Innovation
Modern battery design relies heavily on two computationally intensive processes: Finite Element Analysis (FEA) for structural integrity and Computational Fluid Dynamics (CFD) for thermal management. Clarios' engineers were running:
- 50+ concurrent simulation jobs daily
- Multi-physics models with 10+ million mesh elements
- 72-hour average job completion times on-premises
"Our legacy HPC cluster was struggling with queue times exceeding 48 hours during peak periods," noted Dr. Elena Rodriguez, Clarios' Director of Computational Engineering. "This bottleneck directly impacted our ability to innovate in competitive timeframes."
Why Azure HB-Series with AMD EPYC?
After benchmarking multiple cloud options, Clarios selected Azure's HBv3-series VMs featuring:
- AMD EPYC 7V73X processors (Milan-X, up to 120 cores per VM)
- 448 GB DDR4 memory per node
- 350 GB/s of memory bandwidth
- 200 Gb/s HDR InfiniBand networking
- Direct access to Azure CycleCloud for workload management
Key advantages over their on-premises solution included:
- 4.1x faster average job completion times
- 83% reduction in queue times
- Ability to scale to 10,000+ cores during critical projects
- Pay-per-use model eliminating idle resource costs
Implementation: A Hybrid Cloud Approach
Clarios adopted a phased migration strategy:
- Benchmarking Phase (8 weeks)
  - Compared 12 VM configurations across 3 cloud providers
  - Validated results against physical test data
  - Selected optimal core-count per job (32-64 cores)
- Pilot Program (6 months)
  - Migrated 20% of non-critical workloads
  - Implemented Azure CycleCloud for auto-scaling
  - Trained 45 engineers on cloud-native workflows
- Full Production (Current)
  - 80% of HPC workloads now cloud-based
  - On-prem cluster retained for sensitive IP work
  - $2.3M annual savings in hardware refresh costs
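
The core-count selection made during the benchmarking phase can be illustrated with a small scaling study. This is a minimal sketch, not Clarios' actual method: the runtimes are made-up numbers, and the function names and the 70% efficiency threshold are assumptions. It computes parallel efficiency relative to the smallest run, then picks the largest core count that stays at or above the threshold.

```python
from typing import Dict


def parallel_efficiency(runtimes_h: Dict[int, float]) -> Dict[int, float]:
    """Efficiency of each core count relative to the smallest one measured."""
    base_cores = min(runtimes_h)
    base_time = runtimes_h[base_cores]
    return {
        cores: (base_time / hours) / (cores / base_cores)
        for cores, hours in runtimes_h.items()
    }


def pick_core_count(runtimes_h: Dict[int, float], min_efficiency: float = 0.7) -> int:
    """Largest core count whose efficiency is at or above the threshold."""
    efficiency = parallel_efficiency(runtimes_h)
    # The baseline run always has efficiency 1.0, so this is never empty
    # for thresholds up to 1.0.
    return max(c for c, e in efficiency.items() if e >= min_efficiency)


# Hypothetical benchmark results (cores -> wall-clock hours), illustration only.
sample = {16: 40.0, 32: 22.0, 64: 13.0, 120: 10.0}
best = pick_core_count(sample)
print(best)  # 64 for this sample data
```

With these sample numbers the 120-core run is the fastest but only about 53% efficient, so a cost-aware policy would stop at 64 cores.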
Technical Breakthroughs Enabled
The cloud HPC solution unlocked new capabilities:
- Multi-Physics Optimization
  - Coupled electrochemical-thermal models reduced from 96 to 22 hours
  - Enabled 5x more design iterations per development cycle
- AI-Enhanced Simulation
  - Machine learning pre-processors cut mesh generation time by 70%
  - Neural networks predict convergence thresholds
- Global Collaboration
  - US/Germany/China teams share identical HPC environments
  - Version-controlled simulation templates
Performance Metrics That Matter
Quantifiable improvements post-migration:
| Metric | On-Premises | Azure HB-Series | Improvement |
|---|---|---|---|
| Avg. Job Time | 72 hours | 17.5 hours | 4.1x faster |
| Max Concurrent Jobs | 32 | 480 | 15x capacity |
| Energy/Teraflop | 42 kWh | 28 kWh | 33% less energy |
| Failed Jobs | 8.2% | 1.1% | 7.5x fewer failures |
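
The Improvement column follows directly from the two measured columns. As a quick sanity check, this sketch recomputes each figure from the table values; the function and variable names are illustrative only.

```python
def ratio(before: float, after: float) -> float:
    """How many times larger `before` is than `after`."""
    return before / after


def percent_reduction(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100


job_speedup = ratio(72, 17.5)              # average job time, hours
capacity_gain = ratio(480, 32)             # max concurrent jobs, cloud vs. on-prem
energy_saving = percent_reduction(42, 28)  # energy per teraflop, kWh
failure_ratio = ratio(8.2, 1.1)            # failed-job rate, percent

print(f"{job_speedup:.1f}x faster")             # 4.1x faster
print(f"{capacity_gain:.0f}x capacity")         # 15x capacity
print(f"{energy_saving:.0f}% less energy")      # 33% less energy
print(f"{failure_ratio:.1f}x fewer failures")   # 7.5x fewer failures
```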
The Future of Cloud HPC in Manufacturing
Clarios' success signals broader industry trends:
- Democratization of Supercomputing
  - Small/medium manufacturers can access world-class HPC
- Sustainability Gains
  - Cloud data centers' PUE <1.2 vs. typical on-prem 1.8
- Next-Gen Workloads
  - Quantum chemistry simulations for solid-state batteries
  - Digital twins of entire production lines
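
The PUE comparison under Sustainability Gains can be made concrete. PUE (power usage effectiveness) is total facility energy divided by IT equipment energy, so for the same IT load the facility energy scales with PUE. The sketch below assumes a hypothetical IT load of 1,000 kWh and treats 1.2 as the cloud value, although the article gives it only as an upper bound.

```python
def facility_energy_kwh(it_energy_kwh: float, pue: float) -> float:
    """Total facility energy implied by a PUE (total energy / IT energy)."""
    return it_energy_kwh * pue


it_load_kwh = 1000.0  # hypothetical IT energy, illustration only

on_prem_kwh = facility_energy_kwh(it_load_kwh, 1.8)
cloud_kwh = facility_energy_kwh(it_load_kwh, 1.2)
saving_pct = (on_prem_kwh - cloud_kwh) / on_prem_kwh * 100

print(f"{saving_pct:.0f}% less facility energy for the same IT load")
```

Because the cloud figure is a ceiling, the saving computed here is a lower bound under these assumptions.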
"This isn't just about faster simulations," emphasizes CTO Mark Davidson. "It's about fundamentally changing how we innovate. With cloud HPC, we've compressed battery development cycles from 18 months to under 9 months while improving quality."
Key Takeaways for Engineering Teams
For organizations considering similar cloud HPC migrations:
- Start with Benchmarking
  - Use real workloads to compare VM types
  - Test network latency for MPI jobs
- Implement Granular Cost Controls
  - Auto-scaling policies based on queue depth
  - Spot instances for non-time-critical jobs
- Retrain Your Team
  - Cloud-native tools require new skills
  - Foster collaboration between IT and engineering
- Plan for Data Gravity
  - 10 TB+ datasets need smart caching strategies
  - Consider Azure HPC Cache or Avere vFXT
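
An auto-scaling policy based on queue depth, as suggested under cost controls, might look like the following. This is a hypothetical sketch of the decision logic only: it does not call the Azure CycleCloud API, and the function name, parameters, and the default cap of 80 nodes are all assumptions chosen for illustration.

```python
def target_node_count(
    queued_jobs: int,
    busy_nodes: int,
    nodes_per_job: int = 1,
    min_nodes: int = 0,
    max_nodes: int = 80,
) -> int:
    """Nodes to keep running: busy nodes plus enough for every queued job,
    clamped to the [min_nodes, max_nodes] budget window."""
    desired = busy_nodes + queued_jobs * nodes_per_job
    return max(min_nodes, min(desired, max_nodes))


# Ten queued single-node jobs while five nodes are busy: grow to 15 nodes.
print(target_node_count(queued_jobs=10, busy_nodes=5))   # 15

# A burst of 200 queued jobs is capped by the budget limit.
print(target_node_count(queued_jobs=200, busy_nodes=5))  # 80

# An empty queue with nothing running scales down to the floor.
print(target_node_count(queued_jobs=0, busy_nodes=0))    # 0
```

The `max_nodes` cap is what turns a scaling rule into a cost control: queue depth drives growth, but spend stays bounded.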
As AMD's EPYC processors continue pushing core counts higher (96-core Genoa now available) and Azure enhances its HPC offerings, the case for cloud-based engineering simulation grows stronger. Clarios' journey shows that even in traditional manufacturing, cloud HPC isn't just viable; it's becoming essential for staying competitive.