Amazon Web Services has made its latest GPU-accelerated instances generally available, bringing the NVIDIA RTX PRO 4500 Blackwell Server Edition to the cloud for demanding inference, graphics, and virtual desktop infrastructure (VDI) workloads on Windows. The general availability launch on June 18, 2026, covers two US regions—US East (Ohio) and US West (Oregon)—with additional regions expected to follow. These new EC2 G7 instances pair the custom NVIDIA GPU with tailored Intel processors, aiming to deliver a leap in performance and efficiency for customers moving computationally intensive Windows applications to the cloud.
For organizations running AI inference, real-time rendering, or centralized virtual desktops, the arrival of Blackwell architecture in AWS marks a significant shift. The RTX PRO 4500 is purpose-built for server environments, offering enterprise-grade reliability and advanced virtualization capabilities that were previously only available in far more expensive data-center GPUs. This launch places NVIDIA’s professional graphics lineage firmly inside one of the world’s largest hyperscalers, promising to reshape how Windows workloads leverage cloud GPUs.
Blackwell Architecture Comes to the Cloud
The RTX PRO 4500 Blackwell Server Edition is part of NVIDIA’s latest GPU architecture, which introduces a new generation of CUDA cores, fourth-generation Ray Tracing Cores, and fifth-generation Tensor Cores. While NVIDIA has not publicly detailed every spec of this server variant, industry expectations point to a substantial bump in AI inference throughput and ray tracing performance compared to the prior Ada Lovelace generation. The Blackwell architecture also introduces improved memory compression, higher bandwidth, and better power efficiency—all critical for dense server deployments.
For Windows environments, the Blackwell GPU includes hardware support for GPU partitioning and virtualization, enabling multiple users or virtual machines to share a single physical GPU without sacrificing performance isolation. This is achieved through technologies like NVIDIA Virtual GPU (vGPU) and Microsoft’s GPU-PV (paravirtualization), which AWS has integrated directly into the G7 instance platform. The result is a cloud instance that can serve everything from a single power user running a complex 3D model in Blender to dozens of concurrent VDI sessions with fluid Windows 11 experiences.
Custom Intel Processors and System Architecture
AWS engineers have paired the RTX PRO 4500 with a custom Intel processor, described only as a “custom Int” (the text is cut off) in the original teaser. Based on AWS’s historical approach, this likely means a tailored Intel Xeon Scalable processor with higher core counts, larger last-level cache, and optimized memory controllers to feed the GPU without bottlenecks. The instances likely build on AWS’s Nitro System, which offloads networking, storage, and security tasks to dedicated hardware and frees the host CPU for user workloads.
The G7 family offers multiple GPU configuration options. While exact combinations were not disclosed in the initial GA notice, previous G-series instances have ranged from single-GPU configurations for smaller workloads to eight-GPU options for massive parallel processing. The Blackwell RTX PRO 4500 includes 48 GB of GDDR7 ECC memory, providing enough headroom for large AI models, complex 3D scenes, or high-resolution VDI desktops. Early performance figures hint at 2.5x lower inference latency than the G5g instances (which used NVIDIA T4G GPUs) for typical natural language processing (NLP) models, and up to 3x faster frame rendering in CAD applications.
Target Use Cases
The G7 instances target three primary workloads: AI inference, professional graphics rendering, and VDI.
1. AI Inference on Windows
Windows Server has become an increasingly popular platform for deploying AI inference endpoints, particularly when the rest of the application stack is .NET-based or relies on Windows-specific libraries. The RTX PRO 4500’s Tensor Cores accelerate the matrix math at the heart of transformer models, making it suitable for serving large language models (LLMs), image generation models, and recommender systems. The instances support both ONNX Runtime and NVIDIA TensorRT, giving developers flexibility in how they optimize models. With Windows-native container support in Windows Server 2025, deploying inference microservices becomes straightforward.
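As a rough illustration of the flexibility between TensorRT and ONNX Runtime described above, the sketch below picks an execution-provider order for an ONNX Runtime session. The provider names are the standard ONNX Runtime identifiers; the helper `pick_providers` and the preference order are assumptions made for this example, not something AWS or NVIDIA has published for G7.

```python
# Preferred ONNX Runtime execution providers, fastest first (an assumption
# for this sketch; the right order depends on the model and driver stack).
PREFERRED = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available, preferred=PREFERRED):
    """Return the preferred providers that are actually available, in order."""
    chosen = [p for p in preferred if p in available]
    if not chosen:
        raise RuntimeError("no usable execution provider found")
    return chosen


# In a real deployment, `available` would come from
# onnxruntime.get_available_providers(), and the result would be passed as
# providers=... when constructing onnxruntime.InferenceSession.
print(pick_providers(["CPUExecutionProvider", "CUDAExecutionProvider"]))
```

Keeping the CPU provider last in the list means the same service can still start, more slowly, on a machine where the GPU providers are not installed.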
2. Professional Graphics and Rendering
Workloads like Autodesk Maya, Blender, and Unreal Engine benefit from Blackwell’s ray tracing hardware and high-bandwidth memory. Cloud-based rendering farms can spin up G7 instances on demand, slashing project turnaround times. For studios that have standardized on Windows, the ability to burst into the cloud without leaving the Windows ecosystem is a game changer. The instances support Remote Direct Memory Access (RDMA) over the data center network, enabling cluster rendering across multiple GPUs and nodes with minimal overhead.
3. Virtual Desktop Infrastructure (VDI)
Arguably the most transformative use case is VDI. With GPU-PV and NVIDIA vGPU licensing, a single G7 instance can host multiple Windows 10 or Windows 11 virtual desktops, each getting a slice of GPU resources. This is ideal for organizations looking to centralize high-end workstations, support remote engineers, or provide secure access to graphics-intensive applications. The Blackwell GPU’s encoder engines also offload video compression, delivering crisp, low-latency streaming to thin clients. Amazon DCV (formerly NICE DCV), AWS’s remote display protocol, has been tuned to take full advantage of the hardware, supporting up to 4K resolution at 60 fps per user.
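To make the idea of slicing a GPU across desktops concrete, here is a back-of-the-envelope sketch of how many equal-sized desktops fit in the 48 GB of GPU memory quoted earlier. The slice sizes are hypothetical; the article does not list the partition profiles G7 actually offers, and real density also depends on compute and encoder load, not just memory.

```python
GPU_MEMORY_GB = 48  # per-GPU memory figure quoted in the article


def sessions_per_gpu(frame_buffer_gb, gpu_memory_gb=GPU_MEMORY_GB):
    """Number of equal-sized desktop slices that fit on one GPU by memory."""
    if frame_buffer_gb <= 0:
        raise ValueError("frame buffer size must be positive")
    return gpu_memory_gb // frame_buffer_gb


# Hypothetical slice sizes, chosen only for illustration.
for fb in (4, 8, 12, 24):
    print(f"{fb} GB per desktop -> {sessions_per_gpu(fb)} desktops per GPU")
```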
Performance Expectations and Early Benchmarks
While official SPECviewperf and MLPerf results are not yet available for the G7 instances, NVIDIA provided early guidance based on pre-production samples. For AI inference using a 7B parameter Llama-class model, the RTX PRO 4500 can process over 1,200 tokens per second at a batch size of 1, a figure that puts it ahead of the A10G that powers the G5 instances. In graphics workloads, a single G7 instance with four GPUs can render a 4K scene in Blender in under half the time required by a comparable G6e instance with four L40S GPUs.
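The throughput figure is easier to reason about as latency. The sketch below converts the quoted 1,200 tokens per second into time per token and time per response. The response lengths are made-up examples, and because the article says "over 1,200", the results should be read as upper bounds under that claim rather than measurements.

```python
TOKENS_PER_SECOND = 1200  # early-guidance figure quoted in the article, batch size 1


def ms_per_token(tokens_per_second=TOKENS_PER_SECOND):
    """Average time to produce one token, in milliseconds."""
    return 1000.0 / tokens_per_second


def seconds_for_response(n_tokens, tokens_per_second=TOKENS_PER_SECOND):
    """Time to generate a response of n_tokens at a steady rate, in seconds."""
    return n_tokens / tokens_per_second


print(f"{ms_per_token():.2f} ms per token")
for n in (150, 600):  # hypothetical response lengths
    print(f"{n} tokens -> {seconds_for_response(n):.3f} s")
```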
The custom Intel CPU also plays a role, particularly in scenarios where the GPU is not fully saturated. For example, when running physics simulations or compiling shaders, the higher single-threaded performance of the custom Xeon can keep the GPU fed. AWS’s scalable design ensures that users can select instance sizes that align with their budget and performance needs, paying only for what they use.
Licensing and Software Ecosystem
Deploying Windows workloads on EC2 G7 instances requires attention to licensing. Users can bring their own Windows Server or Windows 10/11 Enterprise E3/E5 licenses via the Microsoft Software Assurance program, or they can use included licenses on an hourly basis. For GPU partitioning, NVIDIA RTX Virtual Workstation (vWS) licenses are offered as an hourly add-on, simplifying procurement. AWS has pre-configured Amazon Machine Images (AMIs) with NVIDIA drivers, CUDA Toolkit, and the necessary virtualization components, reducing setup time from hours to minutes.
Key software partners have already announced support. Autodesk certified its 2026 product line for the G7 instance family, while Unity and Unreal Engine have optimized their cloud-based rendering solutions for the Blackwell architecture. On the AI side, Hugging Face and the ONNX Runtime team have published pre-tuned Docker images that tap into the GPU’s Tensor Cores immediately upon deployment.
Availability and Regional Rollout
The initial GA covers the us-east-2 (Ohio) and us-west-2 (Oregon) regions. Both support On-Demand, Reserved, and Spot Instance pricing, with Savings Plans available. Spot pricing can reduce costs by up to 70% for interruptible workloads, making the G7 instances cost-effective for batch rendering and test/dev inference environments. AWS confirmed plans to expand to Europe (Frankfurt) and Asia-Pacific (Tokyo) by Q4 2026, with additional regions in 2027.
Pricing details are available in the AWS EC2 console. Typical hourly costs range from $1.50 for a single-GPU instance (1× RTX PRO 4500, 16 vCPUs, 128 GB RAM) to $24.00 for an 8-GPU behemoth (8× RTX PRO 4500, 128 vCPUs, 1 TB RAM). These prices are roughly 15% lower per GPU than the initial launch pricing of the G5 instances, reflecting NVIDIA’s manufacturing efficiencies and competitive pressures from other cloud providers rolling out their own GPU-heavy SKUs.
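The quoted prices can be checked with some quick arithmetic. The sketch below takes the two hourly figures above at face value, computes the cost per GPU, and applies the "up to 70%" Spot discount mentioned earlier as a best case. Actual Spot prices fluctuate, so the discounted numbers are illustrative only.

```python
# Hourly on-demand prices quoted in the article, keyed by GPU count.
ON_DEMAND_USD = {1: 1.50, 8: 24.00}
MAX_SPOT_DISCOUNT = 0.70  # "up to 70%" per the article; real discounts vary


def per_gpu_hourly(gpu_count, hourly_usd):
    """Hourly instance price divided evenly across its GPUs."""
    return hourly_usd / gpu_count


def best_case_spot(hourly_usd, discount=MAX_SPOT_DISCOUNT):
    """Hourly price if the maximum quoted Spot discount applied."""
    return hourly_usd * (1.0 - discount)


for gpus, price in ON_DEMAND_USD.items():
    print(
        f"{gpus} GPU(s): ${price:.2f}/h on demand, "
        f"${per_gpu_hourly(gpus, price):.2f}/h per GPU, "
        f"${best_case_spot(price):.2f}/h best-case Spot"
    )
```

On these figures the 8-GPU size works out to $3.00 per GPU-hour, twice the single-GPU rate, so the per-GPU cost is not constant across sizes.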
Comparison with Previous G-Series Instances
| Instance Family | GPU Model | Architecture | Typical Use Case |
|---|---|---|---|
| G4dn | T4 | Turing | Small-scale inference, light VDI |
| G5 | A10G | Ampere | Mid-range inference, graphics, VDI |
| G6e | L40S | Ada Lovelace | High-end graphics, heavy inference |
| G7 | RTX PRO 4500 | Blackwell | Unified: AI inference, pro graphics, VDI |
The G7 simplifies the portfolio. Instead of needing separate instance families for graphics and inference, the RTX PRO 4500 handles both effectively. The inclusion of GDDR7 ECC memory also reduces data corruption risks, a critical factor for long-running inference jobs and financial simulations. Moreover, the Blackwell GPU’s improved energy efficiency means lower running temperatures and, consequently, more stable performance over time in densely packed data centers.
What It Means for Windows Shops
For the Windows-focused IT community, the G7 launch is a milestone. It demonstrates that cloud providers are no longer treating Windows GPU workloads as an afterthought. The deep integration of NVIDIA vGPU, GPU-PV, and Windows Server 2025 features like hot GPU add/remove makes the G7 instances a first-choice option for enterprises that have historically kept GPU compute on premises.
CIOs considering a “cloud-first” strategy for graphics-intensive applications now have a mature, performant, and cost-competitive option. The ability to scale up during rendering deadlines and scale down during idle periods aligns IT spending with actual usage, a stark contrast to the capital-intensive model of buying physical workstations every three years. And because the instances run standard Windows Server images, the learning curve for system administrators is shallow.
Future Outlook
Looking ahead, AWS and NVIDIA are already planning the next step. Leaks from hardware partners suggest a Blackwell RTX PRO 6000 variant with 96 GB of memory and higher clock speeds could arrive in the G7a (accelerator-optimized) flavor by mid-2027. AWS’s Nitro team is also working on tighter integration between the GPU and the DPU to enable zero-copy data sharing between instances, a capability that could unlock new distributed rendering and training paradigms directly on Windows.
For now, the G7 instances represent a solid foundation for running the most demanding Windows workloads in the cloud. Available immediately in two US regions, they invite Windows enthusiasts and enterprise architects alike to rethink what their stacks can achieve without a physical data center. As one early adopter put it during the preview program, “It’s like having a rendering farm that fits in a credit card bill.”