GPU Cloud provides dedicated compute infrastructure for machine learning workloads. Use GPU clusters to train models, run inference, and process large-scale AI tasks.
What is a GPU cluster?
A GPU cluster is a group of interconnected servers, each equipped with multiple high-performance GPUs. Clusters are designed for workloads that require massive parallel processing power, such as training large language models (LLMs), fine-tuning foundation models, running inference at scale, and high-performance computing (HPC) tasks.
All nodes in a cluster share the same configuration: operating system image, network settings, and storage mounts. This ensures consistent behavior across the cluster.
Cluster types
Gcore offers three types of GPU clusters:
| Type | Description | Best for |
|---|---|---|
| Bare Metal GPU | Dedicated physical servers with guaranteed resources. No virtualization overhead | Production workloads, long-running training jobs, and latency-sensitive inference |
| Spot Bare Metal GPU | Same hardware as Bare Metal, but at a reduced price (up to 50% discount). Instances can be preempted with a 24-hour notice when capacity is needed | Fault-tolerant training with checkpointing, batch processing, development, and testing |
| Virtual GPU | Virtualized GPU instances with flexible resource management. Supports flavor changes and cost optimization through shelving (powering off releases resources and stops billing) | Development environments, variable workloads, cost-sensitive projects |
Clusters can scale to hundreds of nodes. Production deployments with 250+ nodes in a single cluster are supported, limited only by regional stock availability.
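Because Spot clusters can be preempted with 24 hours' notice, training jobs running on them should checkpoint regularly and resume from the latest checkpoint after restart. The following is a minimal sketch of that pattern, assuming PyTorch and a persistent file share mounted at /mnt/share (a hypothetical path):

```python
# Minimal checkpoint/resume loop for fault-tolerant training on Spot clusters.
# Assumes PyTorch and a persistent file share mounted at /mnt/share (hypothetical path).
from pathlib import Path
import torch
import torch.nn as nn

CKPT = Path("/mnt/share/checkpoints/model.pt")   # persistent storage, survives preemption
CKPT.parent.mkdir(parents=True, exist_ok=True)

model = nn.Linear(1024, 10)                      # placeholder model
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
start_epoch = 0

# Resume if a checkpoint exists (e.g. after the node was preempted and recreated).
if CKPT.exists():
    state = torch.load(CKPT, map_location="cpu")
    model.load_state_dict(state["model"])
    optim.load_state_dict(state["optim"])
    start_epoch = state["epoch"] + 1

for epoch in range(start_epoch, 100):
    ...  # one epoch of training

    # Write to a temporary file first, then rename, so a preemption mid-write
    # does not corrupt the previous checkpoint.
    tmp = CKPT.with_suffix(".tmp")
    torch.save({"model": model.state_dict(),
                "optim": optim.state_dict(),
                "epoch": epoch}, tmp)
    tmp.replace(CKPT)
```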
Available configurations
Select a configuration based on your workload requirements:
| Configuration | GPUs | Interconnect | RAM | Storage | Use case |
|---|---|---|---|---|---|
| H100 with InfiniBand | 8x NVIDIA H100 80GB | 3.2 Tbit/s InfiniBand | 2TB | 8x 3.84TB NVMe | Distributed LLM training requiring high-speed inter-node communication |
| H100 (bm3-ai-ndp) | 8x NVIDIA H100 80GB | 3.2 Tbit/s InfiniBand | 2TB | 6x 3.84TB NVMe | Distributed training and latency-sensitive inference at scale |
| A100 with InfiniBand | 8x NVIDIA A100 80GB | 800 Gbit/s InfiniBand | 2TB | 8x 3.84TB NVMe | Multi-node ML training and HPC workloads |
| A100 without InfiniBand | 8x NVIDIA A100 80GB | 2x 100 Gbit/s Ethernet | 2TB | 8x 3.84TB NVMe | Single-node training, inference for large models requiring more than 48GB VRAM |
| L40S | 8x NVIDIA L40S | 2x 25 Gbit/s Ethernet | 2TB | 4x 7.68TB NVMe | Inference, fine-tuning small to medium models requiring less than 48GB VRAM |
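As a rough rule of thumb for matching models to the VRAM figures above: fp16/bf16 weights take about 2 bytes per parameter, and real workloads need additional headroom for activations, optimizer state, or the KV cache. A small sketch of that estimate (the 20% overhead factor is an illustrative assumption):

```python
def min_vram_gb(params_billion: float, bytes_per_param: int = 2, overhead: float = 1.2) -> float:
    """Rough lower bound on VRAM (GB) needed to hold model weights, with an assumed 20% overhead."""
    return params_billion * bytes_per_param * overhead

print(min_vram_gb(7))    # ~16.8 GB: a 7B fp16 model fits on a single 48GB L40S
print(min_vram_gb(70))   # ~168 GB: a 70B fp16 model needs 80GB-class GPUs, usually several
```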
Outbound data transfer (egress) from GPU clusters is free. Other costs are covered in GPU Cloud billing.
InfiniBand networking
InfiniBand is a high-bandwidth, low-latency interconnect for communication between cluster nodes. It is essential for distributed training and multi-node inference where frequent data synchronization is required. InfiniBand is available for both Bare Metal and Virtual GPU clusters and is configured automatically when you select a flavor with InfiniBand support.
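For multi-node training, frameworks such as PyTorch typically communicate through the NCCL backend, which uses InfiniBand automatically when the fabric is present (set NCCL_DEBUG=INFO to verify which transport was selected). Below is a minimal DistributedDataParallel sketch, assuming PyTorch is installed on every node and the script is launched with torchrun; the model and training loop are placeholders:

```python
# train_ddp.py: minimal multi-node DistributedDataParallel sketch (placeholder model and loop).
# NCCL picks up InfiniBand (IB verbs) automatically when the fabric is available.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")             # rank and world size come from torchrun
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda(local_rank)   # placeholder model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(10):                                 # placeholder training loop
        x = torch.randn(32, 4096, device=f"cuda:{local_rank}")
        loss = model(x).sum()
        optimizer.zero_grad()
        loss.backward()                                 # gradients are all-reduced over NCCL
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

On each node you would launch it with something like `torchrun --nnodes=<N> --nproc_per_node=8 --rdzv_backend=c10d --rdzv_endpoint=<head-node-ip>:29500 train_ddp.py`, where the node count and rendezvous endpoint are placeholders for your cluster.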
Storage options
GPU clusters support two storage types:
| Storage type | Persistence | Performance | Use case |
|---|---|---|---|
| Local NVMe | Temporary (deleted with cluster) | Highest IOPS, lowest latency | Training data cache, checkpoints during training |
| File shares | Persistent (independent of cluster) | Network-attached, lower latency than object storage | Datasets, model weights, shared checkpoints |
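A common pattern is to stage datasets from a file share onto local NVMe for fast reads during training, and to copy anything that must outlive the cluster (checkpoints, final weights) back to the share. A minimal sketch; the mount points and dataset name are hypothetical:

```python
# Stage data onto local NVMe for throughput; persist results to a file share.
# The mount points and dataset name below are hypothetical; adjust to your cluster.
import shutil
from pathlib import Path

SHARE = Path("/mnt/share")        # persistent file share (survives cluster deletion)
NVME = Path("/mnt/nvme0/cache")   # local NVMe scratch (erased with the cluster)

def stage_dataset(name: str) -> Path:
    """Copy a dataset from the file share to local NVMe, unless it is already cached."""
    src, dst = SHARE / "datasets" / name, NVME / name
    if not dst.exists():
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copytree(src, dst)
    return dst

def persist_checkpoint(local_ckpt: Path) -> None:
    """Copy a checkpoint written on NVMe back to persistent storage."""
    dst = SHARE / "checkpoints" / local_ckpt.name
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(local_ckpt, dst)

# Typical use during a training job:
#   data_dir = stage_dataset("my-dataset")
#   ...train against data_dir, writing checkpoints under NVME...
#   persist_checkpoint(NVME / "ckpt_epoch_10.pt")
```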
Cluster lifecycle
Create --> Configure --> Run workloads --> Resize (optional) --> Delete
- Create: Select the region, GPU type, number of nodes, image, and network settings. See creating a Bare Metal GPU cluster or creating a Virtual GPU cluster.
- Configure: Connect to each node via SSH, install the required dependencies, and mount file shares to prepare the environment for workloads (a scripted example follows this list).
- Run workloads: Execute training jobs, run inference services, and process data.
- Resize: Add or remove nodes based on demand. New nodes inherit the cluster configuration. See managing a Bare Metal GPU cluster for details.
- Delete: Remove the cluster when it is no longer needed. Local storage is erased; file shares and network disks can be preserved.
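The Configure step can be scripted once the node addresses are known. Below is a minimal sketch using paramiko as one possible SSH client to install dependencies and mount an NFS file share on every node; the addresses, username, key path, export, and mount point are all placeholders:

```python
# Bootstrap every cluster node over SSH: install dependencies and mount a file share.
# paramiko is one possible SSH client; node IPs, username, key path, NFS export,
# and mount point are all placeholders.
import os
import paramiko

NODES = ["10.0.0.11", "10.0.0.12"]                      # placeholder node addresses
NFS_EXPORT = "fileshare.internal:/exports/ml-data"      # placeholder file share export
MOUNT_POINT = "/mnt/share"

COMMANDS = [
    "sudo apt-get update -y && sudo apt-get install -y nfs-common python3-pip",
    "pip3 install --upgrade torch",                     # or whatever your workload needs
    f"sudo mkdir -p {MOUNT_POINT}",
    f"sudo mount -t nfs {NFS_EXPORT} {MOUNT_POINT}",
]

for host in NODES:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username="ubuntu",
                   key_filename=os.path.expanduser("~/.ssh/id_ed25519"))
    for cmd in COMMANDS:
        _, stdout, stderr = client.exec_command(cmd)
        if stdout.channel.recv_exit_status() != 0:      # block until the command finishes
            raise RuntimeError(f"{host}: '{cmd}' failed: {stderr.read().decode()}")
    client.close()
```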
GPU clusters may take 15–40 minutes to provision, and their configuration (image, network, and storage) is fixed at creation. Local NVMe storage is temporary, so critical data should be saved to persistent file shares. Spot clusters can be interrupted with a 24-hour notice, and cluster size is limited by available regional stock.
Hardware firewall support is available on servers equipped with BlueField network cards, enhancing network security for GPU clusters.