AI workloads require frequent random access to shared assets like datasets, model checkpoints, and training artifacts that persist across training runs. Regular object storage is not optimized for these patterns and will generally slow you down.

SkyPilot Volumes provide high-performance persistent storage optimized for AI workloads. Volumes are bound to a specific Kubernetes cluster and can be shared across jobs within the cluster.

Under the hood, SkyPilot Volumes are backed by Kubernetes PersistentVolumes, which are requested through PersistentVolumeClaims (PVCs). You can create new PVCs with SkyPilot Volumes or mount existing ones.

How volumes help your AI workloads

  1. 10-100x faster data access
    Volumes provide file-system level access optimized for the frequent, random reads that AI training needs. This eliminates the API overhead that slows down object storage.

  2. Load artifacts only once
    Your datasets and checkpoints persist across jobs. No more re-downloading gigabytes of data at the start of each training run or when a cluster fails.

  3. Remove Kubernetes complexity
    Get persistent, high-performance storage without writing PVC manifests or understanding storage classes. One SkyPilot command replaces dozens of lines of Kubernetes YAML.
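For a sense of what the third point saves you, here is roughly the kind of manifest you would otherwise write by hand for a single claim. This is a generic Kubernetes PersistentVolumeClaim, not the exact object SkyPilot generates; the name, access mode, and storage class are illustrative assumptions.

```yaml
# pvc.yaml (hand-written equivalent, shown for comparison only)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-volume              # illustrative name
spec:
  accessModes:
    - ReadWriteOnce            # assumption: read-write from a single node
  storageClassName: standard   # assumption: depends on your cluster
  resources:
    requests:
      storage: 10Gi
```

With plain Kubernetes you would also have to reference the claim in every pod spec that uses it (a volumes entry plus a volumeMounts entry), which is where much of the extra YAML comes from.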

Example: Sharing a volume across clusters for code persistence

Here’s how to mount a persistent volume to your existing Kubernetes cluster:

1. Define a volume YAML

Use a YAML file to define your volume:

# volume.yaml
name: my-volume
type: k8s-pvc
infra: kubernetes
size: 10Gi

2. Create your Volume

Use the SkyPilot CLI to create your volume:

sky volumes apply volume.yaml

3. Mount the volume in your task YAML

Add the volume to your task YAML:

# task.yaml
volumes:
  /mnt/data: my-volume

run: |
  echo "Hello, World!" > /mnt/data/hello.txt

4. Launch your cluster

Use the SkyPilot CLI to launch your cluster with the mounted volume attached:

sky launch task.yaml

SkyPilot will now spin up your cluster and execute the run section, which writes hello.txt to your mounted volume. You can then read that file from a separate SkyPilot cluster that mounts the same volume on the same Kubernetes cluster.
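To check the result, a second task can mount the same volume and read the file back. This is a minimal sketch: the file name reader.yaml is hypothetical, while the mount path and volume name match the example above.

```yaml
# reader.yaml (hypothetical second task)
volumes:
  /mnt/data: my-volume   # same volume as in task.yaml

run: |
  # Print the file written by the first cluster.
  cat /mnt/data/hello.txt
```

Launching it the same way as before (sky launch reader.yaml) should print Hello, World! in the job output, provided the first task has finished writing and the volume's access mode allows the new pod to attach it.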

Choose your volume type

SkyPilot lets you choose the volume type that best fits your workload, which helps you balance performance against cost. Besides the persistent volume shown above, you can also use SkyPilot Volumes to mount ephemeral volumes.

Ephemeral volumes are great when you want:

  • Temporary scratch space for intermediate results
  • Automatic lifecycle management
  • Cost optimization for short-lived workloads
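As a rough sketch, an ephemeral volume is declared inline in the task YAML rather than created ahead of time with sky volumes apply. The schema below (a mount path mapped to a size field) is an assumption based on the SkyPilot volume documentation, so verify it against the docs for your version. The mount path and size are placeholders.

```yaml
# task.yaml (sketch; the inline `size` form is an assumption, check the docs)
volumes:
  /mnt/scratch:      # placeholder mount path
    size: 50Gi       # placeholder size; no pre-created volume name is given

run: |
  # Intermediate results written here are not expected to outlive the cluster.
  echo "scratch data" > /mnt/scratch/tmp.txt
```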

Both persistent and ephemeral volumes can be backed by distributed filesystems. Distributed filesystem volumes are great for:

  • Multi-GPU training at scale
  • Concurrent access from multiple pods/nodes
  • High-throughput parallel data loading across your cluster
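On Kubernetes, whether a volume is backed by a distributed filesystem comes down to the storage class and access mode behind the claim. The sketch below extends the earlier volume.yaml with a config section. The config, storage_class_name, and access_mode field names are assumptions drawn from the SkyPilot volume documentation, and the volume name, size, and storage class name are placeholders for whatever your cluster provides.

```yaml
# shared-volume.yaml (sketch; field names under `config` are assumptions)
name: shared-data                      # placeholder volume name
type: k8s-pvc
infra: kubernetes
size: 100Gi                            # placeholder size
config:
  storage_class_name: my-shared-fs-sc  # placeholder: a shared-filesystem class in your cluster
  access_mode: ReadWriteMany           # Kubernetes access mode for read-write mounts from many nodes
```

ReadWriteMany is only honored if the underlying storage class supports it, so check what your cluster's provisioner offers before relying on concurrent writers.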

Read the official documentation to learn about advanced volume configurations.


Get started