Spinning up cloud infrastructure for AI workloads often requires intricate, framework-specific parameter tweaking. That setup work eats into time better spent actually running compute jobs.
SkyPilot now ships predefined YAML templates for launching clusters with popular frameworks and patterns. Templates are automatically available on all new SkyPilot clusters, so with a single command you can launch fully configured environments without writing all the YAML from scratch.
Here’s how to launch a fully configured Ray cluster:

```yaml
run: |
  ~/sky_templates/ray/start_cluster
```

Access the full template in the SkyPilot repo.
🚀 Spin up Ray clusters with one line
You can now launch a Ray cluster by adding a single line to your YAML’s run block, without manually writing Ray boilerplate:

```yaml
run: |
  ~/sky_templates/ray/start_cluster
```
This runs a predefined start_cluster executable that launches a Ray cluster with sensible defaults for compute resources, networking, ports, and more, based on what we’ve seen work best across clouds.
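To make that concrete, here is a minimal sketch of a task YAML that brings up a standalone two-node Ray cluster using the template. The resource and node-count values are illustrative assumptions, not requirements of the template:

```yaml
# Minimal sketch: launch a 2-node Ray cluster with the predefined template.
# The cpus and num_nodes values below are illustrative placeholders.
resources:
  cpus: 4+

num_nodes: 2

run: |
  # Starts Ray across the nodes with the template's defaults.
  ~/sky_templates/ray/start_cluster
```

Launching this with the usual `sky launch` flow gives you a ready-to-use Ray cluster, with no extra Ray flags needed.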
All templates are available under ~/sky_templates on any cluster launched with SkyPilot v0.11.1+. Follow the installation instructions to get started.
🦾 Templates in action
Here’s how to use a Ray cluster to run distributed PyTorch training in three simple steps:
- Download this Python script to a local directory. This defines your PyTorch model training sequence using the Fashion MNIST dataset.
- Copy this ray_train.yaml into the same directory. This defines your SkyPilot configs, including running the start_cluster executable that will spin up your Ray cluster.
- Launch your SkyPilot cluster 🙌 This will spin up a Ray cluster on the requested resources and run the PyTorch training sequence:
```bash
sky launch -c raycluster ray_train.yaml
```
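If you want a feel for the moving parts before downloading ray_train.yaml, its shape is roughly the following. This is a hedged sketch, not the real file: the script name, node count, and package list are placeholders, and the authoritative version lives in the SkyPilot repo:

```yaml
# Rough sketch of a ray_train.yaml-style config; see the repo for the real file.
num_nodes: 2        # placeholder node count

workdir: .          # sync the local directory containing the training script

setup: |
  pip install "ray[train]" torch torchvision

run: |
  # Bring up the Ray cluster on every node via the template.
  ~/sky_templates/ray/start_cluster

  # Submit the training job from the head node (rank 0) only.
  if [ "${SKYPILOT_NODE_RANK}" == "0" ]; then
    python train_fashion_mnist.py  # hypothetical script name
  fi
```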
Here’s the step-by-step guide to run this example for yourself:
- Distributed training with Ray example in the docs.
🔧 Tweaking templates
To tweak Ray settings such as port numbers, the dashboard host, or other optional overrides, you can set environment variables before running start_cluster.
For example:
```yaml
envs:
  RAY_HEAD_PORT: 6379
  RAY_DASHBOARD_HOST: 127.0.0.1
```
Or set them with CLI flags:

```bash
sky launch --env RAY_HEAD_PORT=6379 ray_train.yaml
```
Note that SkyPilot already uses Ray for its own internal cluster management, on port 6380. To avoid conflicts, the default port for launching your own Ray applications is 6379. Do not use ray.init(address="auto"), as it will connect to SkyPilot’s internal cluster and cause conflicts.
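In practice, that means connecting to your Ray cluster by explicit address rather than auto-discovery. A minimal Python sketch, assuming the template’s default head port of 6379 on the local node:

```python
import ray

# Connect to the template-launched Ray cluster on its explicit port (6379),
# not SkyPilot's internal Ray instance on 6380; avoid address="auto".
ray.init(address="127.0.0.1:6379")

@ray.remote
def ping() -> str:
    return "pong"

# Simple smoke test that the connection works.
print(ray.get(ping.remote()))
```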
👾 Build your own templates
SkyPilot Templates are a library of best-practice configurations, ready to use out of the box. We’d love to hear which templates you want to see next so we can prioritize the most popular frameworks and patterns.
You can also contribute your own templates by submitting a PR to the SkyPilot repo. You can find all the currently available templates on this page.