We are excited to announce SkyPilot 0.6, which brings a host of new features and improvements to SkyPilot. This release is aimed at making your AI workloads more efficient and cost-effective by allowing them to run anywhere.
In this post, we will highlight some of the key features and improvements in SkyPilot 0.6.
Managed Jobs
We’ve introduced a new Managed Jobs API to queue jobs, automatically recover from failures (e.g., spot instance preemption, GPU failures) and run pipelines.
Launch jobs and pipelines with a simple command:
sky jobs launch my-job.yaml
sky jobs queue shows the details of the jobs, including the status, logs, and more.
$ sky jobs queue
Fetching managed job statuses...
Managed jobs:
ID  NAME     RESOURCES          SUBMITTED  TOT. DURATION  JOB DURATION  #RECOVERIES  STATUS
2   roberta  1x [A100:8][Spot]  2 hrs ago  2h 47m 18s     2h 36m 18s    0            RUNNING
1   bert-qa  1x [V100:1][Spot]  4 hrs ago  4h 24m 26s     4h 17m 54s    0            RUNNING
To learn more, refer to the Jobs API documentation.
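As a rough illustration, the my-job.yaml passed to sky jobs launch could look like the sketch below. The job name, bucket name, and training script are hypothetical placeholders, not part of the release. The sketch assumes the training script writes checkpoints to a mounted bucket, so a job relaunched after a spot preemption can resume rather than start over.

```yaml
# my-job.yaml (hypothetical contents, for illustration only)
name: bert-qa

resources:
  accelerators: V100:1
  use_spot: true  # run on spot instances; the job is recovered after a preemption

file_mounts:
  # Hypothetical bucket used to persist checkpoints across recoveries
  /checkpoint:
    name: my-checkpoint-bucket
    mode: MOUNT

setup: |
  pip install -r requirements.txt

run: |
  # Hypothetical training script that saves to and resumes from /checkpoint
  python train.py --output-dir /checkpoint
```

Recovery restarts the run command on a fresh instance, so saving and reloading checkpoints is left to the training script.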
Mix Spot and On-demand Instances for Serving
Spot instances are great for cost savings, but they can be preempted, leading to service downtime. SkyServe now supports mixing spot and on-demand instances, so a service stays highly available while still benefiting from spot pricing.
For example, to autoscale a service between a minimum of 2 and a maximum of 3 replicas while ensuring that one of the replicas always runs on an on-demand instance, use base_ondemand_fallback_replicas:
service:
  readiness_probe: /health
  replica_policy:
    min_replicas: 2
    max_replicas: 3
    target_qps_per_replica: 1
    # Ensures that one of the replicas is run on on-demand instances
    base_ondemand_fallback_replicas: 1
Similarly, to dynamically fall back to on-demand replicas if spot instances get preempted, use dynamic_ondemand_fallback:
service:
  readiness_probe: /health
  replica_policy:
    min_replicas: 2
    max_replicas: 3
    target_qps_per_replica: 1
    # Run replicas on on-demand instances if spot instances are unavailable
    dynamic_ondemand_fallback: true
Read the docs for more details on mixing spot and on-demand instances.
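The two settings above configure only the service section. A complete service YAML also needs resources and a run command. The sketch below combines both fallback options in one file, assuming they can be set together. The port, accelerator, and serve.py script are hypothetical placeholders.

```yaml
# my-service.yaml (hypothetical contents, for illustration only)
service:
  readiness_probe: /health
  replica_policy:
    min_replicas: 2
    max_replicas: 3
    target_qps_per_replica: 1
    # Keep one replica on an on-demand instance at all times
    base_ondemand_fallback_replicas: 1
    # Temporarily add on-demand replicas when spot instances are unavailable
    dynamic_ondemand_fallback: true

resources:
  ports: 8080
  accelerators: T4:1   # assumed accelerator
  use_spot: true       # remaining replicas prefer spot instances

run: |
  # Hypothetical server that answers GET /health on port 8080
  python serve.py --port 8080
```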
SkyServe on Kubernetes
SkyServe now integrates with Kubernetes. Both the SkyServe controller and service replicas can be run on Kubernetes and managed by SkyPilot.
Launch your SkyServe service with:
sky serve up my-service.yaml
SkyServe can also burst out of the Kubernetes cluster to the cloud for additional capacity when needed. You get one common endpoint to run all your queries, while SkyServe manages the underlying infrastructure (Kubernetes and the clouds) for you.
Example Llama-2 service running on Kubernetes and GCP spot instances:
$ sky serve status
Services
NAME    VERSION  UPTIME   STATUS  REPLICAS  ENDPOINT
llama2  1        34m 44s  READY   3/3       x.x.x.x:30001
Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT             LAUNCHED  RESOURCES                 STATUS  REGION
llama2        1   1        http://x.x.x.y:8888  1 hr ago  1x Kubernetes({'T4': 1})  READY   kubernetes
llama2        2   1        http://a.b.c.d:8888  1 hr ago  1x GCP([Spot]{'T4': 1})   READY   us-east4
llama2        3   1        http://x.x.x.z:8888  1 hr ago  1x Kubernetes({'T4': 1})  READY   kubernetes
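A service spec along the following lines could produce a replica mix like the one above. This is only a sketch: it assumes the any_of field of the resources section is used to list Kubernetes and GCP spot as candidate locations, and the file name, readiness path, and serving script are hypothetical.

```yaml
# llama2-service.yaml (hypothetical file name and contents)
service:
  readiness_probe: /health   # assumed health-check path
  replicas: 3

resources:
  ports: 8888
  # Candidate locations for each replica, in no particular order
  any_of:
    - cloud: kubernetes
      accelerators: T4:1
    - cloud: gcp
      accelerators: T4:1
      use_spot: true

run: |
  # Hypothetical serving script listening on port 8888
  python serve.py --port 8888
```

It would be launched with sky serve up llama2-service.yaml in the same way as any other service.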
Paperspace support
SkyPilot now supports a total of 14 clouds, including Paperspace.
$ sky launch --gpus H100:8 --cloud paperspace
== Optimizer ==
Estimated cost: $47.6 / hour
Considered resources (1 node):
-------------------------------------------------------------------------------------------------
 CLOUD        INSTANCE  vCPUs  Mem(GB)  ACCELERATORS  REGION/ZONE        COST ($)  CHOSEN
-------------------------------------------------------------------------------------------------
 Paperspace   H100x8    128    640      H100:8        East Coast (NY2)   47.60     ✔
-------------------------------------------------------------------------------------------------
Launching a new cluster 'sky-2941-romilb'. Proceed? [Y/n]:
To get started, follow the installation guide.
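The same request can also be written as a task YAML instead of command-line flags. The snippet below is a minimal sketch. The file name and run command are placeholders.

```yaml
# paperspace-task.yaml (hypothetical file name)
resources:
  cloud: paperspace
  accelerators: H100:8

run: |
  # Placeholder command; replace with your workload
  nvidia-smi
```

Launch it with sky launch paperspace-task.yaml.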
Other features and improvements
SkyPilot 0.6 also includes several other features and improvements, including:
- New LLM Recipes: Llama-3, Qwen, Ollama, DBRX, Unsloth, Cog.
- Proxy support in SkyServe for improved performance and security.
- sky show-gpus shows realtime GPU availability on your Kubernetes cluster.
  $ sky show-gpus --cloud kubernetes
  GPU   QTY_PER_NODE  TOTAL_GPUS  TOTAL_FREE_GPUS
  L4    1, 2, 3, 4    8           6
  H100  1, 2          4           2
- Support for autoscaling Kubernetes clusters.
- Use service accounts for authentication with Kubernetes clusters.
- Improved GPU isolation on Kubernetes.
- Support for Python 3.11.
… and many more optimizations and features! Check out the release notes for a detailed list.
Summary
SkyPilot 0.6 makes the deployment and management of AI workloads efficient and cost-effective on any infra. With the new Managed Jobs API, SkyServe on Kubernetes, and support for Paperspace, you can now run your AI workloads anywhere with ease. We hope you enjoy this release and stay tuned for more!
To receive the latest updates, please star and watch the project’s GitHub repo, follow @skypilot_org, or join the SkyPilot community Slack.