SkyPilot 0.6: Managed Jobs API, SkyServe on Kubernetes, Spot + On-demand mixing, Paperspace support

We are excited to announce SkyPilot 0.6, which brings a host of new features and improvements. This release aims to make your AI workloads more efficient and cost-effective by letting them run anywhere.

In this post, we will highlight some of the key features and improvements in SkyPilot 0.6.

Managed Jobs

We’ve introduced a new Managed Jobs API to queue jobs, automatically recover from failures (e.g., spot instance preemptions or GPU failures), and run pipelines.

Launch jobs and pipelines with a simple command:

sky jobs launch my-job.yaml

sky jobs queue shows the details of your managed jobs, including their resources, durations, number of recoveries, and status:

$ sky jobs queue
Fetching managed job statuses...
Managed jobs:
ID NAME     RESOURCES           SUBMITTED   TOT. DURATION   JOB DURATION   #RECOVERIES  STATUS
2  roberta  1x [A100:8][Spot]   2 hrs ago   2h 47m 18s      2h 36m 18s     0            RUNNING
1  bert-qa  1x [V100:1][Spot]   4 hrs ago   4h 24m 26s      4h 17m 54s     0            RUNNING

To learn more, refer to the Jobs API documentation.
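At a high level, a managed job keeps relaunching its task after each failure until it succeeds, and each relaunch is counted as a recovery (the #RECOVERIES column in the queue output above). The Python sketch below simulates only that retry loop. It is a hypothetical illustration, not SkyPilot's controller code: the Preempted exception, the run_attempt callable and the max_recoveries cap are assumptions added to keep the example small and finite.

```python
from typing import Callable, Tuple


class Preempted(Exception):
    """Hypothetical signal that the job's cluster was lost (e.g., spot preemption)."""


def run_managed(run_attempt: Callable[[int], str], max_recoveries: int = 3) -> Tuple[str, int]:
    """Call `run_attempt` until it succeeds, retrying after each preemption.

    Returns the result and the number of recoveries performed.
    Raises RuntimeError once the (hypothetical) recovery budget is used up.
    """
    recoveries = 0
    while True:
        try:
            # The attempt index lets the task resume from its own checkpoint.
            return run_attempt(recoveries), recoveries
        except Preempted:
            if recoveries >= max_recoveries:
                raise RuntimeError(f"gave up after {recoveries} recoveries")
            # Count the recovery, then retry (a real system would relaunch the cluster).
            recoveries += 1


def flaky_training(attempt: int) -> str:
    # Simulate two preemptions before the job finishes.
    if attempt < 2:
        raise Preempted()
    return "done"


print(run_managed(flaky_training))  # ('done', 2)
```

The point of the sketch is that recovery is transparent to the caller: it sees one result plus a recovery count, much like a single row in sky jobs queue.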

Mix Spot and On-demand Instances for Serving

Spot instances are great for cost savings, but they can be preempted, leading to service downtime. SkyServe now supports mixing in on-demand instances to ensure high availability while still using spot instances for cost savings.

For example, to autoscale a service between a minimum of 2 and a maximum of 3 replicas while ensuring that one replica always runs on an on-demand instance, use base_ondemand_fallback_replicas:

service:
  readiness_probe: /health
  replica_policy:
    min_replicas: 2
    max_replicas: 3
    target_qps_per_replica: 1
    # Ensures that one of the replicas is run on on-demand instances
    base_ondemand_fallback_replicas: 1

resources:
  # Replicas other than the on-demand base run on spot instances
  use_spot: true
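To make the arithmetic behind this policy concrete, here is a Python sketch that derives a replica count from target_qps_per_replica and splits it between on-demand and spot. It assumes the count is the observed QPS divided by target_qps_per_replica, rounded up and clamped to [min_replicas, max_replicas], with base_ondemand_fallback_replicas of them on on-demand instances. The ReplicaPolicy class and plan_replicas function are hypothetical; SkyServe's real autoscaler may differ in details such as how QPS is measured over time.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class ReplicaPolicy:
    """Hypothetical mirror of the replica_policy fields in the YAML above."""
    min_replicas: int
    max_replicas: int
    target_qps_per_replica: float
    base_ondemand_fallback_replicas: int = 0


def plan_replicas(policy: ReplicaPolicy, observed_qps: float) -> Dict[str, int]:
    """Return how many on-demand and spot replicas to run (simplified model)."""
    # Enough replicas that each one serves at most target_qps_per_replica.
    wanted = math.ceil(observed_qps / policy.target_qps_per_replica)
    total = max(policy.min_replicas, min(policy.max_replicas, wanted))
    # A fixed base of replicas always runs on on-demand instances...
    on_demand = min(policy.base_ondemand_fallback_replicas, total)
    # ...and the remainder runs on cheaper spot instances.
    return {"on_demand": on_demand, "spot": total - on_demand}


policy = ReplicaPolicy(min_replicas=2, max_replicas=3,
                       target_qps_per_replica=1, base_ondemand_fallback_replicas=1)
print(plan_replicas(policy, observed_qps=0.5))  # {'on_demand': 1, 'spot': 1}
print(plan_replicas(policy, observed_qps=10))   # {'on_demand': 1, 'spot': 2}
```

Under these assumptions, a spot preemption can take out at most the spot share; the on-demand base keeps at least one replica serving.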

Similarly, to dynamically fall back to on-demand replicas when spot instances are preempted or unavailable, use dynamic_ondemand_fallback:

service:
  readiness_probe: /health
  replica_policy:
    min_replicas: 2
    max_replicas: 3
    target_qps_per_replica: 1
    # Run replicas on on-demand instances if spot instances are unavailable
    dynamic_ondemand_fallback: true

resources:
  # Replicas prefer spot instances; on-demand is only a fallback
  use_spot: true
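Roughly speaking, dynamic fallback covers only the spot capacity that is currently missing with on-demand replicas, and those can be released once spot replicas are ready again. The Python sketch below models just that counting rule. The function name and the status strings ("READY", "PREEMPTED", "PROVISIONING") are illustrative assumptions, not SkyServe's internal API.

```python
from collections import Counter
from typing import List


def ondemand_replicas_needed(target_replicas: int, spot_statuses: List[str]) -> int:
    """Hypothetical model: on-demand replicas fill only the spot shortfall.

    `spot_statuses` holds one status string per spot replica.
    """
    ready_spot = Counter(spot_statuses)["READY"]
    # Never go negative: surplus ready spot replicas need no on-demand backup.
    return max(0, target_replicas - ready_spot)


# Two of three spot replicas are lost, then spot capacity comes back.
print(ondemand_replicas_needed(3, ["READY", "PREEMPTED", "PROVISIONING"]))  # 2
print(ondemand_replicas_needed(3, ["READY", "READY", "READY"]))  # 0
```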

Read the docs for more details on mixing spot and on-demand instances.

SkyServe on Kubernetes

SkyServe now integrates with Kubernetes. Both the SkyServe controller and service replicas can be run on Kubernetes and managed by SkyPilot.

Launch your SkyServe service with:

sky serve up my-service.yaml

SkyServe can also burst out of the Kubernetes cluster to the cloud when it needs additional capacity. You get one common endpoint for all your queries, while SkyServe manages the underlying infrastructure (Kubernetes and the clouds) for you.
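One way to picture bursting is a placement rule that fills the Kubernetes cluster's free GPUs first and sends only the overflow to a cloud. The Python sketch below assumes exactly that rule and a fixed GPU demand per replica; place_replicas is a hypothetical helper, not SkyServe's actual placement logic.

```python
from typing import List


def place_replicas(num_replicas: int, gpus_per_replica: int, free_k8s_gpus: int) -> List[str]:
    """Assign each replica to 'kubernetes' while free GPUs remain, else to 'cloud'."""
    placements = []
    for _ in range(num_replicas):
        if free_k8s_gpus >= gpus_per_replica:
            # Use existing cluster capacity first.
            free_k8s_gpus -= gpus_per_replica
            placements.append("kubernetes")
        else:
            # Burst: the cluster is full, so this replica goes to a cloud.
            placements.append("cloud")
    return placements


print(place_replicas(3, gpus_per_replica=1, free_k8s_gpus=2))
# ['kubernetes', 'kubernetes', 'cloud']
```

With three single-T4 replicas and two free GPUs in the cluster, this rule happens to produce the same two-Kubernetes, one-cloud split as the example below.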

Example Llama-2 service running on Kubernetes and GCP spot instances:

$ sky serve status
Services
NAME    VERSION  UPTIME   STATUS  REPLICAS  ENDPOINT
llama2  1        34m 44s  READY   3/3       x.x.x.x:30001

Service Replicas
SERVICE_NAME  ID  VERSION  ENDPOINT               LAUNCHED  RESOURCES                 STATUS  REGION
llama2        1   1        http://x.x.x.y:8888    1 hr ago  1x Kubernetes({'T4': 1})  READY   kubernetes
llama2        2   1        http://a.b.c.d:8888    1 hr ago  1x GCP([Spot]{'T4': 1})   READY   us-east4
llama2        3   1        http://x.x.x.z:8888    1 hr ago  1x Kubernetes({'T4': 1})  READY   kubernetes

Paperspace support

SkyPilot now supports a total of 14 clouds, including Paperspace.

$ sky launch --gpus H100:8 --cloud paperspace

== Optimizer ==
Estimated cost: $47.6 / hour

Considered resources (1 node):
-------------------------------------------------------------------------------------------------
 CLOUD        INSTANCE   vCPUs   Mem(GB)   ACCELERATORS   REGION/ZONE        COST ($)   CHOSEN
-------------------------------------------------------------------------------------------------
 Paperspace   H100x8     128     640       H100:8         East Coast (NY2)   47.60         ✔
-------------------------------------------------------------------------------------------------

Launching a new cluster 'sky-2941-romilb'. Proceed? [Y/n]:
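The optimizer's estimate is an hourly rate for the whole 8-GPU node, so the per-GPU price and the cost of a run follow from simple arithmetic. A small Python sketch using the $47.60/hour figure above (prices vary by region and over time, so treat the numbers as illustrative):

```python
def per_gpu_hourly(node_hourly: float, num_gpus: int) -> float:
    """Hourly price per GPU for a node billed as a whole."""
    return node_hourly / num_gpus


def run_cost(node_hourly: float, hours: float) -> float:
    """Total cost of keeping the node up for `hours` hours."""
    return node_hourly * hours


print(f"{per_gpu_hourly(47.60, 8):.2f}")  # 5.95
print(f"{run_cost(47.60, 24):.2f}")       # 1142.40
```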

To get started, follow the installation guide.

Other features and improvements

SkyPilot 0.6 also includes several other features and improvements, including:

New LLM Recipes: Llama-3, Qwen, Ollama, DBRX, Unsloth, Cog.
Proxy support in SkyServe for improved performance and security.

sky show-gpus now shows real-time GPU availability on your Kubernetes cluster.

$ sky show-gpus --cloud kubernetes
GPU   QTY_PER_NODE  TOTAL_GPUS  TOTAL_FREE_GPUS
L4    1, 2, 3, 4    8           6
H100  1, 2          4           2

Support for autoscaling Kubernetes clusters.
Support for service account authentication with Kubernetes clusters.
Improved GPU isolation on Kubernetes.
Support for Python 3.11.

… and many more optimizations and features! Check out the release notes for a detailed list.
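If you want to script against the sky show-gpus output shown above, for example to pick a GPU type with enough free capacity, a minimal Python parser might look like this. It assumes the whitespace-separated layout from that sample (GPU, a comma-separated QTY_PER_NODE list, TOTAL_GPUS, TOTAL_FREE_GPUS); the GpuAvailability class and parse_show_gpus function are hypothetical, and the CLI's human-readable output is not a stable interface, so it may change between releases.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GpuAvailability:
    gpu: str
    qty_per_node: List[int]
    total: int
    free: int


def parse_show_gpus(text: str) -> List[GpuAvailability]:
    """Parse the table printed by `sky show-gpus --cloud kubernetes`."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        # Skip blank lines, the header row, and the echoed command line.
        if not parts or parts[0] in ("GPU", "$"):
            continue
        total, free = int(parts[-2]), int(parts[-1])
        # QTY_PER_NODE spans several tokens, e.g. "1, 2, 3, 4".
        qty = [int(q) for q in " ".join(parts[1:-2]).split(",")]
        rows.append(GpuAvailability(parts[0], qty, total, free))
    return rows


SAMPLE = """
GPU   QTY_PER_NODE  TOTAL_GPUS  TOTAL_FREE_GPUS
L4    1, 2, 3, 4    8           6
H100  1, 2          4           2
"""
rows = parse_show_gpus(SAMPLE)
print([r.gpu for r in rows if r.free >= 4])  # ['L4']
```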

Summary

SkyPilot 0.6 makes deploying and managing AI workloads efficient and cost-effective on any infrastructure. With the new Managed Jobs API, SkyServe on Kubernetes, and support for Paperspace, you can now run your AI workloads anywhere with ease. We hope you enjoy this release. Stay tuned for more!

To receive the latest updates, please star and watch the project’s GitHub repo, follow @skypilot_org, or join the SkyPilot community Slack.
