Unlocking GPU Metrics in Kubernetes with SkyPilot

SkyPilot now shows detailed GPU metrics from multiple Kubernetes clusters in its dashboard for better observability.

Rohan Sonecha·Sep 12, 2025·3 min read

From 1 hour to 10 minutes: How I sped up my distributed LLM training without changing the code or GPUs

Henry Zhu·Sep 11, 2025·8 min read

Scaling AI Infrastructure at Abridge with SkyPilot

How Abridge transformed their fragmented multi-cloud AI infrastructure into a unified system with SkyPilot, achieving 10x faster development cycles.

Sisil Mehta (ML Platform Lead, Abridge)·Sep 4, 2025·7 min read

From SLURM to SkyPilot: How Avataar cut costs 11x with multi-cloud AI infrastructure

Avataar's enterprise AI content platform cut costs 11x and unlocked GPU capacity by migrating from an inflexible SLURM deployment to SkyPilot's multi-cloud infrastructure.

Shubham Jain (AI & Engineering Lead, Avataar)·Aug 20, 2025·7 min read

Self-host open-source LLM agent sandbox on your own cloud

Alex Kim·Aug 12, 2025·9 min read