Video analysis at scale is resource-intensive. Sports analytics companies process thousands of hours of footage. Security teams monitor multiple camera feeds. Content moderation systems analyze user-uploaded videos around the clock. Each of these use cases requires detecting and tracking objects across video frames - a task that was historically painful due to limited model capabilities and computational constraints.

The emergence of foundation models for video understanding has changed what’s possible. But running these models on large video archives still presents a practical challenge: how do you efficiently distribute processing across available GPU resources without spending more time on infrastructure plumbing than actual analysis?

SAM3 segmenting soccer players and ball. For demo purposes, sampled at 1 fps for faster inference.

SAM3: A foundation model for video segmentation

SAM3 (Segment Anything 3) is Meta’s latest addition to the Segment Anything family. The big change from previous versions is text-based prompting - you can now describe what you want to segment using natural language like “soccer player” or “ball” instead of clicking on objects or drawing bounding boxes. The model then detects, segments, and tracks those objects across all video frames.

SAM3 was trained on a dataset containing over 4 million unique concepts, and its accompanying benchmark covers roughly 270,000 unique concepts - about 50x more than existing benchmarks. Meta reports it doubles the accuracy of previous systems on both image and video segmentation tasks.

For this example, we’ll use SAM3 to process a soccer video dataset from Kaggle, detecting and tracking players and balls throughout each video.

The single-GPU bottleneck

You could run SAM3 on a single node. Here’s what that looks like with SkyPilot (installation):

resources:
  accelerators: L40S:1

setup: |
  # Install dependencies and download dataset
  ...  

run: |
  source .venv/bin/activate
  # Process all videos sequentially
  for video in /data/videos/*.mp4; do
    python process_segmentation.py "$video"
  done  

Launch it:

sky launch -c sam3-single task.yaml

This works, but the dataset contains over 100 videos, and a single node can only process them one at a time - hours of wall-clock time for the full archive. If you have a deadline or need results quickly, you’ll want to parallelize across multiple nodes.

Even if you scale up within a single cluster, you may still be leaving GPU capacity idle on other clusters or clouds. Ideally, you’d use all available GPUs across your infrastructure - not just the ones on whichever cluster you happened to deploy to.

Distributed batch inference with SkyPilot pools

SkyPilot’s Pools feature lets you create a fleet of GPU workers that share the same environment. You define the setup once, and SkyPilot keeps the workers warm and ready to process jobs as they come in.

Batch Inference Architecture

The key benefits for video processing workloads:

  • No cold starts: SAM3 model weights and the video dataset are downloaded once during pool creation, not repeated for each job
  • Unified job queue: Submit any number of jobs - SkyPilot distributes them across available workers automatically
  • Multi-cloud flexibility: Use GPUs from Kubernetes clusters, AWS, or other providers in the same pool

Setting up multi-cloud infrastructure

Most organizations have GPU capacity scattered across different providers - a Kubernetes cluster on-prem, some reserved instances on AWS, maybe capacity from a neocloud provider. SkyPilot unifies access to all of these through a single interface.

First, check what infrastructure is available:

$ sky check
...
🎉 Enabled infra 🎉
  AWS [compute, storage]
  Kubernetes [compute]
    Allowed contexts:
    ├── k8s-cluster-two
    └── k8s-cluster-one
...

In my case, I have two Kubernetes clusters configured (k8s-cluster-one and k8s-cluster-two) as well as AWS. You could easily add GCP, Azure, or a neocloud like Lambda Labs, Nebius, or CoreWeave.

Query GPU availability on each Kubernetes cluster:

$ sky show-gpus --infra k8s/k8s-cluster-one
Kubernetes GPUs
Context: k8s-cluster-one
GPU   REQUESTABLE_QTY_PER_NODE  UTILIZATION
H100  1, 2, 4, 8                8 of 8 free
L40S  1                         2 of 2 free
Kubernetes per-node GPU availability
CONTEXT          NODE                                GPU   UTILIZATION
k8s-cluster-one  computeinstance-e00bv1fw2gdqr3qjx8  L40S  1 of 1 free
k8s-cluster-one  computeinstance-e00fbanz9gtw87fga9  H100  8 of 8 free
k8s-cluster-one  computeinstance-e00rjv8ch8n7chabbb  H100  8 of 8 free
k8s-cluster-one  computeinstance-e00yqvp56bxk9r4jt5  L40S  1 of 1 free


$ sky show-gpus --infra k8s/k8s-cluster-two
Kubernetes GPUs
Context: k8s-cluster-two
GPU   REQUESTABLE_QTY_PER_NODE  UTILIZATION
L40S  1                         3 of 3 free
Kubernetes per-node GPU availability
CONTEXT          NODE                                GPU   UTILIZATION
k8s-cluster-two  computeinstance-e00ee5zmwmbtscq4ht  L40S  1 of 1 free
k8s-cluster-two  computeinstance-e00vc7e70xjdd3xgwg  L40S  1 of 1 free
k8s-cluster-two  computeinstance-e00xp4071zprmt6vwr  L40S  1 of 1 free

We have 2 L40S GPUs on k8s-cluster-one, 3 on k8s-cluster-two, and AWS as a fallback. SkyPilot can use all of these together in a single pool.

Implementation

Pool configuration

The pool YAML defines worker infrastructure and shared setup (view on GitHub). Note that /outputs is backed by an S3 bucket mounted on every worker, so the dataset downloaded during setup lands in shared storage instead of being re-fetched by each job.

pool.yaml
pool:
  workers: 7

resources:
  accelerators: L40S:1

file_mounts:
  ~/.kaggle/kaggle.json: ~/.kaggle/kaggle.json
  /outputs:
    source: s3://my-skypilot-bucket

workdir: .

setup: |
  # Setup runs once on all workers (must be non-blocking)
  sudo apt-get update && sudo apt-get install -y unzip ffmpeg
  uv venv .venv --python 3.12
  source .venv/bin/activate
  uv pip install -r requirements.txt
  # Download soccer video dataset from Kaggle (store in S3 to avoid re-downloading)
  DATASET_PATH=/outputs/datasets/soccer-videos
  if [ ! -d "$DATASET_PATH" ]; then
    echo "Downloading dataset from Kaggle to S3..."
    mkdir -p /outputs/datasets
    kaggle datasets download shreyamainkar/football-soccer-videos-dataset --force
    unzip -q football-soccer-videos-dataset.zip -d $DATASET_PATH
    rm -f football-soccer-videos-dataset.zip
  fi
  echo "Setup complete!"  

Job configuration

The job YAML defines the workload that runs on each worker (view on GitHub):

job.yaml
name: sam3-segmentation-job

resources:
  accelerators: L40S:1

secrets:
  HF_TOKEN: null

run: |
  source .venv/bin/activate
  uv pip install -r requirements.txt
  echo "Job rank: ${SKYPILOT_JOB_RANK}/${SKYPILOT_NUM_JOBS}"

  # Get list of all videos
  VIDEO_DIR=/outputs/datasets/soccer-videos
  mapfile -t VIDEOS < <(find ${VIDEO_DIR} -name "*.mp4" | sort)
  TOTAL_VIDEOS=${#VIDEOS[@]}
  echo "Total videos: ${TOTAL_VIDEOS}"

  # Calculate start and end indices for this job
  CHUNK_SIZE=$((TOTAL_VIDEOS / SKYPILOT_NUM_JOBS))
  REMAINDER=$((TOTAL_VIDEOS % SKYPILOT_NUM_JOBS))

  START_IDX=$((SKYPILOT_JOB_RANK * CHUNK_SIZE))
  if [ ${SKYPILOT_JOB_RANK} -lt ${REMAINDER} ]; then
    START_IDX=$((START_IDX + SKYPILOT_JOB_RANK))
    CHUNK_SIZE=$((CHUNK_SIZE + 1))
  else
    START_IDX=$((START_IDX + REMAINDER))
  fi

  END_IDX=$((START_IDX + CHUNK_SIZE))
  echo "Processing videos ${START_IDX} to ${END_IDX}"

  # Process each video in this job's chunk
  for ((i=START_IDX; i<END_IDX; i++)); do
    video="${VIDEOS[$i]}"
    echo "Processing: $video"
    python process_segmentation.py "$video" --max-frames 50 || echo "Failed: $video"
  done

  echo "Job complete! Results saved to S3 bucket."  

SkyPilot provides $SKYPILOT_JOB_RANK and $SKYPILOT_NUM_JOBS environment variables. Each job calculates which slice of videos it should process, ensuring work is evenly distributed without overlap.
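To make the arithmetic concrete, here is the same chunking logic as a small Python sketch (chunk_bounds is a hypothetical helper for illustration, mirroring the bash above). With 101 videos and 10 jobs, rank 0 gets 11 videos and ranks 1-9 get 10 each:

def chunk_bounds(rank: int, num_jobs: int, total: int) -> tuple[int, int]:
    """Return the [start, end) video indices for a given job rank."""
    chunk, rem = divmod(total, num_jobs)
    if rank < rem:
        start = rank * chunk + rank  # earlier ranks each carry one extra video
        chunk += 1
    else:
        start = rank * chunk + rem
    return start, start + chunk

# 101 videos across 10 jobs: every index is covered exactly once
assert chunk_bounds(0, 10, 101) == (0, 11)
assert chunk_bounds(1, 10, 101) == (11, 21)
assert chunk_bounds(9, 10, 101) == (91, 101)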

Processing script

The Python script handles the actual segmentation (view on GitHub):

process_segmentation.py (key sections)
"""SAM3 video segmentation for soccer players and ball."""

from pathlib import Path

import cv2
import numpy as np
from PIL import Image
import torch
from transformers import Sam3VideoModel, Sam3VideoProcessor

PROMPTS = ["soccer player", "ball"]
PLAYER_COLOR = (255, 100, 100)
BALL_COLOR = (100, 255, 100)


def process_video(model, processor, video_path, output_dir, sample_fps=1, max_frames=0):
    """Run SAM3 segmentation on video and save results."""
    video_name = Path(video_path).stem

    frames, original_fps, output_fps = load_video_frames(video_path, sample_fps, max_frames)
    print(f"  {len(frames)} frames (sampled at {output_fps} fps from {original_fps} fps)")

    # Initialize video session with SAM3
    session = processor.init_video_session(
        video=frames,
        inference_device="cuda",
        processing_device="cpu",
        video_storage_device="cpu",
        dtype=torch.bfloat16,
    )
    session = processor.add_text_prompt(inference_session=session, text=PROMPTS)

    # Track objects through video
    masks_by_frame = {}
    with torch.no_grad():
        for out in model.propagate_in_video_iterator(
                inference_session=session, max_frame_num_to_track=len(frames)):
            processed = processor.postprocess_outputs(session, out)
            # Store masks for each frame...

    # Overlay colored masks and save video
    output_frames = []
    for i, frame in enumerate(frames):
        masks = masks_by_frame.get(i, {})
        output_frames.append(overlay_masks(frame, masks, colors) if masks else frame)

    save_video(output_frames, output_video_path, output_fps)

    # Save metadata JSON
    result = {
        "video": video_name,
        "frames_processed": len(frames),
        "objects_detected": len(obj_to_prompt),
        "players_detected": total_players,
        "balls_detected": total_balls,
    }
    return result


def main():
    # Argument parsing and output-path handling elided; see the full script on GitHub.
    print("Loading SAM3 model...")
    model = Sam3VideoModel.from_pretrained("facebook/sam3").to("cuda", dtype=torch.bfloat16).eval()
    processor = Sam3VideoProcessor.from_pretrained("facebook/sam3")
    print("Model loaded!")

    result = process_video(model, processor, video_path, output_dir, args.sample_fps, args.max_frames)
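
The load_video_frames helper is one of the elided sections. Here is a minimal sketch of what 1 fps sampling with OpenCV could look like - the name and return signature match the call in process_video, but this is an illustration rather than the repository’s implementation:

import cv2

def load_video_frames(video_path, sample_fps=1, max_frames=0):
    """Decode a video and keep roughly sample_fps frames per second."""
    cap = cv2.VideoCapture(str(video_path))
    original_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if fps metadata is missing
    step = max(1, round(original_fps / sample_fps))
    frames = []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            # OpenCV decodes BGR; SAM3's processor expects RGB
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if max_frames and len(frames) >= max_frames:
                break
        idx += 1
    cap.release()
    return frames, original_fps, original_fps / step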

Running the pipeline

Create the pool

Spin up 7 workers across both Kubernetes clusters and AWS:

sky jobs pool apply -p sam3-pool pool.yaml

SAM3 Segmentation with SkyPilot Pools

Check pool status

Once the pool is created, verify that workers are ready:

$ sky jobs pool status sam3-pool
Pools
NAME       VERSION  UPTIME   STATUS  WORKERS
sam3-pool  1        13m 56s  READY   7/7

Pool Workers
POOL_NAME  ID  VERSION  LAUNCHED     INFRA                         RESOURCES                             STATUS            USED_BY
sam3-pool  1   1        16 mins ago  Kubernetes (k8s-cluster-two)  1x(gpus=L40S:1, cpus=4, mem=16, ...)  READY             -
sam3-pool  2   1        15 mins ago  AWS (us-east-1a)              1x(gpus=L40S:1, g6e.xlarge, ...)      READY             -
sam3-pool  3   1        15 mins ago  AWS (us-east-1a)              1x(gpus=L40S:1, g6e.xlarge, ...)      READY             -
sam3-pool  4   1        16 mins ago  Kubernetes (k8s-cluster-two)  1x(gpus=L40S:1, cpus=4, mem=16, ...)  READY             -
sam3-pool  5   1        16 mins ago  Kubernetes (k8s-cluster-two)  1x(gpus=L40S:1, cpus=4, mem=16, ...)  READY             -
sam3-pool  6   1        16 mins ago  Kubernetes (k8s-cluster-one)  1x(gpus=L40S:1, cpus=4, mem=16, ...)  READY             -
sam3-pool  7   1        16 mins ago  Kubernetes (k8s-cluster-one)  1x(gpus=L40S:1, cpus=4, mem=16, ...)  READY             -

SkyPilot filled the pool using all 5 available L40S GPUs from both Kubernetes clusters, then provisioned 2 additional workers on AWS to reach the requested 7 workers.

SkyPilot Dashboard Pool Workers

Submit batch jobs

Submit 10 jobs to process the video dataset:

sky jobs launch --pool sam3-pool --num-jobs 10 --secret HF_TOKEN job.yaml

Seven jobs start immediately (one per worker), and the remaining three queue up.

$ sky jobs queue
Fetching managed job statuses...
Managed jobs
In progress tasks: 3 PENDING, 7 RUNNING
ID   TASK  NAME                   REQUESTED   SUBMITTED    TOT. DURATION  JOB DURATION  #RECOVERIES  STATUS             POOL
169  -     sam3-segmentation-job  1x[L40S:1]  4 mins ago   4m 51s         -             0            PENDING            sam3-pool
168  -     sam3-segmentation-job  1x[L40S:1]  4 mins ago   4m 51s         -             0            PENDING            sam3-pool
167  -     sam3-segmentation-job  1x[L40S:1]  4 mins ago   4m 51s         4m 44s        0            RUNNING            sam3-pool (worker=6)
166  -     sam3-segmentation-job  1x[L40S:1]  4 mins ago   4m 51s         4m 44s        0            RUNNING            sam3-pool (worker=5)
165  -     sam3-segmentation-job  1x[L40S:1]  4 mins ago   4m 51s         4m 46s        0            RUNNING            sam3-pool (worker=4)
164  -     sam3-segmentation-job  1x[L40S:1]  4 mins ago   4m 51s         -             0            PENDING            sam3-pool
163  -     sam3-segmentation-job  1x[L40S:1]  4 mins ago   4m 51s         4m 49s        0            RUNNING            sam3-pool (worker=2)
162  -     sam3-segmentation-job  1x[L40S:1]  4 mins ago   4m 51s         4m 49s        0            RUNNING            sam3-pool (worker=3)
161  -     sam3-segmentation-job  1x[L40S:1]  4 mins ago   4m 51s         4m 46s        0            RUNNING            sam3-pool (worker=1)
160  -     sam3-segmentation-job  1x[L40S:1]  4 mins ago   4m 51s         4m 43s        0            RUNNING            sam3-pool (worker=7)

SkyPilot Dashboard Jobs Queue

View logs

Watch a specific job’s progress:

$ sky jobs logs 167
...
(sam3-segmentation-job, pid=3213) Model loaded!
(sam3-segmentation-job, pid=3213) Processing: 87
(sam3-segmentation-job, pid=3213)   50 frames (sampled at 1 fps from 25.0 fps)
(sam3-segmentation-job, pid=3213)   0%|          | 0/50 [00:00<?, ?it/s]
100%|██████████| 50/50 [00:48<00:00,  1.03it/s]
...

Scale up

If you need results faster, add more workers:

sky jobs pool apply --pool sam3-pool --workers 15
sky jobs launch --pool sam3-pool --num-jobs 20 job.yaml

SkyPilot will provision additional workers from available infrastructure to meet the new count.

Cleanup

When finished, tear down the pool:

sky jobs pool down sam3-pool

Results

Processed videos and metadata are synced to S3:

$ aws s3 ls s3://my-skypilot-bucket/segmentation_results/ --recursive
2025-12-22 08:53:37          0 segmentation_results/
2025-12-22 08:54:22          0 segmentation_results/1/
2025-12-22 08:54:23        231 segmentation_results/1/1_metadata.json
2025-12-22 08:54:23    3041504 segmentation_results/1/1_segmented.mp4
2025-12-22 08:55:13          0 segmentation_results/10/
2025-12-22 08:55:13        234 segmentation_results/10/10_metadata.json
2025-12-22 08:55:13    4291581 segmentation_results/10/10_segmented.mp4
...

Each video gets a segmented output with colored overlays (red for players, green for balls) and a metadata JSON with detection statistics.
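
Because every job writes a small metadata JSON next to its segmented video, aggregating detection statistics across the whole dataset takes a few lines of Python. A minimal sketch, assuming the bucket has been synced locally first (e.g. aws s3 sync s3://my-skypilot-bucket/segmentation_results/ ./segmentation_results/) and using the metadata keys from process_segmentation.py:

import json
from pathlib import Path

results_dir = Path("segmentation_results")
totals = {"videos": 0, "frames_processed": 0, "players_detected": 0, "balls_detected": 0}

# Each per-video directory contains a <name>_metadata.json written by process_segmentation.py
for meta_file in sorted(results_dir.glob("*/*_metadata.json")):
    meta = json.loads(meta_file.read_text())
    totals["videos"] += 1
    for key in ("frames_processed", "players_detected", "balls_detected"):
        totals[key] += meta.get(key, 0)

print(totals)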

How SkyPilot Pools unlock capacity and boost throughput

The main benefit of SkyPilot Pools is unlocking GPU capacity that would otherwise sit unused across different clusters and clouds. Here’s how throughput scales with the infrastructure in this example:

Configuration                          Available GPUs  Relative throughput
Single GPU instance                    1               1x
Single K8s cluster (k8s-cluster-one)   2 L40S          2x
Both K8s clusters                      5 L40S          5x
Multi-cloud pool (K8s + AWS)           7 L40S          7x

Without SkyPilot, you’d be limited to whichever cluster has the most available GPUs - in this case, just 3 on k8s-cluster-two. With Pools, you aggregate capacity across both Kubernetes clusters and burst to AWS when needed, achieving near-linear scaling.
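
To put rough numbers on it: the log output above shows a 50-frame video (sampled at 1 fps) taking about 48 seconds of inference. At that rate, 100 similar videos need roughly 80 minutes on one GPU, but only about 12 minutes on 7 workers (ceil(100/7) = 15 videos per worker), ignoring queueing and startup overhead.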

This pattern becomes more valuable as workloads grow. If you need to process 1000 videos instead of 100, you can scale the pool to 20+ workers across multiple regions and clouds - something that would require significant custom orchestration otherwise.

Adapting for other use cases

The same pool-based pattern works for other video processing tasks:

Change text prompts: Edit PROMPTS in process_segmentation.py for different objects:

PROMPTS = ["person", "car", "traffic light"]  # Traffic monitoring
PROMPTS = ["whale", "dolphin", "boat"]        # Marine research

Adjust frame sampling: By default, the script samples 1 frame per second. For higher-fidelity tracking:

python process_segmentation.py video.mp4 --sample-fps 5

Use different GPUs: Update the pool and job YAML files:

resources:
  accelerators: H100:1  # More VRAM for longer videos

Non-video workloads: SkyPilot Pools work for any batch processing task, not just video. See the documentation for examples like batch text classification with vLLM and document OCR with DeepSeek OCR.

Resources