Enterprise AI tools like Microsoft Copilot, Glean, and Onyx are becoming popular in organizations of all sizes. These RAG-based systems can answer questions, summarize content, and pull insights from massive document repositories.
However, they have trouble with images and scanned documents, because these types of data are often unsupported by the embedding models that RAG pipelines use for semantic similarity search.
Many enterprises have decades of knowledge locked away in exactly these formats: scanned paper documents, legacy PDFs with weird layouts, technical drawings, archives from before anyone cared about “digital workflows.”

These documents often contain sensitive information - contracts, medical records, financial statements, internal reports. Shipping them off to a third-party OCR API is a no-go from a compliance perspective: HIPAA, GDPR, and internal data governance policies often mean you simply can’t send this data outside your infrastructure, so self-hosting OCR models becomes the only viable option. But traditional OCR doesn’t quite cut it here.
Multi-column layouts get merged into gibberish, tables lose their structure completely, and PDFs with invisible layers or annotations become unreadable. So you end up with valuable knowledge that might as well not exist, because your RAG system can’t see it.
Enter DeepSeek-OCR
DeepSeek OCR is a different beast from traditional OCR. Instead of the traditional sequential pipeline, it uses a vision-language model that understands a document as a whole - recognizing text, structure, and context simultaneously. This means multi-column layouts stay intact, tables preserve their structure, and the model outputs clean markdown that’s ready for RAG systems.
It also does context-aware text recognition. When text is illegible from ink stains or poor scanning quality, it infers the most likely word from context rather than outputting gibberish. For instance, in a damaged contract reading “The party agrees to ##### the premises by December 31st”, traditional OCR might return random characters, but DeepSeek OCR correctly infers “vacate” from the legal context.
Traditional OCR’s sequential five-stage pipeline - preprocessing, detection, layout analysis, recognition, and language correction - compounds errors at each stage and loses document structure along the way; DeepSeek OCR collapses all of it into a single model pass.
The model itself is great, but there’s a challenge: processing enterprise document archives with hundreds of thousands of pages on a single GPU would take weeks. You need a way to scale this efficiently across multiple machines.
Batch inference with SkyPilot Pools
To process large document archives efficiently, you need a scalable batch inference system. Most organizations already have GPU capacity scattered across their infrastructure - reserved instances on AWS, managed Kubernetes clusters from neoclouds like Nebius and CoreWeave, maybe some credits on GCP. These GPUs often sit idle between training runs or serving workloads. SkyPilot’s Pools feature lets you harness all of this capacity together, creating a unified pool of workers that spans multiple clouds and Kubernetes clusters.
With a pool of workers, you can fan out a large number of batch inference jobs and utilize all the GPUs available across your infrastructure.

The naive approach: single GPU processing
You could start by running OCR on a single GPU instance. Here’s a simple SkyPilot task that processes the entire Book-Scan-OCR dataset (full example):
resources:
  accelerators: L40S:1

setup: |
  # Install DeepSeek OCR and dependencies
  # Download the Book-Scan-OCR dataset
  ...

run: |
  source .venv/bin/activate
  # Process all images sequentially on one GPU
  python process_ocr.py --start-idx 0 --end-idx -1
Launch it with:
sky launch -c deepseek-ocr-single task.yaml
Unfortunately, for enterprise document archives with hundreds of thousands of scanned pages, this approach simply doesn’t scale. You’d be waiting days or weeks for results.
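To put rough numbers on that, here’s a quick back-of-envelope estimate in Python. The per-page latency and archive size are assumed figures for illustration, not benchmarks; actual throughput depends on page resolution, crop settings, and the GPU.

# Rough time estimate for a hypothetical 200k-page archive
# (5 s/page is an assumption, not a measured number).
pages = 200_000
secs_per_page = 5

single_gpu_days = pages * secs_per_page / 3600 / 24
print(f" 1 GPU:  ~{single_gpu_days:.1f} days")      # ~11.6 days
for workers in (3, 10, 30):
    print(f"{workers:>2} GPUs: ~{single_gpu_days / workers:.1f} days")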
This is where you need parallel batch inference across multiple GPUs to make OCR practical for large document collections.
Scaling batch inference with SkyPilot pools
Here we’ll look into SkyPilot’s Pools feature and how it enables scalable batch inference for DeepSeek OCR. Pools let you spin up a fleet of GPU workers that stay warm and ready to process document batches in parallel.
What are pools and why use them for batch inference?
A pool is a collection of GPU instances that share the same setup - dependencies, models, and datasets are installed once. Workers persist across jobs, so there are no cold starts and no re-downloading of gigabytes of model weights or datasets for every job.
Key benefits for batch inference workloads:
- Fully utilize GPU capacity: Pools allow you to utilize idle GPUs available across any of your clouds or Kubernetes clusters.
- Unified queue: Submit any number of jobs - SkyPilot automatically distributes work across available workers.
- Automatic recovery: Jobs that get preempted by higher-priority workloads are automatically rescheduled when GPUs become available.
- Dynamic submission: Add new jobs anytime without reconfiguring infrastructure.
- Warm workers and elastic scaling: Models stay loaded and ready - no setup delays between jobs. Scale workers up or down with a single command.
It’s like having your own batch inference cluster that you control with a single YAML file, but with the flexibility to use GPUs from any provider.
Implementation: batch OCR pipeline
Let’s build a production-ready OCR pipeline for the Book-Scan-OCR dataset of scanned news and book pages.
Step 1: pool configuration
With the new pools feature, we separate the pool infrastructure definition from the job specification. The pool YAML defines the shared worker environment (view on GitHub):
pool.yaml:
pool:
  workers: 3

resources:
  accelerators: L40S:1

file_mounts:
  ~/.kaggle/kaggle.json: ~/.kaggle/kaggle.json
  /outputs:
    source: s3://my-skypilot-bucket

workdir: .

setup: |
  # Setup runs once on all workers (must be non-blocking)
  sudo apt-get update && sudo apt-get install -y unzip
  uv venv .venv --python 3.12
  source .venv/bin/activate
  git clone https://github.com/deepseek-ai/DeepSeek-OCR.git
  cd DeepSeek-OCR
  pip install kaggle
  uv pip install torch==2.6.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  uv pip install vllm==0.8.5
  uv pip install flash-attn==2.7.3 --no-build-isolation
  uv pip install -r requirements.txt
  cd ..

  # Download dataset during setup (shared across all jobs)
  kaggle datasets download goapgo/book-scan-ocr-vlm-finetuning
  unzip -q book-scan-ocr-vlm-finetuning.zip -d book-scan-ocr
  echo "Setup complete!"
The job YAML defines the actual workload that runs on each worker (view on GitHub):
job.yaml:
name: deepseek-ocr-job

resources:
  accelerators: L40S:1

run: |
  # Calculate job range using SKYPILOT_JOB_RANK and SKYPILOT_NUM_JOBS
  source .venv/bin/activate
  echo "Job rank: ${SKYPILOT_JOB_RANK}/${SKYPILOT_NUM_JOBS}"

  # Count total images in the dataset
  IMAGE_DIR=./book-scan-ocr/Book-Scan-OCR/images
  TOTAL_IMAGES=$(find ${IMAGE_DIR} -name "*.jpg" -o -name "*.png" | wc -l)
  echo "Total images: ${TOTAL_IMAGES}"

  # Calculate start and end indices for this job
  CHUNK_SIZE=$((TOTAL_IMAGES / SKYPILOT_NUM_JOBS))
  REMAINDER=$((TOTAL_IMAGES % SKYPILOT_NUM_JOBS))

  # Calculate start index
  START_IDX=$((SKYPILOT_JOB_RANK * CHUNK_SIZE))
  if [ ${SKYPILOT_JOB_RANK} -lt ${REMAINDER} ]; then
    START_IDX=$((START_IDX + SKYPILOT_JOB_RANK))
    CHUNK_SIZE=$((CHUNK_SIZE + 1))
  else
    START_IDX=$((START_IDX + REMAINDER))
  fi
  END_IDX=$((START_IDX + CHUNK_SIZE))
  echo "Processing images ${START_IDX} to ${END_IDX}"

  # Pass indices to Python script via CLI arguments
  python process_ocr.py --start-idx ${START_IDX} --end-idx ${END_IDX}

  echo "Job complete! Results saved to S3 bucket."
Key components
- Pool configuration (pool.yaml): Defines the worker infrastructure and shared setup

  pool:
    workers: 3  # Number of parallel GPU instances

  setup: |
    # Runs once when each worker starts
    git clone https://github.com/deepseek-ai/DeepSeek-OCR.git
    kaggle datasets download goapgo/book-scan-ocr-vlm-finetuning
    # Install dependencies, download models, etc.

- Job configuration (job.yaml): Defines the workload that executes on each worker

  run: |
    # Runs for each job submitted to the pool
    python process_ocr.py --start-idx ${START_IDX} --end-idx ${END_IDX}

- Separation of concerns: The pool YAML contains setup, file mounts, and infrastructure configuration. The job YAML contains only the run command and must match the pool’s resource requirements.

- Automatic work distribution: SkyPilot provides environment variables to split work across jobs

  run: |
    # Each job gets its own rank: 0, 1, 2, ...
    echo "Job rank: ${SKYPILOT_JOB_RANK}/${SKYPILOT_NUM_JOBS}"
    # Calculate which slice of images this job processes
    START_IDX=$((SKYPILOT_JOB_RANK * CHUNK_SIZE))
    END_IDX=$((START_IDX + CHUNK_SIZE))

- Cloud storage integration: Results sync to S3 automatically (e.g. for use in downstream RAG systems)

  file_mounts:
    /outputs:
      source: s3://my-skypilot-bucket  # Auto-synced
Step 2: processing script
The Python script takes start/end indices and processes its chunk (view on GitHub):
process_ocr.py:
"""
DeepSeek OCR Image Processing Script
Processes images from the Book-Scan-OCR dataset.
"""
import argparse
import json
from pathlib import Path
from transformers import AutoModel, AutoTokenizer
import torch
def main():
parser = argparse.ArgumentParser(description='Process OCR on image dataset')
parser.add_argument('--start-idx', type=int, required=True)
parser.add_argument('--end-idx', type=int, required=True)
args = parser.parse_args()
print(f"Processing range: {args.start_idx} to {args.end_idx}")
# Load DeepSeek OCR model
model_name = "deepseek-ai/deepseek-ocr"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
if tokenizer.pad_token is None:
tokenizer.pad_token = tokenizer.eos_token
model = AutoModel.from_pretrained(
model_name,
_attn_implementation='flash_attention_2',
trust_remote_code=True,
use_safetensors=True
)
model = model.eval().cuda().to(torch.bfloat16)
# Find and slice images
image_dir = Path.cwd() / "book-scan-ocr" / "Book-Scan-OCR" / "images"
output_dir = Path("/outputs/ocr_results")
output_dir.mkdir(parents=True, exist_ok=True)
all_image_files = sorted(image_dir.glob("*.jpg")) + sorted(image_dir.glob("*.png"))
image_files = all_image_files[args.start_idx:args.end_idx]
print(f"Processing {len(image_files)} images...")
results = []
for idx, img_path in enumerate(image_files, 1):
print(f"Processing {idx}/{len(image_files)}: {img_path.name}...")
try:
# Run OCR with grounding tag for structure awareness
prompt = "<image>\n<|grounding|>Convert the document to markdown. "
image_output_dir = output_dir / img_path.stem
image_output_dir.mkdir(exist_ok=True)
ocr_result = model.infer(
tokenizer,
prompt=prompt,
image_file=str(img_path),
output_path=str(image_output_dir),
base_size=1024,
image_size=640,
crop_mode=True,
save_results=True,
test_compress=True
)
# Read the markdown result
mmd_file = image_output_dir / "result.mmd"
if mmd_file.exists():
with open(mmd_file, 'r', encoding='utf-8') as f:
ocr_text = f.read()
else:
ocr_text = "[OCR completed but result not found]"
# Save consolidated markdown at top level
md_file = output_dir / f"{img_path.stem}.md"
with open(md_file, 'w', encoding='utf-8') as f:
f.write(f"# {img_path.name}\n\n{ocr_text}\n")
# Save JSON metadata
result = {"image_name": img_path.name, "ocr_text": ocr_text}
results.append(result)
json_file = output_dir / f"{img_path.stem}_ocr.json"
with open(json_file, 'w', encoding='utf-8') as f:
json.dump(result, f, indent=2, ensure_ascii=False)
print(f"Saved markdown to {md_file}")
except Exception as e:
print(f"Error processing {img_path.name}: {e}")
results.append({"image_name": img_path.name, "error": str(e)})
# Save batch summary
summary_file = output_dir / f"results_{args.start_idx}_{args.end_idx}.json"
with open(summary_file, 'w', encoding='utf-8') as f:
json.dump(results, f, indent=2, ensure_ascii=False)
# Print summary
successful = sum(1 for r in results if "error" not in r)
print(f"\n{'='*60}")
print(f"Processing complete!")
print(f"Total: {len(results)} | Successful: {successful} | Failed: {len(results) - successful}")
print(f"Results saved to {output_dir}")
print('='*60)
if __name__ == "__main__":
main()
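Before fanning this out across the pool, it’s worth a quick smoke test on a small slice, e.g. `python process_ocr.py --start-idx 0 --end-idx 5` on any GPU machine with the same setup, to confirm the model loads and the outputs look right.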
Running the pipeline
Create the pool
Spin up 3 workers with the shared environment:
sky jobs pool apply -p deepseek-ocr-pool pool.yaml
SkyPilot will automatically select workers based on availability and cost, and they can be spread across your Kubernetes clusters and 20+ cloud providers for maximum flexibility.
If you want to restrict to a specific cloud, you can add --infra k8s or --infra aws flags (replace k8s/aws with your desired provider).
Output:
YAML to run: pool.yaml
Pool spec:
Worker policy: Fixed-size (3 workers)
Each pool worker will use the following resources (estimated):
Considered resources (1 node):
-------------------------------------------------------------------------------------------------------
INFRA INSTANCE vCPUs Mem(GB) GPUS COST ($) CHOSEN
-------------------------------------------------------------------------------------------------------
Kubernetes (my-cluster) - 4 16 L40S:1 0.00 ✔
Nebius (eu-north1) gpu-l40s-a_1gpu-8vcpu-32gb 8 32 L40S:1 1.55
AWS (us-east-1) g6e.xlarge 4 32 L40S:1 1.86
-------------------------------------------------------------------------------------------------------
🔍 Multiple Nebius instances satisfy L40S:1. The cheapest (gpus=L40S:1, cpus=4, mem=16, ...) is considered among: gpu-l40s-d_1gpu-16vcpu-96gb, gpu-l40s-d_1gpu-48vcpu-288gb, gpu-l40s-d_1gpu-32vcpu-192gb, gpu-l40s-a_1gpu-24vcpu-96gb, gpu-l40s-a_1gpu-32vcpu-128gb, gpu-l40s-a_1gpu-16vcpu-64gb, gpu-l40s-a_1gpu-8vcpu-32gb, gpu-l40s-a_1gpu-40vcpu-160gb.
🔍 Multiple AWS instances satisfy L40S:1. The cheapest (gpus=L40S:1, cpus=4, mem=16, ...) is considered among: g6e.16xlarge, g6e.xlarge, g6e.4xlarge, g6e.8xlarge, g6e.2xlarge.
...
Check & install cloud dependencies on controller: done.
✓ Setup completed. View logs: sky api logs -l sky-2025-11-10-17-37-53-737160/setup-*.log
⚙︎ Job submitted, ID: 1
Pool name: deepseek-ocr-pool
📋 Useful Commands
├── To submit jobs to the pool: sky jobs launch --pool deepseek-ocr-pool job.yaml
├── To submit multiple jobs: sky jobs launch --pool deepseek-ocr-pool --num-jobs 10 job.yaml
├── To check the pool status: sky jobs pool status deepseek-ocr-pool
├── To terminate the pool: sky jobs pool down deepseek-ocr-pool
└── To update the number of workers: sky jobs pool apply -p deepseek-ocr-pool --workers 5
✓ Successfully created pool 'deepseek-ocr-pool'.
Check pool status
See what your workers are doing:
sky jobs pool status deepseek-ocr-pool
Output:
Pools
NAME VERSION UPTIME STATUS WORKERS
deepseek-ocr-pool 1 10m 8s READY 3/3
Pool Workers
POOL_NAME ID VERSION LAUNCHED INFRA RESOURCES STATUS USED_BY
deepseek-ocr-pool 1 1 4 mins ago Nebius (eu-north1) 1x(gpus=L40S:1, gpu-l40s-a_1gpu..., ...) READY -
deepseek-ocr-pool 2 1 6 mins ago Kubernetes (my-cluster) 1x(gpus=L40S:1, cpus=4, mem=16, ...) READY -
deepseek-ocr-pool 3 1 6 mins ago Kubernetes (my-cluster) 1x(gpus=L40S:1, cpus=4, mem=16, ...) READY -

Note that I have set up access to my own K8s cluster with two L40S GPUs, Nebius cloud in Europe, and AWS in US East. So SkyPilot prioritized using my K8s cluster first, then Nebius for the remaining worker because it’s cheaper than AWS (by about $0.31/hour). As we’ll see below, if I scale up later, SkyPilot will automatically provision additional workers across different clouds to meet the desired count.
Once all workers show READY status, they’ve completed setup with models and dataset loaded.
Submit batch jobs
Submit 10 parallel jobs to process all images:
sky jobs launch --pool deepseek-ocr-pool --num-jobs 10 job.yaml
This submits 10 jobs to the pool. Since we have 3 workers, the first 3 jobs start immediately, each assigned to a worker. The remaining 7 jobs are queued and will automatically start as workers become available. Each job calculates its slice of images via $SKYPILOT_JOB_RANK, so the work gets evenly distributed across all 10 jobs.
sky dashboard

Watch progress
Check on your jobs:
sky jobs queue # or see the dashboard by running `sky dashboard`
Look at the logs:
$ sky jobs logs 2
├── Waiting for task resources on 1 node.
└── Job started. Streaming logs... (Ctrl-C to exit log streaming; job will not be killed)
(deepseek-ocr-job, pid=2433) Job rank: 2/3
(deepseek-ocr-job, pid=2433) Total images: 156
(deepseek-ocr-job, pid=2433) Processing images 104 to 156
(deepseek-ocr-job, pid=2433) Processing range: 104 to 156
...
(deepseek-ocr-job, pid=2433) Processing 52 images...
(deepseek-ocr-job, pid=2433) Processing 1/52: india_news_p000104.jpg...
...
Scale the pool
Want to go faster? Scale up:
sky jobs pool apply --pool deepseek-ocr-pool --workers 10
If you have set up access to multiple clouds and K8s clusters, SkyPilot will automatically provision additional workers across different providers to meet your desired count. So your pool might look like this:
$ sky jobs pool status deepseek-ocr-pool
Pools
NAME VERSION UPTIME STATUS WORKERS
deepseek-ocr-pool 1 5m READY 10/10
Pool Workers
POOL_NAME ID VERSION LAUNCHED INFRA RESOURCES STATUS USED_BY
deepseek-ocr-pool 1 1 3 mins ago Nebius (eu-north1) 1x(gpus=L40S:1, gpu-l40s-a_1gpu..., ...) READY -
deepseek-ocr-pool 2 1 3 mins ago Kubernetes (my-cluster) 1x(gpus=L40S:1, cpus=4, mem=16, ...) READY -
deepseek-ocr-pool 3 1 3 mins ago Nebius (eu-north1) 1x(gpus=L40S:1, gpu-l40s-a_1gpu..., ...) READY -
deepseek-ocr-pool 4 1 3 mins ago Kubernetes (my-cluster) 1x(gpus=L40S:1, cpus=4, mem=16, ...) READY -
deepseek-ocr-pool 5 1 3 mins ago Nebius (eu-north1) 1x(gpus=L40S:1, gpu-l40s-a_1gpu..., ...) READY -
deepseek-ocr-pool 6 1 3 mins ago Nebius (eu-north1) 1x(gpus=L40S:1, gpu-l40s-a_1gpu..., ...) READY -
deepseek-ocr-pool 7 1 3 mins ago Nebius (eu-north1) 1x(gpus=L40S:1, gpu-l40s-a_1gpu..., ...) READY -
deepseek-ocr-pool 8 1 3 mins ago Nebius (eu-north1) 1x(gpus=L40S:1, gpu-l40s-a_1gpu..., ...) READY -
deepseek-ocr-pool 9 1 3 mins ago AWS (us-east-1a) 1x(gpus=L40S:1, g6e.xlarge, ...) READY -
deepseek-ocr-pool 10 1 3 mins ago AWS (us-east-1a) 1x(gpus=L40S:1, g6e.xlarge, ...) READY -

Then launch more jobs:
sky jobs launch --pool deepseek-ocr-pool --num-jobs 20 job.yaml
Results and integration
Once processing finishes, your S3 bucket has all the converted documents:
$ aws s3 ls s3://my-skypilot-bucket/ocr_results/
PRE india_news_p000000/
PRE india_news_p000001/
PRE india_news_p000002/
PRE india_news_p000003/
...
Inside each directory there’s an .md file with clean markdown text ready for RAG systems.
Point Glean, Onyx, or whatever pipeline you’re using at the bucket and you’re done.
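Before wiring the bucket into a downstream pipeline, a quick completeness check helps catch failed or skipped jobs. Here’s a minimal sketch using boto3 that counts the consolidated top-level .md files; the bucket name and prefix are the ones used in this example, so adjust them to your own.

import boto3

# Count the consolidated per-page .md files written by process_ocr.py.
s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

prefix = "ocr_results/"
md_pages = 0
for page in paginator.paginate(Bucket="my-skypilot-bucket", Prefix=prefix):
    for obj in page.get("Contents", []):
        rel = obj["Key"][len(prefix):]
        if rel.endswith(".md") and "/" not in rel:  # top-level .md files only
            md_pages += 1

print(f"Markdown pages in S3: {md_pages}")  # should match the number of input images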
Here’s an example of what the processing looks like. This scanned two-column news article:

gets converted into clean and structured markdown:

Once processed, you can point your RAG system (Onyx, Glean, etc.) to the S3 bucket and start asking questions about the documents. The previously inaccessible scanned content becomes part of your searchable knowledge base. For example, asking about the sample document above:
Onyx retrieves relevant context from the resulting markdown documents to answer questions about the scanned content.
Enterprise RAG systems can access digital documents directly, but scanned documents and legacy PDFs require OCR processing first. DeepSeek OCR combined with SkyPilot’s parallel GPU workers converts these unreadable images into clean markdown format. Once processed, all enterprise knowledge becomes searchable and accessible through the RAG system for Q&A, summarization, and analysis.
Why pools work well for batch inference
How does SkyPilot Pools compare to other batch inference approaches?
DIY scripts and SSH: You could manually partition data, SSH into each GPU node, and run jobs. This works for small runs but becomes a coordination nightmare at scale: no automatic job distribution, no failure recovery, and no visibility into what’s running where.
Kubernetes Jobs / Argo Workflows: These work well if all your GPUs are in one Kubernetes cluster. But if you have capacity spread across Hyperscalers, Neoclouds, or on-prem clusters, you’d need to manage workflows separately for each. SkyPilot unifies them into a single pool.
Cloud batch services (AWS Batch, GCP Batch): These vendor-specific services lock you into one cloud and one region. When GPU capacity runs out, you’re stuck manually reconfiguring for another region. They also require significant setup - IAM roles, compute environments, job queues, container images - before you can run anything. SkyPilot replaces this with a single YAML file that works across 17+ clouds and all regions.
Ray / Dask: Great for distributed computing within a cluster, but require you to provision and manage the underlying infrastructure yourself. SkyPilot handles both the infrastructure provisioning and the job orchestration.
The key difference: SkyPilot Pools give you a single control plane that spans multiple clouds and Kubernetes clusters, with warm workers that skip repeated setup costs. For batch inference workloads processing thousands of documents, this means higher GPU utilization and faster end-to-end throughput.
Beyond OCR: other batch inference use cases
The same batch inference pattern with SkyPilot pools works for other embarrassingly parallel workloads:
- Large-scale model inference: Process millions of samples for classification, embedding generation, or LLM inference
- Video processing: Batch transcription, scene detection, and content analysis across video archives
- Model training: Train multiple models with different hyperparameters simultaneously
- Scientific computing: Parameter sweeps, Monte Carlo simulations, and computational experiments
- ETL pipelines: Transform massive datasets in parallel across distributed workers
Any workload that can be split into independent batches benefits from this architecture.
Wrapping up
Modern OCR models like DeepSeek OCR solve the document understanding problem, and SkyPilot’s pools solve the batch inference scaling problem. Combine them, and you can make the AI systems in your organization more useful by unlocking the knowledge trapped in scanned documents.
The implementation is pretty straightforward - define your batch inference environment once, submit jobs with one command, and let SkyPilot handle the orchestration across multiple clouds. If you’ve got archives of scanned documents collecting dust, this gives you a practical way to make them searchable and useful again.
Resources
- SkyPilot Pools Documentation
- DeepSeek OCR GitHub
- Complete example code - includes pool.yaml, job.yaml, process_ocr.py, and sample output