PaddlePaddle/PaddleOCR-VL-1.5
PaddleOCR-VL-1.5 (0.9B) — next-gen compact VLM for document parsing; adds text spotting, seal recognition, and Tibetan/Bengali
Overview
PaddleOCR-VL-1.5 is the next generation of PaddleOCR-VL — same 0.9B architecture (NaViT-style dynamic-resolution vision encoder + ERNIE-4.5-0.3B LM), with substantial accuracy gains and new tasks:
- 94.5% on OmniDocBench v1.5 (SOTA)
- Text spotting — line-level localization + recognition in one pass
- Seal recognition — new task with SOTA results
- Irregular-shape localization — polygonal detection under skew/warping
- Multilingual — adds Tibetan and Bengali
- Cross-page table merging and paragraph-heading recognition
Architecture is identical to PaddleOCR-VL, so the vLLM launch command and feature flags are the same.
Prerequisites
- Hardware: 1x GPU (small VRAM footprint)
- vLLM >= 0.11.1 (use a nightly build if 0.11.1 has not been released yet)
Install vLLM
uv venv
source .venv/bin/activate
uv pip install -U vllm --pre \
    --extra-index-url https://wheels.vllm.ai/nightly \
    --extra-index-url https://download.pytorch.org/whl/cu129 \
    --index-strategy unsafe-best-match
Launch command
vllm serve PaddlePaddle/PaddleOCR-VL-1.5 \
    --trust-remote-code \
    --max-num-batched-tokens 16384 \
    --no-enable-prefix-caching \
    --mm-processor-cache-gb 0
Tip: OCR workloads don't benefit much from prefix caching or image reuse, so disable those to avoid hashing/caching overhead.
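Once launched, the server may take a while to load the model. A minimal readiness check that polls the OpenAI-compatible `/v1/models` endpoint (the base URL and timeout are assumptions matching the launch command above, not vLLM requirements):

```python
import time
import urllib.error
import urllib.request


def wait_for_server(base_url: str, timeout_s: float = 120.0) -> bool:
    """Poll {base_url}/models until the server answers 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry after a short sleep
        time.sleep(1.0)
    return False


# Usage (assumes the default port from the launch command above):
# assert wait_for_server("http://localhost:8000/v1")
```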
Client Usage
Task-specific prompts (note the two new tasks, spotting and seal recognition):
from openai import OpenAI
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1", timeout=3600)
TASKS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
    "chart": "Chart Recognition:",
    "spotting": "Spotting:",
    "seal": "Seal Recognition:",
}
response = client.chat.completions.create(
    model="PaddlePaddle/PaddleOCR-VL-1.5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://.../receipt.png"}},
            {"type": "text", "text": TASKS["spotting"]},
        ],
    }],
    temperature=0.0,
)
print(response.choices[0].message.content)
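The example above fetches the image by URL; for local files, the OpenAI-compatible API also accepts base64 data URLs in the same `image_url` field. A minimal helper (the file name is a placeholder):

```python
import base64
import mimetypes


def to_data_url(path: str) -> str:
    """Encode a local image file as a data URL for an image_url content part."""
    mime = mimetypes.guess_type(path)[0] or "image/png"
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{b64}"


# Pass it in place of the remote URL:
# {"type": "image_url", "image_url": {"url": to_data_url("receipt.png")}}
```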
Offline Inference with PP-DocLayoutV2
Use separate virtual environments for vLLM and PaddlePaddle to avoid dependency conflicts. If the server responds with
"The model PaddleOCR-VL-1.5-0.9B does not exist.", relaunch vLLM with
--served-model-name PaddleOCR-VL-1.5-0.9B.
uv pip install paddlepaddle-gpu==3.2.1 --extra-index-url https://www.paddlepaddle.org.cn/packages/stable/cu126/
uv pip install -U "paddleocr[doc-parser]"
uv pip install safetensors
from paddleocr import PaddleOCRVL
pipeline = PaddleOCRVL(
    vl_rec_model_name="PaddleOCR-VL-1.5-0.9B",
    vl_rec_backend="vllm-server",
    vl_rec_server_url="http://localhost:8000/v1",
    layout_detection_model_name="PP-DocLayoutV2",
    layout_detection_model_dir="/path/to/your/PP-DocLayoutV2/",
)
output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png")
for i, res in enumerate(output):
    res.save_to_json(save_path=f"output_{i}.json")
    res.save_to_markdown(save_path=f"output_{i}.md")
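For multi-page inputs, the loop above writes one markdown file per page. A minimal sketch for stitching those pages back into a single document, assuming the output_{i}.md naming from the loop above (the combined filename is an arbitrary choice):

```python
from pathlib import Path


def stitch_markdown(out_dir: str = ".", combined: str = "combined.md") -> Path:
    """Concatenate output_0.md, output_1.md, ... in numeric order into one file."""
    pages = sorted(
        Path(out_dir).glob("output_*.md"),
        key=lambda p: int(p.stem.split("_")[1]),  # sort by page index, not lexically
    )
    text = "\n\n".join(p.read_text(encoding="utf-8") for p in pages)
    dest = Path(out_dir) / combined
    dest.write_text(text, encoding="utf-8")
    return dest
```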