PaddlePaddle/PaddleOCR-VL-1.5
PaddleOCR-VL-1.5 (0.9B) — next-gen compact VLM for document parsing; adds text spotting, seal recognition, and Tibetan/Bengali
Overview
PaddleOCR-VL-1.5 is the next generation of PaddleOCR-VL — same 0.9B architecture (NaViT-style dynamic-resolution vision encoder + ERNIE-4.5-0.3B LM), with substantial accuracy gains and new tasks:
- 94.5% on OmniDocBench v1.5 (SOTA)
- Text spotting — line-level localization + recognition in one pass
- Seal recognition — new task with SOTA results
- Irregular-shape localization — polygonal detection under skew/warping
- Multilingual — adds Tibetan and Bengali
- Cross-page table merging and paragraph-heading recognition
Architecture is identical to PaddleOCR-VL, so the vLLM launch command and feature flags are the same.
Prerequisites
- Hardware: 1x GPU (small VRAM footprint)
- vLLM >= 0.11.1 (use a nightly build if 0.11.1 has not been released yet)
Install vLLM
uv venv
source .venv/bin/activate
uv pip install -U vllm --pre \
    --extra-index-url https://wheels.vllm.ai/nightly \
    --extra-index-url https://download.pytorch.org/whl/cu129 \
    --index-strategy unsafe-best-match
Launch command
vllm serve PaddlePaddle/PaddleOCR-VL-1.5 \
    --trust-remote-code \
    --max-num-batched-tokens 16384 \
    --no-enable-prefix-caching \
    --mm-processor-cache-gb 0
Tip: OCR workloads don't benefit much from prefix caching or image reuse, so disable those to avoid hashing/caching overhead.
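Once launched, the server may take a while to load the model. A minimal readiness check that polls the OpenAI-compatible `/v1/models` endpoint (the base URL and timeout are assumptions matching the launch command above, not vLLM requirements):

```python
import time
import urllib.error
import urllib.request


def wait_for_server(base_url: str, timeout_s: float = 120.0) -> bool:
    """Poll {base_url}/models until the server answers 200 or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; retry after a short sleep
        time.sleep(1.0)
    return False


# Usage (assumes the default port from the launch command above):
# assert wait_for_server("http://localhost:8000/v1")
```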
Client Usage
Task-specific prompts (note the two new tasks, spotting and seal recognition):
from openai import OpenAI
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1", timeout=3600)
TASKS = {
    "ocr": "OCR:",
    "table": "Table Recognition:",
    "formula": "Formula Recognition:",
    "chart": "Chart Recognition:",
    "spotting": "Spotting:",
    "seal": "Seal Recognition:",
}
response = client.chat.completions.create(
    model="PaddlePaddle/PaddleOCR-VL-1.5",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://.../receipt.png"}},
            {"type": "text", "text": TASKS["spotting"]},
        ],
    }],
    temperature=0.0,
)
print(response.choices[0].message.content)
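The example above fetches the image by URL; for local files, the OpenAI-compatible API also accepts base64 data URLs in the same `image_url` field. A minimal helper (the file name is a placeholder):

```python
import base64
import mimetypes


def to_data_url(path: str) -> str:
    """Encode a local image file as a data URL for an image_url content part."""
    mime = mimetypes.guess_type(path)[0] or "image/png"
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{b64}"


# Pass it in place of the remote URL:
# {"type": "image_url", "image_url": {"url": to_data_url("receipt.png")}}
```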
Offline Inference with PP-DocLayoutV2
Use separate virtual environments for vLLM and PaddlePaddle to avoid dependency conflicts. If the server responds with
"The model PaddleOCR-VL-1.5-0.9B does not exist.", relaunch vLLM with
--served-model-name PaddleOCR-VL-1.5-0.9B.
uv pip install paddlepaddle-gpu==3.2.1 --extra-index-url https://www.paddlepaddle.org.cn/packages/stable/cu126/
uv pip install -U "paddleocr[doc-parser]"
uv pip install safetensors
from paddleocr import PaddleOCRVL
pipeline = PaddleOCRVL(
    vl_rec_model_name="PaddleOCR-VL-1.5-0.9B",
    vl_rec_backend="vllm-server",
    vl_rec_server_url="http://localhost:8000/v1",
    layout_detection_model_name="PP-DocLayoutV2",
    layout_detection_model_dir="/path/to/your/PP-DocLayoutV2/",
)
output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png")
for i, res in enumerate(output):
    res.save_to_json(save_path=f"output_{i}.json")
    res.save_to_markdown(save_path=f"output_{i}.md")
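For multi-page inputs, the loop above writes one markdown file per page. A minimal sketch for stitching those pages back into a single document, assuming the output_{i}.md naming from the loop above (the combined filename is an arbitrary choice):

```python
from pathlib import Path


def stitch_markdown(out_dir: str = ".", combined: str = "combined.md") -> Path:
    """Concatenate output_0.md, output_1.md, ... in numeric order into one file."""
    pages = sorted(
        Path(out_dir).glob("output_*.md"),
        key=lambda p: int(p.stem.split("_")[1]),  # sort by page index, not lexically
    )
    text = "\n\n".join(p.read_text(encoding="utf-8") for p in pages)
    dest = Path(out_dir) / combined
    dest.write_text(text, encoding="utf-8")
    return dest
```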