zai-org/Glyph
Visual-text compression framework that renders long text into images and processes them with a reasoning VLM, scaling effective context length
Overview
Glyph is a framework from Zhipu AI for scaling
context length via visual-text compression. It renders long textual sequences into
images and processes them with a vision-language model. This recipe covers the vLLM
deployment of the zai-org/Glyph VLM as a component in that framework.
Glyph is a reasoning multimodal model, so --reasoning-parser glm45 is recommended
to parse reasoning traces from outputs.
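The compression idea can be sketched with back-of-the-envelope arithmetic. All numbers below are illustrative assumptions (characters per token, page geometry, and vision tokens per image vary by tokenizer, renderer, and model), not Glyph's measured figures:

```python
# Back-of-the-envelope for visual-text compression. The constants here
# (4 chars/token, 80x50 page, 256 vision tokens/image) are illustrative
# assumptions, not Glyph's actual tokenizer or renderer parameters.
def text_tokens(n_chars: int, chars_per_token: float = 4.0) -> int:
    # Rough rule of thumb: ~4 characters per English text token.
    return round(n_chars / chars_per_token)

def vision_tokens(pages: int, tokens_per_image: int = 256) -> int:
    # Assume each rendered page costs a fixed number of vision tokens.
    return pages * tokens_per_image

chars = 80 * 50 * 10            # 10 pages of 50 lines x 80 characters
t_text = text_tokens(chars)     # cost if fed to the model as raw text
t_vis = vision_tokens(10)       # cost if fed as 10 rendered page images
ratio = t_text / t_vis          # effective context-compression factor
```

Under these assumptions the same text costs roughly 4x fewer tokens as images, which is the sense in which rendering "scales effective context length".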
Prerequisites
- vLLM version: latest stable
- Hardware: 1x H100 or 1x MI300X/MI325X
Install vLLM (NVIDIA)
uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend auto
Install vLLM (AMD ROCm)
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm
ROCm wheel requires Python 3.12, ROCm 7.0, glibc >= 2.35.
Launching the Server
Single H100 GPU
vllm serve zai-org/Glyph \
--no-enable-prefix-caching \
--mm-processor-cache-gb 0 \
--reasoning-parser glm45 \
--limit-mm-per-prompt.video 0
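Once the server is up, requests go through the OpenAI-compatible chat endpoint with the rendered page attached as a base64 image. This is a minimal sketch: it assumes the default listen address `http://localhost:8000`, and the PNG bytes are a placeholder standing in for a real rendered page:

```python
# Build an OpenAI-compatible chat request carrying a rendered text page.
# Assumes the `vllm serve` command above is listening on the default
# http://localhost:8000; replace `png_bytes` with real rendered-page bytes.
import base64
import json

png_bytes = b"\x89PNG\r\n\x1a\n"  # placeholder PNG header, not a real page
data_url = "data:image/png;base64," + base64.b64encode(png_bytes).decode()

payload = {
    "model": "zai-org/Glyph",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": data_url}},
            {"type": "text", "text": "Summarize the document shown in the image."},
        ],
    }],
    "max_tokens": 512,
}
body = json.dumps(payload)
# POST `body` to http://localhost:8000/v1/chat/completions
# with header "Content-Type: application/json".
```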
Single MI300X / MI325X
VLLM_ROCM_USE_AITER=1 \
SAFETENSORS_FAST_GPU=1 \
vllm serve zai-org/Glyph \
--no-enable-prefix-caching \
--mm-processor-cache-gb 0 \
--reasoning-parser glm45 \
--limit-mm-per-prompt.video 0
Configuration Tips
- --no-enable-prefix-caching and --mm-processor-cache-gb 0 are recommended for OCR-like workloads where image reuse is uncommon; they avoid unnecessary hashing and caching overhead.
- Adjust --max-num-batched-tokens for throughput according to your hardware.
Benchmarking
vllm bench serve \
--model zai-org/Glyph \
--dataset-name random \
--random-input-len 8192 \
--random-output-len 512 \
--request-rate 10000 \
--num-prompts 16 \
--ignore-eos
Troubleshooting
- Reasoning traces: Use --reasoning-parser glm45 to extract reasoning content.
- Slow first inference: Disabling prefix caching and multimodal processor caching is intentional for Glyph's use case and trades off first-request latency for predictable throughput.
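When a reasoning parser is enabled, vLLM separates the thinking trace from the final answer in the response message. A minimal sketch of reading both fields, where `response` is a hand-written stand-in for a real server reply:

```python
# With --reasoning-parser glm45, the chat completion message carries the
# model's thinking trace in `reasoning_content` and the final answer in
# `content`. This `response` dict is a stand-in for a real server reply.
response = {
    "choices": [{
        "message": {
            "reasoning_content": "The page lists three sections...",
            "content": "The document describes a deployment recipe.",
        }
    }]
}

message = response["choices"][0]["message"]
thinking = message.get("reasoning_content")  # may be absent if parsing fails
answer = message["content"]
```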