openbmb/MiniCPM5-1B
MiniCPM5-1B — dense 1B on-device LLM with hybrid Think/No-Think reasoning, native 128K context, and strong agentic tool use, built on the standard Llama architecture
1B-class open-source SOTA on tool use, code, and reasoning
Guide
Overview
MiniCPM5-1B is the first checkpoint in OpenBMB's MiniCPM5 series — a dense
1B model built for on-device and resource-constrained deployment, reaching
1B-class open-source SOTA on agentic tool use, code generation, and difficult
reasoning. It uses the standard LlamaForCausalLM architecture, so vLLM
loads it natively with no custom kernels or model-code fork.
Its headline feature is hybrid reasoning: a single checkpoint serves as
both a fast assistant (No-Think) and a deliberate reasoner (Think), toggled by
the chat template's enable_thinking flag.
Prerequisites
- vLLM ≥ 0.21.0: MiniCPM5-1B is supported natively as of the v0.21.0
release. (For CUDA 12.x driver hosts, the cookbook suggests
vllm==0.10.1.1 as a fallback.)
Launch command
Use the command builder above. The baseline is simply:
vllm serve openbmb/MiniCPM5-1B --port 8000
At ~1.1B parameters, the model fits on a single GPU (TP=1). It supports the full
native 128K context; lower --max-model-len to 8192 or 32768 to reduce KV-cache
requirements on small GPUs, and set --enforce-eager if CUDA graph capture runs
out of memory on a tight VRAM budget.
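The flag choices above can be captured in a small launcher helper. This is a minimal sketch, not part of vLLM or the MiniCPM repo: the function name `build_serve_command` and its parameters are illustrative, and it only assembles flags already named in this guide (`--port`, `--max-model-len`, `--enforce-eager`).

```python
import shlex

MODEL_ID = "openbmb/MiniCPM5-1B"


def build_serve_command(model=MODEL_ID, port=8000, max_model_len=None, enforce_eager=False):
    """Assemble the `vllm serve` argument list (hypothetical helper)."""
    argv = ["vllm", "serve", model, "--port", str(port)]
    if max_model_len is not None:
        # Cap the context window below the native 128K for small GPUs.
        argv += ["--max-model-len", str(max_model_len)]
    if enforce_eager:
        # Skip CUDA graph capture when it does not fit in VRAM.
        argv.append("--enforce-eager")
    return argv


if __name__ == "__main__":
    # Print a copy-pasteable command for a small-GPU configuration.
    print(shlex.join(build_serve_command(max_model_len=8192, enforce_eager=True)))
```

The returned list can be passed straight to `subprocess.run` if you prefer to launch the server from Python rather than from a shell.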
Reasoning modes
Toggle the Reasoning feature to serve with deep-thinking on by default
(--default-chat-template-kwargs '{"enable_thinking": true}'). You can also
flip it per request via chat_template_kwargs:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openbmb/MiniCPM5-1B",
    "messages": [{"role": "user", "content": "Explain GQA in one sentence."}],
    "temperature": 0.9, "top_p": 0.95, "max_tokens": 1024,
    "chat_template_kwargs": {"enable_thinking": true}
  }'
| Mode | enable_thinking | temperature | top_p |
|---|---|---|---|
| Think | true | 0.9 | 0.95 |
| No-Think | false | 0.7 | 0.95 |
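The sampling table can be folded into a small client helper so that each request picks up the matching settings for its mode. The sketch below uses only the Python standard library. The names `build_chat_request` and `post_chat` are illustrative and not part of any library, and the base URL assumes the server from the launch command is listening on localhost:8000.

```python
import json
import urllib.request

MODEL_ID = "openbmb/MiniCPM5-1B"

# Sampling presets copied from the table above, keyed by enable_thinking.
SAMPLING = {
    True: {"temperature": 0.9, "top_p": 0.95},   # Think
    False: {"temperature": 0.7, "top_p": 0.95},  # No-Think
}


def build_chat_request(prompt, think, max_tokens=1024, model=MODEL_ID):
    """Build a /v1/chat/completions payload for the chosen mode (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        # Per-request override of the chat template's enable_thinking flag.
        "chat_template_kwargs": {"enable_thinking": think},
    }
    payload.update(SAMPLING[think])
    return payload


def post_chat(payload, base_url="http://localhost:8000"):
    """POST the payload to the OpenAI-compatible endpoint and return the parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a server running, `post_chat(build_chat_request("Explain GQA in one sentence.", think=False))` sends a No-Think request with the lower temperature.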
Tool calling
MiniCPM5-1B emits XML-style tool calls. The vLLM-side minicpm5 parser
(PR #43175) was merged into main on 2026-05-27 but is not in v0.21.0 or
v0.22.0, because both releases were cut before the merge. Until a release
includes it (v0.23+), load it as a plugin from the MiniCPM repo:
vllm serve openbmb/MiniCPM5-1B --port 8000 \
  --enable-auto-tool-choice \
  --tool-parser-plugin /path/to/MiniCPM/tool_parsers/minicpm5xml_tool_parser.py \
  --tool-call-parser minicpm5
SGLang ships the minicpm5 parser built-in and is the author-recommended
backend for tool calling.
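Once a tool parser is active, the server should return tool calls in the OpenAI-compatible `tool_calls` field instead of raw XML. The following is a minimal client-side sketch under that assumption. The `get_weather` tool is hypothetical and exists only for illustration, and `build_tool_request` and `extract_tool_calls` are illustrative helper names, not part of vLLM, SGLang, or the MiniCPM repo.

```python
import json

MODEL_ID = "openbmb/MiniCPM5-1B"

# Hypothetical tool, described in the OpenAI function-calling schema.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(prompt, tools, model=MODEL_ID):
    """Build a chat request that lets the model decide whether to call a tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "tool_choice": "auto",
    }


def extract_tool_calls(response):
    """Return (name, arguments) pairs from a chat completion response dict."""
    message = response["choices"][0]["message"]
    calls = []
    # tool_calls is absent or null when the model answers in plain text.
    for call in message.get("tool_calls") or []:
        function = call["function"]
        # Arguments arrive as a JSON-encoded string in the OpenAI schema.
        calls.append((function["name"], json.loads(function["arguments"])))
    return calls
```

The request dict can be sent to `/v1/chat/completions` with any HTTP client; an empty list from `extract_tool_calls` means the model answered directly.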