JetBrains/Mellum2-12B-A2.5B-Instruct
JetBrains' instruction-tuned code MoE (12B total / 2.5B active) that answers directly without an externalized chain of thought — low-latency coding and tool use
78.4 EvalPlus, 67.1 MultiPL-E — direct answers, fits on a single GPU
Overview
Mellum2-12B-A2.5B-Instruct is JetBrains' instruction-tuned code assistant. It shares the Mixture-of-Experts backbone of the rest of the Mellum2 family: 64 experts (8 activated per token), 12B total / 2.5B active parameters, sliding-window plus full-attention layers, and a 131,072-token context. It is post-trained (SFT + RLVR on math, coding, tool use, instruction following, reasoning, and knowledge) to answer directly, without an externalized chain of thought. For complex debugging, multi-step planning, or math/reasoning-heavy tasks where you want explicit reasoning traces, use the Thinking variant instead.
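To make "64 experts, 8 activated per token" concrete, here is a minimal sketch of generic top-k expert routing. It is not the Mellum2 implementation: the random router logits stand in for a learned router, and normalizing the weights over only the selected experts is an assumption, since the overview does not say how the model normalizes them.

```python
import math
import random

NUM_EXPERTS = 64  # experts per MoE layer, from the overview
TOP_K = 8         # experts activated per token, from the overview


def route(router_logits, k=TOP_K):
    """Pick the k highest-scoring experts and softmax-normalize their weights.

    Assumption: weights are normalized over the selected experts only.
    """
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i],
                 reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


# Stand-in logits for a single token; a real router computes these
# from the token's hidden state.
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route(logits)
print(f"{len(chosen)} of {NUM_EXPERTS} experts active for this token")
```

Only the selected experts run for a given token, which is why the active parameter count (2.5B) is much smaller than the total (12B).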
Prerequisites
- Hardware: a single H200, H100, or A100 is sufficient; the model needs about 29 GB at bf16
- vLLM nightly: MellumForCausalLM support landed after v0.22.0 and is not yet in a stable release. Install the nightly wheels until the next tagged release ships.
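As a rough sanity check on the hardware requirement, the sketch below estimates how much GPU memory remains for KV cache and activations once the model is loaded. The GPU capacities are common SKUs and the 0.9 utilization factor mirrors vLLM's default `--gpu-memory-utilization`; both are assumptions for illustration. The weight estimate uses the rounded 12B parameter count, so it is only a lower bound and the quoted ~29 GB is the figure to plan around.

```python
# Common GPU capacities in GB (assumed SKUs, not taken from this guide).
GPU_MEMORY_GB = {"A100-40GB": 40, "A100-80GB": 80, "H100": 80, "H200": 141}

MODEL_FOOTPRINT_GB = 29.0  # bf16 figure quoted in the prerequisites


def weight_memory_gb(num_params, bytes_per_param=2):
    """Raw weight size in GB; bf16 stores 2 bytes per parameter."""
    return num_params * bytes_per_param / 1e9


def headroom_gb(gpu_gb, utilization=0.9, model_gb=MODEL_FOOTPRINT_GB):
    """Memory left for KV cache and activations after loading the model."""
    return gpu_gb * utilization - model_gb


print(f"lower bound from rounded 12B params: {weight_memory_gb(12e9):.1f} GB")
for name, capacity in GPU_MEMORY_GB.items():
    print(f"{name}: about {headroom_gb(capacity):.1f} GB left after loading")
```

On smaller cards the remaining headroom limits how much KV cache is available, so a shorter `--max-model-len` may be needed there.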
Install vLLM (nightly)
uv venv
source .venv/bin/activate
uv pip install -U vllm --extra-index-url https://wheels.vllm.ai/nightly
Launch command
Unlike the Thinking checkpoint, Instruct does not emit <think> blocks, so no --reasoning-parser is needed.
# Plain serving
vllm serve JetBrains/Mellum2-12B-A2.5B-Instruct \
  --max-model-len 131072

# Add tool calling
vllm serve JetBrains/Mellum2-12B-A2.5B-Instruct \
  --max-model-len 131072 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
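Once the server is running, it can be useful to confirm the model is actually being served before sending requests. The following is a minimal sketch that queries the OpenAI-compatible model listing endpoint; it assumes the default host and port (`http://localhost:8000`) and no API key requirement.

```python
import json
import urllib.request


def models_url(base_url):
    """Build the model-listing URL from an OpenAI-style base URL."""
    return base_url.rstrip("/") + "/models"


def served_model_ids(base_url="http://localhost:8000/v1", timeout=5.0):
    """Return the model IDs the server reports; raises if it is unreachable."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    return [entry["id"] for entry in payload.get("data", [])]


def is_ready(model_id, base_url="http://localhost:8000/v1"):
    """True if the server responds and lists the expected model."""
    try:
        return model_id in served_model_ids(base_url)
    except OSError:
        # Connection refused or timed out: the server is not up yet.
        return False
```

Calling `is_ready("JetBrains/Mellum2-12B-A2.5B-Instruct")` in a retry loop is one way to wait for model loading to finish.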
Client usage
JetBrains recommends sampling at temperature=0.6, top_p=0.95, top_k=20.
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="JetBrains/Mellum2-12B-A2.5B-Instruct",
    messages=[{"role": "user", "content": "Write a Python function to reverse a string."}],
    max_tokens=81920,
    temperature=0.6,
    top_p=0.95,
    extra_body={"top_k": 20},
)
print(resp.choices[0].message.content)
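If the server was launched with `--enable-auto-tool-choice` and `--tool-call-parser hermes`, a tool-calling request might look like the sketch below. The `get_weather` tool, its schema, and its stubbed result are hypothetical and exist only for illustration; the request reuses the recommended sampling settings.

```python
import json

MODEL = "JetBrains/Mellum2-12B-A2.5B-Instruct"

# Hypothetical tool definition in the OpenAI function-calling schema.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def get_weather(city):
    """Stub implementation; a real tool would call a weather service."""
    return {"city": city, "forecast": "sunny"}


def build_request(prompt):
    """Keyword arguments for client.chat.completions.create()."""
    return dict(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        tools=TOOLS,
        tool_choice="auto",
        temperature=0.6,
        top_p=0.95,
        extra_body={"top_k": 20},
    )


def run_tool_call(name, arguments_json):
    """Dispatch one tool call; arguments arrive as a JSON string."""
    if name != "get_weather":
        raise ValueError(f"unknown tool: {name}")
    return get_weather(**json.loads(arguments_json))


def main():
    from openai import OpenAI  # imported here so the helpers work without it

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    resp = client.chat.completions.create(**build_request("What's the weather in Prague?"))
    # tool_calls is None when the model answers in plain text instead.
    for call in resp.choices[0].message.tool_calls or []:
        print(run_tool_call(call.function.name, call.function.arguments))
```

Running `main()` against the tool-enabled server prints the stubbed result for each tool call the model emits. To continue the conversation, append the tool output as a `tool` role message and call the endpoint again.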