pfnet/plamo-2-translate
Post-trained PLaMo 2 translation model specialized for English/Japanese translation tasks.
Guide
Overview
PLaMo 2 Translate is a translation model developed by Preferred Networks and specialized for English/Japanese translation tasks. It is the main, post-trained model in the PLaMo Translation Model family.
The family includes three checkpoints sharing the same architecture:
- pfnet/plamo-2-translate: post-trained model for production translation
- pfnet/plamo-2-translate-base: base checkpoint for fine-tuning experiments
- pfnet/plamo-2-translate-eval: pair-wise evaluator that picks the better of two candidate translations
All three checkpoints can be served with the same vLLM flags shown here; just swap the model id.
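The Python sketch below shows what "just swap the model id" looks like in practice. It builds the vllm serve command line for each checkpoint from one shared flag list. The flag values are the ones used in the launch command later in this guide. The helper name serve_command is hypothetical, and the sketch only prints the commands; it does not start any servers.

```python
import shlex

# The three checkpoint ids of the PLaMo Translation Model family.
CHECKPOINTS = [
    "pfnet/plamo-2-translate",
    "pfnet/plamo-2-translate-base",
    "pfnet/plamo-2-translate-eval",
]

# Flags from the launch command; only the model id differs per checkpoint.
SHARED_FLAGS = ["--trust-remote-code", "--max-model-len", "8192"]


def serve_command(model_id: str) -> list:
    """Return the argv list for `vllm serve` with the shared flags (hypothetical helper)."""
    return ["vllm", "serve", model_id, *SHARED_FLAGS]


if __name__ == "__main__":
    for model_id in CHECKPOINTS:
        # shlex.join quotes arguments safely for pasting into a shell.
        print(shlex.join(serve_command(model_id)))
```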
Prerequisites
- Hardware: 1x GPU with at least 24 GB VRAM, such as L40S, A30, or RTX 4090
- vLLM: >= 0.8.5
- License: PLaMo community license
Install vLLM
```shell
uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend=auto
```
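After installing, you can confirm that the environment meets the vLLM >= 0.8.5 requirement. The following is a minimal sketch using the standard library's importlib.metadata. The helper names are hypothetical. The version parsing is a deliberate simplification: it keeps only the leading numeric release, so suffixes such as rc1, .post1, or +cu121 are ignored. For full PEP 440 semantics, packaging.version would be the more thorough choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_VLLM = (0, 8, 5)  # minimum version from the Prerequisites list


def parse_release(ver: str) -> tuple:
    """Extract the leading numeric release, e.g. '0.8.5.post1' -> (0, 8, 5).

    Simplification: pre-release/post-release/local suffixes are dropped.
    """
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(ver: str, minimum: tuple = MIN_VLLM) -> bool:
    """Compare release tuples element-wise; (0, 8) counts as older than (0, 8, 5)."""
    return parse_release(ver) >= minimum


if __name__ == "__main__":
    try:
        installed = version("vllm")
    except PackageNotFoundError:
        print("vllm is not installed in this environment")
    else:
        status = "OK" if meets_minimum(installed) else "older than 0.8.5"
        print(f"vllm {installed}: {status}")
```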
Launching the Server
```shell
vllm serve pfnet/plamo-2-translate \
  --trust-remote-code \
  --max-model-len 8192
```
For tighter memory budgets, reduce --max-model-len (the model card
examples use 2000).
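On first start, the server can take a while to become ready because vLLM downloads the weights from Hugging Face and then loads them. The sketch below polls the OpenAI-compatible GET /v1/models endpoint until it answers. It assumes the default address http://localhost:8000 used in the client example. The function name wait_for_server and its timeout defaults are hypothetical choices.

```python
import json
import time
import urllib.request
from typing import List, Optional


def wait_for_server(base_url: str = "http://localhost:8000/v1",
                    timeout: float = 600.0,
                    interval: float = 5.0) -> Optional[List[str]]:
    """Poll GET {base_url}/models until it responds or `timeout` seconds pass.

    Returns the served model ids, or None if the deadline expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/models", timeout=interval) as resp:
                payload = json.load(resp)
            # OpenAI-style list response: {"object": "list", "data": [{"id": ...}, ...]}
            return [entry["id"] for entry in payload.get("data", [])]
        except (OSError, ValueError):
            # Connection refused, HTTP error, timeout, or non-JSON reply: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Once startup finishes, calling wait_for_server() before sending requests should return a list that contains pfnet/plamo-2-translate. If the timeout expires first, it returns None.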
Prompt Format
Use the PLaMo operation-token structure. Place the source text after the
input line and let generation continue after the output line.
```text
<|plamo:op|>dataset
translation
<|plamo:op|>input lang=English
The text to translate goes here.
<|plamo:op|>output lang=Japanese
```
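Generating the template with a small helper keeps the operation tokens and line breaks exact. This is a sketch, and the function name is hypothetical. The text above only shows English to Japanese; treating the reverse direction as a simple swap of the lang= values is an assumption.

```python
OP = "<|plamo:op|>"  # PLaMo operation token


def build_translation_prompt(text: str, src: str = "English", tgt: str = "Japanese") -> str:
    """Assemble the dataset / input / output operation-token prompt.

    The string ends right after the output line (with a newline), so the
    model's continuation is the translation itself.
    """
    return (
        f"{OP}dataset\n"
        "translation\n"
        f"{OP}input lang={src}\n"
        f"{text}\n"
        f"{OP}output lang={tgt}\n"
    )
```

With the default arguments, build_translation_prompt("The text to translate goes here.") produces the same prompt string that the client example below sends.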
Client Usage
The <|plamo:op|> stop sequence prevents the model from generating
beyond the translated text.
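```python
from openai import OpenAI

# vLLM does not check the API key unless the server is started with --api-key.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

# The prompt ends right after the output line so the model continues with the translation.
prompt = """<|plamo:op|>dataset
translation
<|plamo:op|>input lang=English
The text to translate goes here.
<|plamo:op|>output lang=Japanese
"""

resp = client.completions.create(
    model="pfnet/plamo-2-translate",
    prompt=prompt,
    max_tokens=1024,
    temperature=0.0,  # greedy decoding for deterministic translations
    stop=["<|plamo:op|>"],  # stop at the next operation token
)
print(resp.choices[0].text)
```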
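If you prefer not to depend on the openai package, you can send the same request to the /v1/completions endpoint using only the standard library. The sketch below mirrors the parameters of the client example. It assumes the server is running at http://localhost:8000 without --api-key. The helper names completion_payload and translate_raw are hypothetical.

```python
import json
import urllib.request

OP = "<|plamo:op|>"


def completion_payload(prompt: str, model: str = "pfnet/plamo-2-translate") -> dict:
    """Request body with the same settings as the OpenAI-client call."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": 1024,
        "temperature": 0.0,
        "stop": [OP],  # end generation at the next operation token
    }


def translate_raw(prompt: str, base_url: str = "http://localhost:8000/v1") -> str:
    """POST the payload to /v1/completions and return the generated text."""
    body = json.dumps(completion_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    return result["choices"][0]["text"]
```

Because the request body carries identical settings, translate_raw(prompt) should return the same text as resp.choices[0].text for the same prompt.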
Troubleshooting
- Out of memory: lower --max-model-len to around 2000, or add --gpu-memory-utilization 0.95.
- Garbled or empty output: check that the prompt uses the exact <|plamo:op|> operation-token format shown above.
- Evaluator usage: serve pfnet/plamo-2-translate-eval and follow the pair-wise evaluation prompt format from its model card.
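For garbled or empty output, a quick structural check of the prompt string can catch a mistyped operation token or a missing trailing newline before you send the request. This is a hypothetical helper, not part of vLLM or any PLaMo tooling. It checks only the layout from the Prompt Format section. Following the client example, it expects the prompt to end with a newline after the output line.

```python
OP = "<|plamo:op|>"  # must match exactly, including the pipes and the colon


def check_prompt(prompt: str) -> list:
    """Return a list of structural problems; an empty list means the layout matches."""
    problems = []
    lines = prompt.split("\n")
    if lines[:2] != [f"{OP}dataset", "translation"]:
        problems.append("first two lines must be the dataset header and 'translation'")
    if not any(line.startswith(f"{OP}input lang=") for line in lines):
        problems.append("no input line of the form <|plamo:op|>input lang=...")
    # A trailing newline makes split() yield an empty final element.
    if len(lines) < 2 or lines[-1] != "" or not lines[-2].startswith(f"{OP}output lang="):
        problems.append("prompt must end with the output line followed by a newline")
    return problems
```

An empty result means the structure matches. Each string in a non-empty result names one deviation to fix.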