pfnet/plamo-3-nict-31b-base
Largest PLaMo 3 NICT Japanese/English base model with interleaved sliding-window and full attention.
31B-class PLaMo 3 base checkpoint for high-quality bilingual generation
Guide
Overview
PLaMo 3 NICT 31B Base is the largest PLaMo 3 NICT base model from Preferred Networks and NICT. It is pretrained on Japanese and English data with a hybrid attention stack combining sliding-window and full-attention layers.
This is a base checkpoint. Use it as-is with completion-style prompts, or fine-tune it before deploying instruction-following or chat-style workflows.
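Because a base checkpoint continues text rather than following instructions, a common approach is to show it a pattern and leave the last slot open. The following is a minimal sketch of that idea, not a format prescribed by the model authors: the helper name `build_few_shot_prompt`, the `Japanese:`/`English:` labels, and the example sentences are all illustrative assumptions.

```python
def build_few_shot_prompt(examples, query, in_label="Japanese", out_label="English"):
    """Format (input, output) pairs as a text pattern for a base model to continue."""
    blocks = [f"{in_label}: {src}\n{out_label}: {tgt}" for src, tgt in examples]
    # Leave the final output slot open so the model's continuation is the answer.
    blocks.append(f"{in_label}: {query}\n{out_label}:")
    return "\n\n".join(blocks)


# Hypothetical demonstration pairs; replace with examples from your own task.
examples = [
    ("おはようございます。", "Good morning."),
    ("ありがとうございます。", "Thank you very much."),
]
prompt = build_few_shot_prompt(examples, "今日は良い天気ですね。")
print(prompt)
```

The resulting string can be sent as the `prompt` of a completions request. Passing `stop=["\n"]` in that request is one way to end generation after a single answer line.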
Prerequisites
- Hardware: one H100 80 GB or H200 GPU for 4K context; use tensor parallelism across two GPUs (TP=2) for extra KV-cache headroom
- vLLM: >= 0.12.0
- License: gated Hugging Face access under the PLaMo community license
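The hardware guidance follows from simple weight-memory arithmetic. The sketch below is a rough estimate under two assumptions that the guide does not state: the weights are stored in a 2-byte format such as bfloat16, and "31B-class" is taken as exactly 31 billion parameters. It counts weights only.

```python
def weight_memory_gb(num_params, bytes_per_param=2, tensor_parallel=1):
    """Approximate per-GPU weight memory in GB (10^9 bytes), weights only."""
    return num_params * bytes_per_param / tensor_parallel / 1e9


NUM_PARAMS = 31e9   # assumption: "31B-class" read as exactly 31 billion parameters
GPU_MEMORY_GB = 80  # per-GPU memory from the hardware line above

for tp in (1, 2):
    weights = weight_memory_gb(NUM_PARAMS, tensor_parallel=tp)
    headroom = GPU_MEMORY_GB - weights
    print(f"TP={tp}: ~{weights:.0f} GB of weights per GPU, ~{headroom:.0f} GB left before runtime overhead")
```

Under these assumptions a single 80 GB GPU keeps roughly 18 GB for the KV cache and everything else, while TP=2 leaves roughly 49 GB per GPU. Activations, the CUDA context, and the serving runtime reduce the real figures further.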
Install vLLM
uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend=auto
huggingface-cli login
Launching the Server
vllm serve pfnet/plamo-3-nict-31b-base \
  --trust-remote-code \
  --max-model-len 4096
With tensor parallelism for more KV-cache headroom:
vllm serve pfnet/plamo-3-nict-31b-base \
  --trust-remote-code \
  --max-model-len 4096 \
  --tensor-parallel-size 2
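Loading a model of this size takes a while, so it can help to confirm the server is answering before sending requests. The following is a small sketch that probes the OpenAI-compatible `/v1/models` endpoint. It assumes the default address `http://localhost:8000`, and the helper name `server_ready` is illustrative.

```python
import urllib.error
import urllib.request


def server_ready(base_url="http://localhost:8000", timeout=2.0):
    """Return True if GET {base_url}/v1/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as not ready yet.
        return False


print("ready" if server_ready() else "not ready yet")
```

Calling `server_ready()` in a loop with a short sleep between attempts gives a simple startup wait for scripts that launch the server and the client together.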
Client Usage
from openai import OpenAI

client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
resp = client.completions.create(
    model="pfnet/plamo-3-nict-31b-base",
    prompt="The future of artificial intelligence technology is ",
    max_tokens=128,
    temperature=0.7,
)
print(resp.choices[0].text)
Troubleshooting
- Out of memory on 80 GB GPUs: add --tensor-parallel-size 2 or lower --max-model-len.
- Gated repo errors: accept the PLaMo community license on Hugging Face and run huggingface-cli login.
- Repetitive or low-quality completions: adjust sampling for free-form generation, or fine-tune the model for chat or assistant workloads.
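For the repetitive-completion case, one way to adjust sampling is sketched below. The specific values (`top_p=0.9`, `repetition_penalty=1.1`) are untuned illustrative starting points, not recommendations from the model authors, and the helper names `sampling_kwargs` and `complete` are hypothetical. `repetition_penalty` is not part of the OpenAI request schema. vLLM's server accepts it as an extra request field, so the sketch passes it through the client's `extra_body`.

```python
def sampling_kwargs(temperature=0.7, top_p=0.9, repetition_penalty=1.1, stop=None):
    """Build keyword arguments for client.completions.create(...)."""
    kwargs = {
        "temperature": temperature,
        "top_p": top_p,
        # Not an OpenAI-schema field: sent as an extra body field for vLLM.
        "extra_body": {"repetition_penalty": repetition_penalty},
    }
    if stop is not None:
        kwargs["stop"] = stop
    return kwargs


def complete(prompt, base_url="http://localhost:8000/v1", **overrides):
    """Send one completion request to a running vLLM server."""
    from openai import OpenAI  # lazy import: sampling_kwargs itself needs no packages

    client = OpenAI(api_key="EMPTY", base_url=base_url)
    resp = client.completions.create(
        model="pfnet/plamo-3-nict-31b-base",
        prompt=prompt,
        max_tokens=128,
        **sampling_kwargs(**overrides),
    )
    return resp.choices[0].text


# Inspect the arguments without contacting the server.
print(sampling_kwargs(stop=["\n\n"]))
```

With the server running, `complete("The future of artificial intelligence technology is ", repetition_penalty=1.2)` sends the same request as the client example with the adjusted settings.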