
pfnet/plamo-3-nict-31b-base

Largest PLaMo 3 NICT Japanese/English base model with interleaved sliding-window and full attention.

31B-class PLaMo 3 base checkpoint for high-quality bilingual generation

dense | 32B | 4,096 ctx | vLLM 0.12.0+ | text

Overview

PLaMo 3 NICT 31B Base is the largest PLaMo 3 NICT base model from Preferred Networks and NICT. It is pretrained on Japanese and English data with a hybrid attention stack combining sliding-window and full-attention layers.

This is a base checkpoint. Use completion-style prompts as-is, or fine-tune it before deploying instruction-following or chat-style workflows.
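
Because a base checkpoint continues text rather than following instructions, one common way to steer it is a few-shot prompt that ends where the model should pick up. The sketch below is illustrative only: the helper name `build_few_shot_prompt`, the `English:`/`Japanese:` labels, and the example pairs are assumptions, not part of the model card.

```python
def build_few_shot_prompt(examples, query, source_label="English", target_label="Japanese"):
    """Format (source, target) pairs plus an open final line as one completion prompt.

    The labels and layout are illustrative assumptions, not a format
    prescribed for this model.
    """
    lines = []
    for source, target in examples:
        lines.append(f"{source_label}: {source}")
        lines.append(f"{target_label}: {target}")
        lines.append("")  # blank line between examples
    lines.append(f"{source_label}: {query}")
    lines.append(f"{target_label}:")  # left open for the model to continue
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    [("Good morning.", "おはようございます。"), ("Thank you.", "ありがとうございます。")],
    "See you tomorrow.",
)
print(prompt)
```

When sending a prompt like this to the completions endpoint, passing `stop="\n"` is one way to cut generation at the end of the first answer line.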

Prerequisites

  • Hardware: one GPU with at least 80 GB of memory (H100 80 GB or H200) for 4K context; use TP=2 across two GPUs for extra KV-cache headroom
  • vLLM: >= 0.12.0
  • License: gated Hugging Face access under the PLaMo community license
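
The hardware guidance can be sanity-checked with rough arithmetic. The sketch below assumes 31 billion parameters (taken from the model name) stored as 16-bit weights; both numbers are assumptions, and the result ignores activations and the memory vLLM reserves for itself, so treat it as an upper bound on KV-cache space rather than a measurement.

```python
# Back-of-the-envelope weight-memory estimate (assumptions, not measured values).
PARAMS = 31e9          # assumed parameter count, from the "31b" in the model name
BYTES_PER_PARAM = 2    # assumes 16-bit (bf16/fp16) weights
GPU_MEMORY_GB = 80     # per-GPU memory from the prerequisites


def weight_gb_per_gpu(tensor_parallel_size: int) -> float:
    """Approximate weight memory per GPU in GB (1 GB = 1e9 bytes)."""
    return PARAMS * BYTES_PER_PARAM / tensor_parallel_size / 1e9


for tp in (1, 2):
    weights = weight_gb_per_gpu(tp)
    remaining = GPU_MEMORY_GB - weights
    print(f"TP={tp}: ~{weights:.0f} GB weights, ~{remaining:.0f} GB left per GPU")
```

Under these assumptions the weights alone take roughly 62 GB on a single 80 GB GPU, which is why splitting them across two GPUs leaves noticeably more room for the KV cache.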

Install vLLM

uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend=auto
huggingface-cli login

Launching the Server

vllm serve pfnet/plamo-3-nict-31b-base \
  --trust-remote-code \
  --max-model-len 4096

With tensor parallelism for more KV-cache headroom:

vllm serve pfnet/plamo-3-nict-31b-base \
  --trust-remote-code \
  --max-model-len 4096 \
  --tensor-parallel-size 2
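
Loading a model of this size can take a while, so scripts that start the server and then send requests may want to wait until it responds. A minimal sketch, assuming the default address `http://localhost:8000` and using the OpenAI-compatible `/v1/models` route as the readiness signal; the helper names, timeout, and polling interval are illustrative choices.

```python
import time
import urllib.error
import urllib.request


def server_is_ready(base_url: str = "http://localhost:8000") -> bool:
    """Return True if GET {base_url}/v1/models answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: not ready yet.
        return False


def wait_for_server(check=server_is_ready, timeout_s: float = 600.0, interval_s: float = 5.0) -> bool:
    """Poll `check` until it returns True or `timeout_s` seconds have passed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling `wait_for_server()` before the first request returns `True` once the server answers, or `False` if the timeout expires first.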

Client Usage

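from openai import OpenAI

# vLLM does not check the API key unless the server was started with --api-key.
client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")

# Base model: call the completions endpoint with a plain-text prompt to continue.
resp = client.completions.create(
    model="pfnet/plamo-3-nict-31b-base",
    prompt="The future of artificial intelligence technology is ",
    max_tokens=128,
    temperature=0.7,
)
print(resp.choices[0].text)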

Troubleshooting

  • Out of memory on 80 GB GPUs: add --tensor-parallel-size 2 or lower --max-model-len.
  • Gated repo errors: accept the PLaMo community license on Hugging Face and run huggingface-cli login.
  • Repetitive or low-quality completions: tune sampling parameters such as temperature and top_p for free-form generation, or fine-tune the model for chat or assistant workloads.
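
As one way to act on the sampling advice, the sketch below wraps the completions call with adjustable sampling settings. The helper name `complete` and the default values are illustrative starting points, not tuned recommendations for this model. `repetition_penalty` is a vLLM sampling parameter rather than part of the OpenAI API, so it is passed through the client's `extra_body`.

```python
def complete(client, prompt, *, temperature=0.7, top_p=0.9,
             repetition_penalty=1.1, max_tokens=128):
    """Request a completion with adjustable sampling settings.

    `client` is an openai.OpenAI instance pointed at the vLLM server.
    The defaults are illustrative assumptions, not tuned values.
    """
    resp = client.completions.create(
        model="pfnet/plamo-3-nict-31b-base",
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=top_p,
        # vLLM-specific sampling parameter, sent outside the standard OpenAI fields.
        extra_body={"repetition_penalty": repetition_penalty},
    )
    return resp.choices[0].text
```

With the client from the Client Usage section, `complete(client, "The future of artificial intelligence technology is ", temperature=0.5)` would request a more conservative sample.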

References