Tencent Hunyuan

tencent/Hunyuan-A13B-Instruct

Tencent Hunyuan A13B instruct-tuned MoE language model with AITER-accelerated AMD ROCm deployment

MoE · 80B total / 13B active params · 32,768 context · vLLM 0.11.0+ · text

Overview

Hunyuan-A13B-Instruct is Tencent's instruct-tuned Hunyuan MoE model, with 80B total parameters and 13B active per token. This recipe covers deployment on AMD ROCm GPUs (MI300X / MI325X / MI355X) with AITER acceleration enabled via VLLM_ROCM_USE_AITER=1.

Prerequisites

  • vLLM version: 0.11.0+ (ROCm build)
  • Python: 3.12
  • Hardware: AMD MI300X / MI325X / MI355X
  • ROCm: 7.0+, glibc >= 2.35 (or use Docker)
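Before installing, it is worth confirming the host actually meets these requirements. A minimal check script (the `/opt/rocm/.info/version` path is the conventional location for the ROCm version file; adjust if your install differs):

```shell
# Verify host prerequisites before installing the ROCm wheel.
PY_VERSION="$(python3 --version 2>&1)"            # expect 3.12.x

# glibc >= 2.35 is required for the prebuilt ROCm wheels
GLIBC_VERSION="$(ldd --version 2>/dev/null | head -n 1)"
GLIBC_VERSION="${GLIBC_VERSION:-unknown}"

# ROCm version file exists only on a host with a native ROCm install
ROCM_VERSION="$(cat /opt/rocm/.info/version 2>/dev/null || echo 'not installed (use Docker)')"

echo "Python: ${PY_VERSION}"
echo "glibc:  ${GLIBC_VERSION}"   # expect >= 2.35
echo "ROCm:   ${ROCM_VERSION}"    # expect 7.0+
```

If any of these fall short, skip the wheel install and use the Docker path described below.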

Install vLLM (ROCm)

uv venv
source .venv/bin/activate
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/

If your environment does not meet the Python/ROCm/glibc requirements, use the Docker-based setup from the vLLM install docs.
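A sketch of that Docker fallback, assuming the `rocm/vllm` image from Docker Hub (check the vLLM install docs for the currently recommended image and tag); the device, group, and shared-memory flags are the standard ones ROCm containers need to see the GPUs:

```shell
# Infra fragment: start an interactive ROCm container with GPU access.
# Image tag "rocm/vllm:latest" is an assumption; consult the vLLM docs.
docker run -it --rm \
  --device /dev/kfd --device /dev/dri \
  --group-add video \
  --ipc=host --shm-size 16g \
  --security-opt seccomp=unconfined \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
  rocm/vllm:latest \
  bash
```

Mounting the Hugging Face cache avoids re-downloading model weights on every container start.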

Launching the Server

export VLLM_ROCM_USE_AITER=1
vllm serve tencent/Hunyuan-A13B-Instruct \
    --tensor-parallel-size 2 \
    --trust-remote-code
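Once the server is up, it exposes the OpenAI-compatible API on port 8000 by default. A quick smoke test with `curl` (this assumes the server is reachable on localhost; the model name must match the one passed to `vllm serve`):

```shell
# Send a single chat completion request to the running server.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "tencent/Hunyuan-A13B-Instruct",
        "messages": [{"role": "user", "content": "Summarize MoE models in one sentence."}],
        "max_tokens": 128
      }'
```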

Benchmarking

vllm bench serve \
  --model "tencent/Hunyuan-A13B-Instruct" \
  --dataset-name random \
  --random-input-len 8000 \
  --random-output-len 1000 \
  --request-rate 10000 \
  --num-prompts 16 \
  --ignore-eos

Troubleshooting

  • First launch delay: AITER JIT-compiles optimized kernels on first launch, which can take several minutes. Subsequent runs use cached kernels.
  • Environment mismatch: If wheel install fails, fall back to the vLLM ROCm Docker image.
