vLLM/Recipes
GLM (Z-AI)

zai-org/GLM-ASR-Nano-2512

An open-source speech recognition model (~2B parameters) with strong dialect support (Cantonese and others) and robust transcription of low-volume (quiet) speech.

dense · 2.3B / 1.5B · 8,192 ctx · vLLM 0.14.1+ · multimodal

Overview

GLM-ASR-Nano-2512 is an open-source automatic speech recognition model with 1.5B active parameters (2B total). It outperforms OpenAI Whisper V3 on multiple benchmarks while remaining compact enough for single-GPU deployment.

Key Capabilities

  • Dialect support: Beyond standard Mandarin and English, strong on Cantonese (粤语) and other Chinese dialects.
  • Low-volume speech: Specifically trained for whispered and quiet speech scenarios.
  • SOTA accuracy: Achieves the lowest average error rate (4.10) among comparable open-source models, with strong results on Wenet Meeting, Aishell-1, and similar Chinese benchmarks.

Prerequisites

  • vLLM version: >= 0.14.1 (with [audio] extras)
  • Transformers: install from source for the latest model support

Install Dependencies

uv venv
source .venv/bin/activate
uv pip install git+https://github.com/huggingface/transformers.git
uv pip install -U "vllm[audio]" --torch-backend auto
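After installing, you can confirm the vLLM version meets the 0.14.1 floor. A minimal sketch; the `meets_floor` helper is ours, not part of vLLM or Transformers:

```python
from importlib.metadata import version

def meets_floor(ver: str, floor: tuple[int, ...]) -> bool:
    # Compare only the numeric dotted prefix (e.g. "0.14.1" from "0.14.1.post1").
    parts = []
    for piece in ver.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts) >= floor

print(meets_floor("0.14.1", (0, 14, 1)))  # True
print(meets_floor("0.13.0", (0, 14, 1)))  # False
# In a real environment: meets_floor(version("vllm"), (0, 14, 1))
```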

Launching the Server

vllm serve zai-org/GLM-ASR-Nano-2512
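The bare command above works on a single GPU with defaults. A sketch with commonly adjusted vLLM serve flags; the port and context-length values here are assumptions, tune them to your deployment:

```shell
# Pin the served port and cap the context at the model's 8,192-token window.
vllm serve zai-org/GLM-ASR-Nano-2512 \
  --port 8000 \
  --max-model-len 8192
```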

Client Usage

OpenAI SDK (Audio URL)

import base64
import httpx
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

audio_url = "https://github.com/zai-org/GLM-ASR/raw/main/examples/example_en.wav"
audio_data = base64.b64encode(httpx.get(audio_url).content).decode("utf-8")

response = client.chat.completions.create(
    model="zai-org/GLM-ASR-Nano-2512",
    messages=[{
        "role": "user",
        "content": [{
            "type": "input_audio",
            "input_audio": {"data": audio_data, "format": "wav"}
        }]
    }],
    max_tokens=500,
)
print(response.choices[0].message.content)

Transcribe Endpoint

import httpx
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

audio_file = httpx.get("https://github.com/zai-org/GLM-ASR/raw/main/examples/example_en.wav").content

response = client.audio.transcriptions.create(
    model="zai-org/GLM-ASR-Nano-2512",
    file=("audio.wav", audio_file),
)
print(response.text)

cURL (Transcribe)

curl http://localhost:8000/v1/audio/transcriptions \
  -H "Authorization: Bearer EMPTY" \
  -F "model=zai-org/GLM-ASR-Nano-2512" \
  -F "file=@your_audio.wav"

Local Audio File (chat API)

import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

with open("your_audio.mp3", "rb") as f:
    audio_data = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="zai-org/GLM-ASR-Nano-2512",
    messages=[{
        "role": "user",
        "content": [{"type": "input_audio", "input_audio": {"data": audio_data, "format": "mp3"}}]
    }],
    max_tokens=500,
)
print(response.choices[0].message.content)

Troubleshooting

  • Transformers version: Requires transformers >= 5.0.0 for best compatibility.
  • Audio formats: Supports wav, mp3, flac, and other common formats.
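When sending local files through the chat API, the `format` field of the `input_audio` payload has to match the file's actual encoding. A small helper sketch for deriving it from the file extension; the helper and its supported-format map are our assumptions, not part of the vLLM or OpenAI APIs:

```python
from pathlib import Path

# Hypothetical mapping from file extension to the `format` value
# used in the chat API's input_audio payload.
SUPPORTED = {".wav": "wav", ".mp3": "mp3", ".flac": "flac"}

def audio_format(path: str) -> str:
    ext = Path(path).suffix.lower()
    try:
        return SUPPORTED[ext]
    except KeyError:
        raise ValueError(f"unsupported audio format: {ext}")

print(audio_format("speech.MP3"))  # mp3
```

Unsupported extensions raise early, which is friendlier than a server-side decode error after uploading the file.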
