stabilityai/stable-audio-open-1.0
Text-to-audio generation model (1.2B params) producing up to ~47 s stereo audio at 44.1 kHz, served via vLLM-Omni
Overview
Stable Audio Open 1.0 is Stability AI's text-to-audio generation model (~1.2B parameters). It produces stereo audio at 44.1 kHz, up to ~47 seconds long, and is served through vLLM-Omni rather than standard vLLM.
Limitations:
- No realistic vocals (no singing or speech).
- English-only training data.
- Better at sound effects than complex music.
Prerequisites
- vLLM-Omni on top of vLLM 0.14.1
- `soundfile` or `scipy` for saving audio
Installation
uv venv
source .venv/bin/activate
uv pip install vllm==0.14.1
uv pip install git+https://github.com/vllm-project/vllm-omni.git
# Audio saving
uv pip install soundfile
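After installing, it can be useful to confirm that the pinned vLLM version is actually the one in the virtual environment. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `check_environment` are hypothetical, and it assumes the distribution is published as `vllm` and that vLLM-Omni exposes the `vllm_omni` module used in the Python example.

```python
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def check_environment(expected_vllm="0.14.1"):
    """Report whether vLLM matches the pinned version and vllm_omni is importable."""
    found = installed_version("vllm")
    # Ignore a local build suffix such as "+cu121" when comparing versions.
    base = found.split("+")[0] if found is not None else None
    return {
        "vllm_version": found,
        "vllm_matches_pin": base == expected_vllm,
        # find_spec only locates the module; it does not import it.
        "vllm_omni_importable": find_spec("vllm_omni") is not None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```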
Python Usage
import torch
import soundfile as sf
from vllm_omni.entrypoints.omni import Omni
omni = Omni(model="stabilityai/stable-audio-open-1.0")
generator = torch.Generator(device="cuda").manual_seed(42)
audio = omni.generate(
    "The sound of a dog barking",
    negative_prompt="Low quality.",
    generator=generator,
    guidance_scale=7.0,
    num_inference_steps=100,
    extra={"audio_start_in_s": 0.0, "audio_end_in_s": 10.0},
)
audio_data = audio[0].cpu().float().numpy().T # [samples, channels]
sf.write("output.wav", audio_data, 44100)
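If `soundfile` is not available, the same `[samples, channels]` data could be written with Python's standard `wave` module instead. This is only a sketch: the helper `write_wav_16bit` is hypothetical, and it assumes the generated samples are floats in the range [-1.0, 1.0], which are clipped and converted to 16-bit PCM.

```python
import sys
import wave
from array import array


def write_wav_16bit(path, frames, sample_rate=44100):
    """Write float frames in [-1.0, 1.0] to a 16-bit PCM WAV file.

    `frames` is an iterable of per-frame channel sequences, for example a
    [samples, channels] NumPy array or a list of (left, right) tuples.
    """
    pcm = array("h")  # signed 16-bit integers, interleaved by channel
    channels = None
    for frame in frames:
        if channels is None:
            channels = len(frame)
        for value in frame:
            # Clip first so out-of-range samples do not overflow int16.
            clipped = max(-1.0, min(1.0, float(value)))
            pcm.append(int(round(clipped * 32767)))
    if channels is None:
        raise ValueError("no audio frames to write")
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV stores samples little-endian
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
```

With the variables from the example above, the call would be `write_wav_16bit("output.wav", audio_data, 44100)`. The pure-Python loop is slow for long clips, so `soundfile` remains the more practical choice when it is installed.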
CLI Usage (from vLLM-Omni repo)
python examples/offline_inference/text_to_audio/text_to_audio.py \
    --model stabilityai/stable-audio-open-1.0 \
    --prompt "The sound of a dog barking" \
    --audio-length 10.0 \
    --num-inference-steps 100 \
    --guidance-scale 7.0 \
    --output dog_barking.wav
Key Parameters
| Parameter | Default | Description |
|---|---|---|
| `audio_start_in_s` | 0.0 | Start time in seconds |
| `audio_end_in_s` | 10.0 | End time in seconds |
| `num_inference_steps` | 100 | Denoising steps (higher = better quality, slower) |
| `guidance_scale` | 7.0 | Classifier-free guidance scale |
| `negative_prompt` | "Low quality." | Text to avoid |
| `num_waveforms` | 1 | Samples per prompt |
| `sample_rate` | 44100 | Output sample rate (Hz) |
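To see how the timing parameters relate to the size of the output, the sketch below checks a start/end pair and computes the expected number of frames per channel. The helper `clip_length` is hypothetical, the 47-second limit is the approximate figure from the overview rather than an exact bound, and it assumes the returned clip spans exactly `audio_end_in_s - audio_start_in_s` seconds.

```python
MAX_DURATION_S = 47.0  # approximate upper bound quoted in the overview
SAMPLE_RATE = 44100    # default output sample rate in Hz


def clip_length(audio_start_in_s=0.0, audio_end_in_s=10.0, sample_rate=SAMPLE_RATE):
    """Return (duration in seconds, frames per channel) for a requested clip."""
    if audio_start_in_s < 0:
        raise ValueError("audio_start_in_s must be non-negative")
    if audio_end_in_s <= audio_start_in_s:
        raise ValueError("audio_end_in_s must be greater than audio_start_in_s")
    if audio_end_in_s > MAX_DURATION_S:
        raise ValueError(f"audio_end_in_s exceeds the ~{MAX_DURATION_S:.0f} s limit")
    duration = audio_end_in_s - audio_start_in_s
    return duration, round(duration * sample_rate)


duration, frames = clip_length(0.0, 10.0)
print(duration, frames)  # 10.0 441000
```

With the defaults, a 10-second stereo clip therefore holds 441,000 frames, or 882,000 individual samples across both channels.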
License
Released under the Stability AI Community License. Commercial use requires a separate license.