Overview
MoonshineSTTService provides offline speech recognition using Moonshine’s small, fast ASR models running locally on the CPU via ONNX Runtime. No GPU or API key is required: models download once on first use and are cached locally, so audio is transcribed on your own machine.
Moonshine STT API Reference
Pipecat’s API methods for Moonshine STT integration
Moonshine Example
Complete example with Moonshine STT
Moonshine Documentation
Moonshine ASR model details and research
Moonshine Voice Package
Python package for Moonshine models
Installation
Prerequisites
Local Model Setup
Before using the Moonshine STT service, you need:
- Model Selection: Choose an appropriate Moonshine model size (tiny, base, small-streaming, medium-streaming)
- Storage Space: Ensure sufficient disk space for model downloads (models are cached after first use)
- CPU Resources: Moonshine runs efficiently on CPU via ONNX Runtime
Configuration Options
- Model Size: Balance between accuracy and performance based on your needs
- Language Support: Moonshine supports English, Spanish, and other languages
- No API Key: Runs entirely locally for complete privacy
Configuration
Runtime-configurable settings for the STT service. See MoonshineSTTService Settings below.
MoonshineSTTSettings
Runtime-configurable settings passed via the settings constructor argument using MoonshineSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | str \| Model | Model.SMALL_STREAMING | Moonshine model architecture. Available models: TINY, BASE, TINY_STREAMING, BASE_STREAMING, SMALL_STREAMING (default), MEDIUM_STREAMING. |
| language | Language \| str | Language.EN | Language for transcription. Moonshine supports English, Spanish, and other languages. The base language code is used (e.g., “en” from “en-US”). (Inherited from base STT settings.) |
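The base-language-code behavior described for the language setting can be illustrated with a minimal sketch. The helper `base_language_code` is hypothetical and is not the service's own implementation; it only shows the kind of normalization meant by “en” from “en-US”.

```python
def base_language_code(tag: str) -> str:
    """Return the primary language subtag of a locale tag, lowercased.

    Hypothetical helper for illustration; not part of Pipecat or Moonshine.
    """
    # Accept both "en-US" and "en_US" style tags.
    normalized = tag.strip().replace("_", "-")
    # Everything before the first hyphen is the base language code.
    return normalized.split("-", 1)[0].lower()
```

A tag that is already a bare code, such as "es", is returned unchanged.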
Usage
Basic Setup
With Custom Model
With Custom Language
With Model as String
Notes
- First run downloads: The selected model downloads from the Moonshine model hub on first use and is cached locally. Later runs load it from the cache.
- Segmented transcription: MoonshineSTTService extends SegmentedSTTService, meaning it processes complete audio segments after VAD detects the user has stopped speaking.
- CPU-only: Moonshine runs efficiently on CPU via ONNX Runtime, so no GPU is required. This makes it ideal for resource-constrained environments.
- Audio format: Expects 16-bit mono PCM audio at 16 kHz sample rate.
- Model variants: The streaming-capable models (TINY_STREAMING, SMALL_STREAMING, MEDIUM_STREAMING) can be run in batch mode just like the non-streaming variants. The larger streaming models (SMALL_STREAMING, MEDIUM_STREAMING) are only available in streaming form.
- Language support: Moonshine supports multiple languages (English, Spanish, and others). The service uses the base language code (e.g., “en” from “en-US”).
- No external dependencies: Unlike API-based STT services, Moonshine requires no API keys or network connectivity after the initial model download.
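The audio format note above (16-bit mono PCM at 16 kHz) can be checked for a WAV recording before it is fed into a pipeline. The following is a sketch using only Python's standard `wave` module; the helper names `check_wav_format` and `make_silence_wav` are hypothetical and are not part of Pipecat or Moonshine.

```python
import io
import wave

EXPECTED_RATE = 16_000   # 16 kHz sample rate
EXPECTED_WIDTH = 2       # bytes per sample (16-bit)
EXPECTED_CHANNELS = 1    # mono


def check_wav_format(data: bytes) -> list[str]:
    """Return a list of format mismatches; an empty list means the audio fits."""
    problems = []
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_WIDTH:
            problems.append(f"sample width is {wav.getsampwidth()} bytes")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
    return problems


def make_silence_wav(seconds: float, rate: int = EXPECTED_RATE,
                     channels: int = EXPECTED_CHANNELS) -> bytes:
    """Build an in-memory 16-bit WAV of silence, used here only for testing."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(EXPECTED_WIDTH)
        wav.setframerate(rate)
        # One zero-valued 16-bit sample per channel per frame.
        wav.writeframes(b"\x00\x00" * channels * int(seconds * rate))
    return buf.getvalue()
```

Calling `check_wav_format` on a 44.1 kHz stereo file reports two mismatches (sample rate and channel count), which signals that the audio needs resampling and downmixing first.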