
Overview

MoonshineSTTService provides offline speech recognition using Moonshine’s small, fast ASR models, which run locally on the CPU via ONNX Runtime. It needs no GPU and no API key: models download once on first use and are cached locally, so transcription stays on your machine.

Moonshine STT API Reference

Pipecat’s API methods for Moonshine STT integration

Moonshine Example

Complete example with Moonshine STT

Moonshine Documentation

Moonshine ASR model details and research

Moonshine Voice Package

Python package for Moonshine models

Installation

uv add "pipecat-ai[moonshine]"

Prerequisites

Local Model Setup

Before using the Moonshine STT service, you need:
  1. Model Selection: Choose an appropriate Moonshine model size (tiny, base, tiny-streaming, base-streaming, small-streaming, medium-streaming)
  2. Storage Space: Ensure there is enough disk space for model downloads (models are cached after first use)
  3. CPU Resources: Moonshine runs efficiently on CPU via ONNX Runtime
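
The storage prerequisite can be checked before the first model download. The following is a minimal sketch using only the Python standard library; the helper name `has_free_space` and the 500 MB threshold are hypothetical, and the directory to check depends on where the model cache lives on your system, which this page does not specify.

```python
import shutil
from pathlib import Path


def has_free_space(path: Path, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    usage = shutil.disk_usage(path)
    return usage.free >= required_bytes


# Hypothetical threshold for illustration only; it is not a documented model size.
REQUIRED_BYTES = 500 * 1024 * 1024
```

Calling `has_free_space(Path.home(), REQUIRED_BYTES)` before constructing the service gives an early, readable failure instead of a partial download.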

Configuration Options

  • Model Size: Balance between accuracy and performance based on your needs
  • Language Support: Moonshine supports English, Spanish, and other languages
  • No API Key or GPU: Runs entirely locally on the CPU for complete privacy

Configuration

settings (MoonshineSTTService.Settings, default: None)
Runtime-configurable settings for the STT service. See MoonshineSTTSettings below.

MoonshineSTTSettings

Runtime-configurable settings passed via the settings constructor argument using MoonshineSTTService.Settings(...). These can be updated mid-conversation with STTUpdateSettingsFrame. See Service Settings for details.
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | str \| Model | Model.SMALL_STREAMING | Moonshine model architecture. Available models: TINY, BASE, TINY_STREAMING, BASE_STREAMING, SMALL_STREAMING (default), MEDIUM_STREAMING. |
| language | Language \| str | Language.EN | Language for transcription. Moonshine supports English, Spanish, and other languages. The base language code is used (e.g., “en” from “en-US”). (Inherited from base STT settings.) |
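
To illustrate what "base language code" means in the language row, here is a small sketch of the reduction from a regional tag to its base code. The function `base_language_code` is a hypothetical helper written for this example; it is not the service's actual implementation, which may handle tags differently.

```python
def base_language_code(tag: str) -> str:
    """Reduce a language tag such as "en-US" to its base code "en"."""
    # Accept both "-" and "_" as the region separator, then keep the first part.
    return tag.replace("_", "-").split("-", 1)[0].lower()
```

Under this sketch, "en-US" and "en-GB" both reduce to "en", so regional variants select the same transcription language.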

Usage

Basic Setup

from pipecat.services.moonshine.stt import MoonshineSTTService

stt = MoonshineSTTService()

With Custom Model

from pipecat.services.moonshine.stt import MoonshineSTTService, Model

stt = MoonshineSTTService(
    settings=MoonshineSTTService.Settings(
        model=Model.MEDIUM_STREAMING,
    ),
)

With Custom Language

from pipecat.services.moonshine.stt import MoonshineSTTService, Model
from pipecat.transcriptions.language import Language

stt = MoonshineSTTService(
    settings=MoonshineSTTService.Settings(
        model=Model.SMALL_STREAMING,
        language=Language.ES,
    ),
)

With Model as String

from pipecat.services.moonshine.stt import MoonshineSTTService

stt = MoonshineSTTService(
    settings=MoonshineSTTService.Settings(
        model="base",
    ),
)

Notes

  • First run downloads: The selected model downloads from the Moonshine model hub on first use and is cached locally. Later runs load it from the cache.
  • Segmented transcription: MoonshineSTTService extends SegmentedSTTService, meaning it processes complete audio segments after VAD detects the user has stopped speaking.
  • CPU-only: Moonshine runs efficiently on CPU via ONNX Runtime, so no GPU is required. This makes it ideal for resource-constrained environments.
  • Audio format: Expects 16-bit mono PCM audio at a 16 kHz sample rate.
  • Model variants: The streaming-capable models (TINY_STREAMING, BASE_STREAMING, SMALL_STREAMING, MEDIUM_STREAMING) can be run in batch mode just like the non-streaming variants (TINY, BASE). SMALL_STREAMING and MEDIUM_STREAMING have no non-streaming counterpart.
  • Language support: Moonshine supports multiple languages (English, Spanish, and others). The service uses the base language code (e.g., “en” from “en-US”).
  • No external services: Unlike API-based STT services, Moonshine requires no API keys and no network connectivity after the initial model download.
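
The audio format noted above (16-bit mono PCM at 16 kHz) can be verified for test recordings before they are fed into a pipeline. This is a minimal sketch using the standard-library `wave` module; `is_16khz_mono_pcm16` and `make_silence` are hypothetical helpers for this example and are not part of Pipecat or Moonshine.

```python
import io
import wave


def is_16khz_mono_pcm16(wav_bytes: bytes) -> bool:
    """Return True if the WAV data is 16-bit, mono, and sampled at 16 kHz."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return (
            wav.getnchannels() == 1        # mono
            and wav.getsampwidth() == 2    # 2 bytes per sample = 16-bit
            and wav.getframerate() == 16000
        )


def make_silence(sample_rate: int = 16000, channels: int = 1, seconds: float = 0.1) -> bytes:
    """Build a short silent WAV in memory, useful for exercising the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        frame_count = int(sample_rate * seconds)
        wav.writeframes(b"\x00\x00" * frame_count * channels)
    return buf.getvalue()
```

Audio that fails this check (for example 44.1 kHz or stereo input) would need resampling or downmixing before it matches the expected format.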