XiaomiTTS.Core
1.0.0
XiaomiTTS
A .NET 9 command-line tool for Xiaomi MiMo TTS (Text-to-Speech) synthesis. Converts text into WAV audio files by calling the MiMo platform REST API via HTTP/HTTPS.
Features
- High-quality speech synthesis powered by the `mimo-v2-tts` model
- Supports both non-streaming and streaming (SSE) modes
- Real-time audio chunk reception in streaming mode, ideal for long text synthesis
- Generates standard WAV audio files (24 kHz / 16-bit / Mono)
- Native AOT compilation support for zero-dependency single-file executables (~5.6 MB)
- Real-time progress display, updated every second
- Structured logging system powered by Serilog with multi-sink output
- Comprehensive exception handling, error logging, and graceful error reporting
- Dependency Injection architecture following .NET Generic Host patterns
- Environment-based configuration with hot-reload support
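The WAV parameters listed above fix the data rate: 24,000 samples/s × 2 bytes × 1 channel = 48,000 bytes of audio per second. The Python sketch below (a language-neutral illustration, not part of the project's C# code) uses that to estimate playback duration from file size. It assumes a canonical 44-byte WAV header; the constant and function names are hypothetical.

```python
SAMPLE_RATE = 24_000      # samples per second (24 kHz)
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHANNELS = 1              # mono
WAV_HEADER_SIZE = 44      # assumption: canonical RIFF/WAVE header


def wav_duration_seconds(file_size: int) -> float:
    """Estimate playback duration of a generated WAV file from its size in bytes."""
    data_bytes = max(file_size - WAV_HEADER_SIZE, 0)
    return data_bytes / (SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)
```

A quick size check like this is a cheap way to spot a truncated output file before trying to play it.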
Tech Stack
Backend Language & Runtime
| Technology | Version | Role |
|---|---|---|
| C# | 12.0 | Primary language for TTS client, CLI parsing, and audio processing |
| .NET SDK | 9.0 | Build framework, tooling, and AOT compilation support |
| .NET Runtime | 9.0 | Application runtime environment with built-in GC and JIT |
| .NET Generic Host | 9.0.0 | Application lifecycle management, DI container, and configuration |
Frameworks & Libraries
| Technology | Version | Role |
|---|---|---|
| Microsoft.Extensions.Hosting | 9.0.0 | Generic Host builder, DI container, and app lifecycle orchestration |
| Microsoft.Extensions.Http | 9.0.0 | HttpClient factory and HTTP client management |
| Microsoft.Extensions.Configuration | 9.0.0 | JSON-based configuration with environment variable overrides |
| Microsoft.Extensions.Options | 9.0.0 | Strongly-typed options pattern for TTS configuration |
| Microsoft.Extensions.Logging | 9.0.0 | Logging abstraction layer for structured logging |
| System.Text.Json | Built-in | JSON serialization/deserialization with AOT-compatible source generators |
| System.Net.Http | Built-in | HttpClient for HTTP/HTTPS requests to the MiMo TTS API |
| System.IO | Built-in | Async file I/O operations, WAV file generation and validation |
Logging & Observability
| Technology | Version | Role |
|---|---|---|
| Serilog | 4.2.0 | Structured logging core engine with rich property support |
| Serilog.Extensions.Hosting | 9.0.0 | Seamless integration with .NET Generic Host logging pipeline |
| Serilog.Settings.Configuration | 9.0.0 | Drives Serilog configuration from appsettings.json files |
| Serilog.Sinks.Console | 6.0.0 | Console output sink with customizable formatting templates |
| Serilog.Sinks.Debug | 3.0.0 | Debug output sink for IDE debugger output window |
| Serilog.Sinks.File | 6.0.0 | Rolling file sink with daily rotation, size limits, and retention policies |
| Serilog.Sinks.Seq | 8.0.0 | Seq distributed log aggregation platform sink |
| Serilog.Enrichers.Thread | 4.0.0 | Enriches log events with thread ID and thread name |
| Serilog.Enrichers.Environment | 3.0.1 | Enriches log events with machine name and environment user name |
| Serilog.Expressions | 5.0.0 | Expression engine for structured log filtering and formatting |
Third-Party APIs & Services
| Technology | Version | Role |
|---|---|---|
| MiMo TTS API | mimo-v2-tts | Xiaomi speech synthesis REST API, core TTS engine |
| SSE (Server-Sent Events) | — | Streaming protocol for real-time audio chunk delivery |
Testing & Quality
| Technology | Version | Role |
|---|---|---|
| xunit | 2.9.2 | Unit testing framework with parameterized test support |
| Microsoft.NET.Test.Sdk | 17.12.0 | .NET test runner infrastructure and test discovery |
| xunit.runner.visualstudio | 3.0.0 | Visual Studio / VS Code test adapter for xunit |
| Moq | 4.20.72 | Mocking framework for dependency isolation in unit tests |
| coverlet.collector | 6.0.2 | Code coverage data collector (Cobertura format) |
| coverlet.msbuild | 6.0.2 | MSBuild-integrated code coverage instrumentation |
Build & Toolchain
| Technology | Version | Role |
|---|---|---|
| MSBuild / dotnet CLI | 9.0 | Project build, restore, publish, and dependency management |
| Native AOT | — | Compiles .NET apps to native machine code for zero-dependency deployment |
| PowerShell | 5.1+ | Test runner scripts and CI/CD automation |
| Visual Studio | 2022 17.14+ | IDE support; Windows AOT requires C++ desktop workload |
| VS Code + C# Dev Kit | — | Cross-platform IDE support with IntelliSense and debugging |
Project Structure
XiaomiTTS/
├── XiaomiTTS.sln  # Visual Studio solution file
├── README.md  # English documentation
├── README-ZhCn.md  # Chinese documentation
├── LICENSE  # MIT License
│
├── XiaomiTTS.Core/  # Core business logic class library
│   ├── XiaomiTTS.Core.csproj
│   ├── Abstractions/
│   │   ├── IFileSystem.cs  # File system abstraction interface
│   │   └── FileSystem.cs  # File system implementation
│   ├── Audio/
│   │   └── WavProcessor.cs  # WAV header validation & PCM-to-WAV conversion
│   ├── Configuration/
│   │   └── TtsOptions.cs  # Strongly-typed TTS configuration model
│   ├── DependencyInjection/
│   │   └── ServiceCollectionExtensions.cs  # DI service registration extensions
│   ├── Interfaces/
│   │   ├── ITtsClient.cs  # TTS client interface (Dispose pattern)
│   │   ├── ISynthesisStrategy.cs  # Synthesis strategy interface (Strategy pattern)
│   │   └── IWavProcessor.cs  # WAV processor interface
│   ├── Models/
│   │   ├── ProgressInfo.cs  # Progress reporting data model
│   │   └── Tts/
│   │       ├── TtsRequest.cs  # TTS API request model
│   │       ├── TtsMessage.cs  # Chat message model
│   │       ├── TtsAudioOptions.cs  # Audio format configuration model
│   │       └── TtsJsonContext.cs  # AOT-compatible JSON source generator
│   └── Services/
│       ├── TtsClient.cs  # TTS client facade (strategy delegation)
│       ├── NonStreamingSynthesisStrategy.cs  # Non-streaming synthesis implementation
│       └── StreamingSynthesisStrategy.cs  # Streaming (SSE) synthesis implementation
│
├── XiaomiTTS.Cli/  # Command-line application
│   ├── XiaomiTTS.Cli.csproj
│   ├── Program.cs  # Entry point (async Main, DI setup, orchestration)
│   ├── Cli/
│   │   ├── CommandLineParser.cs  # CLI argument parser with validation
│   │   └── ProgressDisplay.cs  # Real-time console progress display
│   ├── Logging/
│   │   ├── SerilogConfiguration.cs  # Serilog logger factory with multi-environment support
│   │   ├── TraceIdEnricher.cs  # Custom enricher for distributed trace ID
│   │   └── ConfigurationChangeWatcher.cs  # Hot-reload configuration change monitor
│   ├── Models/
│   │   └── CommandLineArgs.cs  # Parsed command-line arguments model
│   ├── appsettings.json  # Default configuration (TTS + Serilog)
│   ├── appsettings.Development.json  # Development environment overrides
│   └── appsettings.Production.json  # Production environment configuration
│
├── XiaomiTTS.Tests/  # Unit test project
│   ├── XiaomiTTS.Tests.csproj
│   ├── CommandLineParserTests.cs  # CLI argument parsing tests
│   ├── TtsClientTests.cs  # TTS client initialization tests
│   ├── NonStreamingSynthesisStrategyTests.cs  # Non-streaming synthesis tests
│   ├── StreamingSynthesisStrategyTests.cs  # Streaming synthesis tests
│   ├── WavProcessorTests.cs  # WAV header validation & PCM conversion tests
│   ├── ModelsTests.cs  # Data model tests
│   └── LoggingTests.cs  # Logging configuration tests
│
├── sample_input.txt  # Chinese sample input text
├── sample_input_en.txt  # English sample input text
├── sample_long.txt  # Long text sample
├── sample_song.txt  # Lyrics sample
├── sample_style.txt  # Style sample
├── run-tests.ps1  # PowerShell test runner with coverage support
├── spec/
│   └── text2voice.md  # Project requirement specification
├── skill/
│   └── xiaomi-local-tts/
│       └── SKILL.md  # AI-assisted skill configuration
└── .trae/
    ├── rules/
    │   └── unittesting.md  # Unit testing guidelines
    └── skills/
        └── xiaomi-tts/
            └── skill.md  # AI skill definition
Environment Requirements
Required Dependencies
| Dependency | Minimum Version | Notes |
|---|---|---|
| .NET SDK | 9.0 | Required for building and running the project. Download: https://dotnet.microsoft.com/download/dotnet/9.0 |
| MiMo API Key | — | Set via the MIMO_API_KEY environment variable; required for TTS synthesis |
Optional Dependencies (Native AOT Publishing)
| Dependency | Platform | Notes |
|---|---|---|
| Visual Studio 2022 | Windows | Must include the "Desktop development with C++" workload |
| clang | Linux | Required for Native AOT compilation on Linux |
| zlib1g-dev | Linux | Required for Native AOT compression support on Linux |
Getting Started
1. Clone the Repository
git clone https://github.com/your-username/XiaomiTTS.git
cd XiaomiTTS
2. Set Environment Variables
Windows (PowerShell):
$env:MIMO_API_KEY = "your-api-key-here"
Windows (CMD):
set MIMO_API_KEY=your-api-key-here
Linux / macOS:
export MIMO_API_KEY="your-api-key-here"
To make the variable persistent, add it to your shell profile (e.g., ~/.bashrc, ~/.zshrc) or use User Secrets for development.
3. Build the Project
dotnet build
4. Run the Program
# Basic usage (non-streaming mode)
dotnet run --project XiaomiTTS.Cli -- input.txt
# Specify output filename
dotnet run --project XiaomiTTS.Cli -- input.txt output.wav
# Streaming mode (recommended for long text)
dotnet run --project XiaomiTTS.Cli -- input.txt --stream
# Streaming mode with output filename
dotnet run --project XiaomiTTS.Cli -- input.txt output.wav --stream
# Use sample text
dotnet run --project XiaomiTTS.Cli -- sample_input.txt
5. Run Tests
# Run all tests
dotnet test XiaomiTTS.Tests/XiaomiTTS.Tests.csproj
# Run tests with code coverage
.\run-tests.ps1 -Coverage
# Run tests with coverage and open the report
.\run-tests.ps1 -Coverage -OpenReport
# Run tests with filter
.\run-tests.ps1 -Filter "FullyQualifiedName~CommandLineParser"
6. Native AOT Publish (Optional)
# Windows x64
dotnet publish XiaomiTTS.Cli/XiaomiTTS.Cli.csproj -c Release -r win-x64 --self-contained true
# Linux x64
dotnet publish XiaomiTTS.Cli/XiaomiTTS.Cli.csproj -c Release -r linux-x64 --self-contained true
# macOS x64
dotnet publish XiaomiTTS.Cli/XiaomiTTS.Cli.csproj -c Release -r osx-x64 --self-contained true
Published executables are written to XiaomiTTS.Cli/bin/Release/net9.0/<RID>/publish/ (about 5.6 MB each) and need no .NET Runtime on the target machine.
CLI Arguments
Positional Arguments
| Position | Required | Description | Default |
|---|---|---|---|
| 1 | Yes | Input text file path (UTF-8 encoded) | — |
| 2 | No | Output audio file path | Same name as input with .wav extension |
Named Arguments
| Argument | Description |
|---|---|
| `--stream` | Enable streaming mode for real-time audio chunk reception via SSE |
| `--help`, `-h` | Show help information |
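The argument rules in the two tables above can be sketched as follows. This is a language-neutral Python illustration, not the project's `CommandLineParser`; the function name, the `None` return for help, and the rejection of unknown options are assumptions.

```python
from pathlib import Path


def parse_args(argv):
    """Return (input_path, output_path, stream), or None when help is requested."""
    stream = False
    positional = []
    for arg in argv:
        if arg in ("--help", "-h"):
            return None  # caller is expected to print usage text
        if arg == "--stream":
            stream = True
        elif arg.startswith("-"):
            # Assumption: unknown options are treated as errors.
            raise ValueError(f"unknown option: {arg}")
        else:
            positional.append(arg)
    if not 1 <= len(positional) <= 2:
        raise ValueError("expected: <input> [output] [--stream]")
    input_path = Path(positional[0])
    # Default output: same name as the input, with a .wav extension.
    if len(positional) == 2:
        output_path = Path(positional[1])
    else:
        output_path = input_path.with_suffix(".wav")
    return input_path, output_path, stream
```

For example, `parse_args(["input.txt", "--stream"])` yields `input.txt`, `input.wav`, and `True`.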
API Reference
Non-Streaming Mode
- Endpoint: `https://api.xiaomimimo.com/v1/chat/completions`
- Model: `mimo-v2-tts`
- Auth: `Authorization: Bearer <API_KEY>`
- Request Body:
{
"model": "mimo-v2-tts",
"messages": [
{"role": "user", "content": "Convert the following text to speech"},
{"role": "assistant", "content": "Text content to synthesize"}
],
"voice": "mimo_default",
"format": "wav"
}
- Response: JSON containing `choices[0].message.audio.data` (Base64-encoded audio)
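The request and response shapes above can be exercised without any network access. The Python sketch below is a language-neutral illustration, not the project's C# client: it builds the documented body and headers and decodes the audio from a response string. The function names are hypothetical, and the `Content-Type` header is an assumption that the source does not state.

```python
import base64
import json

API_URL = "https://api.xiaomimimo.com/v1/chat/completions"


def build_request(text: str) -> dict:
    """Build the non-streaming request body documented above."""
    return {
        "model": "mimo-v2-tts",
        "messages": [
            {"role": "user", "content": "Convert the following text to speech"},
            {"role": "assistant", "content": text},
        ],
        "voice": "mimo_default",
        "format": "wav",
    }


def build_headers(api_key: str) -> dict:
    """Bearer auth as documented; Content-Type is an assumption."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


def extract_audio(response_json: str) -> bytes:
    """Decode choices[0].message.audio.data from a response body."""
    body = json.loads(response_json)
    return base64.b64decode(body["choices"][0]["message"]["audio"]["data"])
```

The bytes returned by `extract_audio` can be written directly to the output file, since the non-streaming request asks for `wav`.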
Streaming Mode
- Endpoint: Same as above
- Request Body:
{
"model": "mimo-v2-tts",
"messages": [
{"role": "user", "content": "Convert the following text to speech"},
{"role": "assistant", "content": "Text content to synthesize"}
],
"audio": {
"format": "pcm16",
"voice": "mimo_default"
},
"stream": true
}
- Response: SSE stream; each event contains `choices[0].delta.audio.data` (Base64-encoded PCM16 data)
- Audio Parameters: 24 kHz, 16-bit, Mono PCM
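A minimal Python sketch of the streaming path follows, again as a language-neutral illustration and not the project's `StreamingSynthesisStrategy`. It reads SSE `data:` lines, decodes each Base64 chunk, and wraps the concatenated PCM16 in a WAV container with the standard-library `wave` module. The `[DONE]` end marker is an assumption borrowed from similar chat-completion APIs, and the function names are hypothetical.

```python
import base64
import io
import json
import wave


def pcm_chunks(sse_lines):
    """Yield decoded PCM16 chunks from an iterable of SSE text lines."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream marker
            break
        event = json.loads(payload)
        choices = event.get("choices") or []
        if not choices:
            continue
        audio = (choices[0].get("delta") or {}).get("audio") or {}
        data = audio.get("data")
        if data:
            yield base64.b64decode(data)


def pcm_to_wav(pcm: bytes, sample_rate: int = 24_000) -> bytes:
    """Wrap raw mono 16-bit PCM in a WAV container."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono
        wav_file.setsampwidth(2)      # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

Typical use is `pcm_to_wav(b"".join(pcm_chunks(lines)))`, where `lines` is the decoded text of the HTTP response read line by line.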
Exit Codes
| Code | Description |
|---|---|
| 0 | Success |
| 1 | Error (missing env var, file not found, API failure, validation failure, etc.) |
Development Guidelines
- The project uses C# 12 syntax with nullable reference types and implicit usings enabled
- JSON serialization uses `System.Text.Json` source generators (`TtsJsonContext`) for AOT compatibility
- All asynchronous operations follow the `async`/`await` pattern with `CancellationToken` support for timeouts and cancellation
- Architecture follows the Strategy Pattern for synthesis modes and Dependency Injection via .NET Generic Host
- Configuration is managed through the `IOptions<T>` pattern with environment-based overrides and hot-reload support
- File system operations are abstracted behind `IFileSystem` for testability
- All public types and methods include XML documentation comments
- Unit tests cover CLI parsing, client initialization, synthesis strategy logic, WAV processing, and logging configuration
Common Issues & Troubleshooting
Q: "未找到环境变量 MIMO_API_KEY" / "MIMO_API_KEY not found"
Ensure the environment variable is set before running the program. Verify with:
# Linux / macOS
echo $MIMO_API_KEY
# Windows PowerShell
echo $env:MIMO_API_KEY
Q: Build error "SDK 'Microsoft.NET.Sdk' not found"
Install .NET 9 SDK: https://dotnet.microsoft.com/download/dotnet/9.0
Q: Native AOT publish fails (Windows)
Ensure Visual Studio 2022 is installed with the "Desktop development with C++" workload selected. The C++ toolchain is required for native code compilation.
Q: Audio generated in streaming mode cannot be played
Streaming mode outputs PCM16 format, which the program automatically wraps into a standard WAV file. If the file cannot be played:
- Check if the audio data from the API was received completely (look for timeout errors in logs)
- Verify the output file size is larger than the WAV header size (44 bytes)
- Try a shorter text input to rule out timeout issues
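The checks above can be automated with a short script. This Python sketch is a language-neutral illustration, not the project's `WavProcessor`; it assumes the canonical 44-byte header layout (a 16-byte `fmt ` chunk followed directly by `data`), and the function name is hypothetical.

```python
import struct

WAV_HEADER_SIZE = 44  # canonical RIFF/WAVE header with a 16-byte fmt chunk


def check_wav(data: bytes) -> list:
    """Return a list of problems found in a generated WAV file (empty if none)."""
    if len(data) <= WAV_HEADER_SIZE:
        return ["file is not larger than the 44-byte WAV header"]
    if data[0:4] != b"RIFF" or data[8:12] != b"WAVE":
        return ["missing RIFF/WAVE magic"]
    problems = []
    # Offsets below assume the canonical layout described in the lead-in.
    channels, sample_rate = struct.unpack_from("<HI", data, 22)
    (bits_per_sample,) = struct.unpack_from("<H", data, 34)
    if (channels, sample_rate, bits_per_sample) != (1, 24_000, 16):
        problems.append(
            f"unexpected format: {channels} ch, {sample_rate} Hz, {bits_per_sample}-bit"
        )
    return problems
```

Running it on the output file, for example `check_wav(open("output.wav", "rb").read())`, narrows down whether the problem is an empty payload, a damaged header, or an unexpected audio format.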
Q: Program exits with timeout (exit code 1)
The default timeout is 200 seconds. For very long texts, use streaming mode (--stream) to avoid timeouts. Streaming mode receives audio incrementally and is more resilient for large inputs.
Q: How to enable Seq log aggregation?
- Start a Seq instance (default: `http://localhost:5341`)
- Set `"Seq": { "Enabled": true }` in `appsettings.json` or via environment variables: `$env:Seq__Enabled = "true"` and `$env:Seq__ServerUrl = "http://localhost:5341"`
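The double underscore in `Seq__Enabled` is how .NET's environment-variable configuration provider expresses nesting: `__` stands for the `:` section separator, so the variable addresses the `Seq:Enabled` key. The Python sketch below only illustrates that naming rule with a nested dictionary; it is not how .NET stores configuration internally, and the function name is hypothetical.

```python
def env_to_config(env: dict) -> dict:
    """Expand NAME__CHILD environment variables into nested sections."""
    config = {}
    for name, value in env.items():
        parts = name.split("__")
        node = config
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # descend into or create the section
        node[parts[-1]] = value
    return config
```

With the two variables above, the result has a single `Seq` section holding `Enabled` and `ServerUrl`, mirroring the JSON form used in `appsettings.json`.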
Q: How to change log levels at runtime?
Modify appsettings.json while the application is running. The ConfigurationChangeWatcher detects the change and hot-reloads the Serilog minimum level without restarting the process.
License
This project is licensed under the MIT License.
| Product | Versions (compatible and additional computed target frameworks) |
|---|---|
| .NET | net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net9.0
  - Microsoft.Extensions.Hosting (>= 9.0.0)
  - Microsoft.Extensions.Http (>= 9.0.0)
| Version | Downloads | Last Updated |
|---|---|---|
| 1.0.0 | 40 | 5/29/2026 |