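NeuralCodecs 0.3.1
Installation
.NET CLI:
dotnet add package NeuralCodecs --version 0.3.1
Package Manager:
NuGet\Install-Package NeuralCodecs -Version 0.3.1
PackageReference:
<PackageReference Include="NeuralCodecs" Version="0.3.1" />
Central Package Management (version in Directory.Packages.props, reference in the project file):
<PackageVersion Include="NeuralCodecs" Version="0.3.1" />
<PackageReference Include="NeuralCodecs" />
Paket CLI:
paket add NeuralCodecs --version 0.3.1
Script and Interactive:
#r "nuget: NeuralCodecs, 0.3.1"
Cake (as an addin or as a tool):
#addin nuget:?package=NeuralCodecs&version=0.3.1
#tool nuget:?package=NeuralCodecs&version=0.3.1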
NeuralCodecs is a .NET library for neural audio codec implementations, designed for efficient audio compression and reconstruction.
Features
- SNAC: Multi-Scale Neural Audio Codec
  - Support for multiple sampling rates: 24kHz, 32kHz, and 44.1kHz
  - Attention mechanisms with adjustable window sizes for improved quality
  - Automatic resampling for input flexibility
- DAC: Descript Audio Codec
  - Supports multiple sampling rates: 16kHz, 24kHz, and 44.1kHz
  - Configurable encoder/decoder architecture with variable rates
  - Flexible bitrate configurations from 8kbps to 16kbps
- Encodec: Meta's Encodec neural audio compression
  - Supports stereo audio at 24kHz and 48kHz sample rates
  - Variable bitrate compression (1.5-24 kbps)
  - Neural language model for enhanced compression quality
  - Direct file compression to .ecdc format
- AudioTools: Advanced audio processing utilities
  - Based on Descript's audiotools Python package
  - Extended with .NET-specific optimizations and additional features
  - Audio filtering, transformation, and effects processing
  - Works with Descript's AudioSignal or Tensors
- Audio Visualization: Example project includes spectrogram generation and comparison tools
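To put the bitrate figures above in context, the following sketch compares a codec's target bitrate with the bitrate of uncompressed 16-bit PCM. It is plain arithmetic and does not use the NeuralCodecs API; the class and method names (BitrateMath, PcmKbps, CompressionRatio, PayloadBytes) are hypothetical helpers introduced only for illustration.

```csharp
using System;

// 24 kHz mono PCM compared with a 6 kbps codec stream.
Console.WriteLine(BitrateMath.PcmKbps(24000, 1));             // 384
Console.WriteLine(BitrateMath.CompressionRatio(24000, 1, 6)); // 64
// Approximate payload of a 10 second clip at 6 kbps, in bytes.
Console.WriteLine(BitrateMath.PayloadBytes(6, 10));           // 7500

// Hypothetical helper, not part of NeuralCodecs.
static class BitrateMath
{
    // Bitrate of uncompressed PCM audio in kilobits per second.
    public static double PcmKbps(int sampleRate, int channels, int bitsPerSample = 16)
        => sampleRate * (double)channels * bitsPerSample / 1000.0;

    // How many times smaller the codec stream is than 16-bit PCM.
    public static double CompressionRatio(int sampleRate, int channels, double targetKbps)
        => PcmKbps(sampleRate, channels) / targetKbps;

    // Approximate payload size in bytes; ignores any container or header overhead.
    public static double PayloadBytes(double targetKbps, double seconds)
        => targetKbps * 1000.0 / 8.0 * seconds;
}
```

The same calculation for 48kHz stereo at 24 kbps also gives a ratio of 64, since both the PCM rate and the target rate are four times larger.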
Requirements
- .NET 8.0 or later
- TorchSharp or libTorch compatible with your platform
- NAudio (for audio processing)
- SkiaSharp (for visualization features)
Usage
Creating/loading the model
There are several ways to load a model:
Using static factory method:
// Load SNAC model with static method provided for built-in models
var model = await NeuralCodecs.CreateSNACAsync("model.pt");
Using premade config:
SNACConfig provides premade configurations for the 24kHz, 32kHz, and 44.1kHz sampling rates.
var model = await NeuralCodecs.CreateSNACAsync(modelPath, SNACConfig.SNAC24Khz);
Using IModelLoader instance with default config:
This approach allows custom loader implementations to be used.
// Load model with default config from IModelLoader instance
var torchLoader = NeuralCodecs.CreateTorchLoader();
var model = await torchLoader.LoadModelAsync<SNAC, SNACConfig>("model.pt");
Using IModelLoader instance with custom config:
// For Encodec with custom bandwidth and settings
var encodecConfig = new EncodecConfig {
    SampleRate = 48000,
    Bandwidth = 12.0f,
    Channels = 2, // Stereo audio
    Normalize = true
};
var encodecModel = await torchLoader.LoadModelAsync<Encodec, EncodecConfig>("encodec_model.pt", encodecConfig);
Using factory method for custom models:
This approach allows custom model implementations to be used with built-in or custom loaders.
// Load custom model with factory method
var model = await torchLoader.LoadModelAsync<CustomModel, CustomConfig>(
    "model.pt",
    config => new CustomModel(config, ...),
    config);
Models can be loaded in PyTorch or safetensors format.
AudioTools Features
The AudioTools namespace provides extensive audio processing capabilities:
var audio = new Tensor(...); // Load or create audio tensor
// Apply effects
var processedAudio = AudioEffects.ApplyCompressor(
    audio,
    sampleRate: 48000,
    threshold: -20f,
    ratio: 4.0f);
// Compute spectrograms and transforms
var spectrogram = DSP.MelSpectrogram(audio, sampleRate);
var stft = DSP.STFT(audio, windowSize: 1024, hopSize: 512, windowType: "hann");
Encoding and Decoding Audio
There are two main ways to process audio:
- Using the simplified ProcessAudio method:
// Encode and decode audio in one step
var processedAudio = model.ProcessAudio(audioData, sampleRate);
- Using separate encode and decode steps:
// Encode audio to compressed format
var codes = model.Encode(buffer);
// Decode back to audio
var processedAudio = model.Decode(codes);
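After an encode/decode round trip it can be useful to get a rough measure of how close the reconstruction is to the input. The sketch below computes a signal-to-noise ratio in decibels. It assumes both signals are available as time-aligned float arrays of equal length, which may not match the types the library returns; the Quality class and SnrDb method are hypothetical helpers, not part of NeuralCodecs. Neural codecs are usually judged with perceptual metrics, so treat SNR only as a quick sanity check.

```csharp
using System;

// Hypothetical stand-ins for the original and decoded sample buffers.
float[] original = { 0.5f, -0.5f, 0.25f, -0.25f };
float[] decoded  = { 0.5f, -0.5f, 0.25f, 0.0f };

// Signal energy is 0.625 and error energy is 0.0625, so the ratio is 10 (10 dB).
Console.WriteLine(Quality.SnrDb(original, decoded));

// Hypothetical helper, not part of NeuralCodecs.
static class Quality
{
    // Returns 10 * log10(signal energy / error energy) in dB.
    public static double SnrDb(ReadOnlySpan<float> reference, ReadOnlySpan<float> estimate)
    {
        if (reference.Length != estimate.Length)
            throw new ArgumentException("Signals must have the same length.");

        double signal = 0.0, noise = 0.0;
        for (int i = 0; i < reference.Length; i++)
        {
            double r = reference[i];
            double e = r - estimate[i];
            signal += r * r;
            noise += e * e;
        }

        // A perfect reconstruction has no error energy.
        if (noise == 0.0) return double.PositiveInfinity;
        return 10.0 * Math.Log10(signal / noise);
    }
}
```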
Saving the processed audio
Use your preferred method to save WAV files. The example below uses NAudio.
// using NAudio
await using var writer = new WaveFileWriter(
    outputPath,
    new WaveFormat(model.Config.SamplingRate, channels: model.Channels)
);
writer.WriteSamples(processedAudio, 0, processedAudio.Length);
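If you would rather not depend on NAudio for output, a 16-bit PCM WAV file can be written with the base class library alone. The sketch below is one minimal way to do that; it assumes the processed audio is a float array with samples in the range [-1, 1], interleaved if there is more than one channel. SimpleWav is a hypothetical helper, not part of NeuralCodecs or NAudio.

```csharp
using System;
using System.IO;
using System.Text;

// Write one second of a 440 Hz tone at 24 kHz mono as a usage example.
var tone = new float[24000];
for (int i = 0; i < tone.Length; i++)
    tone[i] = 0.5f * MathF.Sin(2f * MathF.PI * 440f * i / 24000f);

string path = Path.Combine(Path.GetTempPath(), "tone.wav");
SimpleWav.Write(path, tone, sampleRate: 24000, channels: 1);
Console.WriteLine(new FileInfo(path).Length); // 48044 (44-byte header + 2 bytes per sample)

// Hypothetical helper, not part of NeuralCodecs or NAudio.
static class SimpleWav
{
    // Writes float samples in [-1, 1] as a 16-bit PCM WAV file.
    public static void Write(string path, float[] samples, int sampleRate, int channels)
    {
        const int bitsPerSample = 16;
        int blockAlign = channels * bitsPerSample / 8;
        int dataBytes = samples.Length * (bitsPerSample / 8);

        using var writer = new BinaryWriter(File.Create(path));
        writer.Write(Encoding.ASCII.GetBytes("RIFF"));
        writer.Write(36 + dataBytes);            // size of everything after this field
        writer.Write(Encoding.ASCII.GetBytes("WAVE"));
        writer.Write(Encoding.ASCII.GetBytes("fmt "));
        writer.Write(16);                        // fmt chunk size for PCM
        writer.Write((short)1);                  // audio format 1 = PCM
        writer.Write((short)channels);
        writer.Write(sampleRate);
        writer.Write(sampleRate * blockAlign);   // byte rate
        writer.Write((short)blockAlign);
        writer.Write((short)bitsPerSample);
        writer.Write(Encoding.ASCII.GetBytes("data"));
        writer.Write(dataBytes);

        foreach (float s in samples)
        {
            // Clamp first so out-of-range codec output cannot overflow the 16-bit range.
            float clamped = Math.Clamp(s, -1f, 1f);
            writer.Write((short)MathF.Round(clamped * short.MaxValue));
        }
    }
}
```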
Encodec-Specific Features
Encodec provides additional capabilities:
// Set target bandwidth for compression (supported values depend on model)
encodecModel.SetTargetBandwidth(12.0f); // 12 kbps
// Get available bandwidth options
var availableBandwidths = encodecModel.TargetBandwidths; // e.g. [1.5, 3, 6, 12, 24]
// Use language model for enhanced compression quality
var lm = await encodecModel.GetLanguageModel();
// Apply LM during encoding/decoding for better quality
// Direct file compression
await EncodecCompressor.CompressToFileAsync(encodecModel, audioTensor, "audio.ecdc", useLm: true);
// Decompress from file
var (decompressedAudio, sampleRate) = await EncodecCompressor.DecompressFromFileAsync("audio.ecdc");
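Because the supported bandwidths depend on the model, it can help to map a requested bitrate onto the nearest supported value before setting the target bandwidth. The sketch below shows one way to do this over a plain list of doubles. The element type of TargetBandwidths is an assumption here, and BandwidthPicker is a hypothetical helper, not part of NeuralCodecs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Example values taken from the comment in the snippet above.
double[] supported = { 1.5, 3, 6, 12, 24 };

Console.WriteLine(BandwidthPicker.Nearest(supported, 10)); // 12
Console.WriteLine(BandwidthPicker.Nearest(supported, 2));  // 1.5

// Hypothetical helper, not part of NeuralCodecs.
static class BandwidthPicker
{
    // Returns the supported bandwidth closest to the requested value (in kbps).
    // On a tie, the value that appears first in the list wins.
    public static double Nearest(IReadOnlyList<double> supported, double requestedKbps)
    {
        if (supported.Count == 0)
            throw new ArgumentException("At least one supported bandwidth is required.");

        return supported.MinBy(b => Math.Abs(b - requestedKbps));
    }
}
```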
Acknowledgments
- SNAC - Original SNAC implementation
- Descript Audio Codec - DAC reference
- Encodec - Meta's neural audio codec
Contributing
Suggestions and contributions are welcome! Feel free to submit a pull request.
License
This project is licensed under the MIT License.
Product | Compatible and additional computed target framework versions |
---|---|
.NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. |
Dependencies
net8.0
- NAudio (>= 2.2.1)
- TorchAudio (>= 0.105.0)
- TorchSharp (>= 0.105.0)
- TorchSharp.PyBridge (>= 1.4.3)
- TorchSharp-cuda-windows (>= 0.105.0)