ManySpeech.AliFsmnVad 1.1.3

dotnet CLI:
dotnet add package ManySpeech.AliFsmnVad --version 1.1.3

Package Manager Console (this command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package):
NuGet\Install-Package ManySpeech.AliFsmnVad -Version 1.1.3

PackageReference (for projects that support PackageReference, copy this XML node into the project file to reference the package):
<PackageReference Include="ManySpeech.AliFsmnVad" Version="1.1.3" />

Central Package Management (for projects that support CPM, copy the first XML node into the solution Directory.Packages.props file to version the package, and the second into the project file):
Directory.Packages.props:
<PackageVersion Include="ManySpeech.AliFsmnVad" Version="1.1.3" />
Project file:
<PackageReference Include="ManySpeech.AliFsmnVad" />

Paket:
paket add ManySpeech.AliFsmnVad --version 1.1.3

#r directive (can be used in F# Interactive and Polyglot Notebooks; copy this into the interactive tool or the source code of the script to reference the package):
#r "nuget: ManySpeech.AliFsmnVad, 1.1.3"

#:package directive (can be used in C# file-based apps starting in .NET 10 preview 4; copy this into a .cs file before any lines of code to reference the package):
#:package ManySpeech.AliFsmnVad@1.1.3

Install as a Cake Addin:
#addin nuget:?package=ManySpeech.AliFsmnVad&version=1.1.3

Install as a Cake Tool:
#tool nuget:?package=ManySpeech.AliFsmnVad&version=1.1.3

AliFsmnVad

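AliFsmnVad is a C# library for decoding the Fsmn-Vad model, used for voice activity detection (VAD).

Introduction:

AliFsmnVad is a decoding library for the Fsmn-Vad model, written in C#. It decodes ONNX models by calling Microsoft.ML.OnnxRuntime. The library supports framework versions such as net461+ and .NET 6.0+, and it supports cross-platform compilation and AOT compilation, which keeps deployment flexible. The real-time factor (RTF) of the complete VAD pipeline is about 0.008.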
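For reference, the real-time factor is the processing time divided by the duration of the audio processed, so a value well below 1 means faster than real time. A minimal sketch of that arithmetic, using the timings from the sample output later in this README; RtfExample and RealTimeFactor are illustrative names and not part of the library:

```csharp
using System;
using System.Globalization;

public static class RtfExample
{
    // RTF = time spent processing / duration of the audio that was processed.
    public static double RealTimeFactor(double elapsedMs, double audioDurationMs)
    {
        if (audioDurationMs <= 0)
            throw new ArgumentOutOfRangeException(nameof(audioDurationMs));
        return elapsedMs / audioDurationMs;
    }

    public static void Main()
    {
        // elapsed_milliseconds and total_duration from the sample output.
        double rtf = RealTimeFactor(662.796875, 70470.625);
        Console.WriteLine("rtf: " + rtf.ToString("F4", CultureInfo.InvariantCulture)); // prints "rtf: 0.0094"
    }
}
```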

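As a general-purpose 16 kHz VAD tool, it is based on the efficient FSMN-Monophone VAD model proposed by the DAMO Academy speech team. Its core function is to detect the start and end times of valid speech within long audio. Extracting the valid audio segments and feeding them to a recognition engine reduces the recognition errors caused by non-speech audio and improves the accuracy of speech recognition tasks.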

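Common VAD parameter adjustments (see the vad.yaml file):

max_end_silence_time: how long continuous trailing silence must be detected before the end point is declared. Range: 500 ms to 6000 ms. Default: 800 ms. (If this value is too low, speech tends to be cut off early.)

speech_noise_thres: the input is judged to be speech when the speech score minus the noise score is greater than this value. Range: (-1, 1).
The closer the value is to -1, the more likely noise is misjudged as speech, and the higher the false alarm (FA) rate.
The closer the value is to +1, the more likely speech is misjudged as noise, and the higher the miss rate (Pmiss).
In general, the value is chosen as a balance based on the current model's results on a long-audio test set.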
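The speech_noise_thres rule can be sketched as a single comparison. This only illustrates the decision described above and is not the library's internal code; ThresholdExample, IsSpeech, and the score values are hypothetical:

```csharp
using System;

public static class ThresholdExample
{
    // Returns true when (speech score - noise score) exceeds the threshold.
    public static bool IsSpeech(double speechScore, double noiseScore, double speechNoiseThres)
    {
        if (speechNoiseThres <= -1.0 || speechNoiseThres >= 1.0)
            throw new ArgumentOutOfRangeException(nameof(speechNoiseThres), "Expected a value in (-1, 1).");
        return speechScore - noiseScore > speechNoiseThres;
    }

    public static void Main()
    {
        // Same scores (difference of about 0.4), two different thresholds.
        Console.WriteLine(IsSpeech(0.7, 0.3, 0.6));  // False: a high threshold rejects more input as noise
        Console.WriteLine(IsSpeech(0.7, 0.3, -0.5)); // True: a low threshold accepts more input as speech
    }
}
```

Raising the threshold trades false alarms for misses, which is why the value is normally tuned on a test set rather than set by hand.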

Usage:
1. Add project references
using System;
using System.IO;
using AliFsmnVad;
using AliFsmnVad.Model;
2. Initialize the model and configuration
string applicationBase = AppDomain.CurrentDomain.BaseDirectory;
string modelName = "speech_fsmn_vad_zh-cn-16k-common-onnx";
// The model directory is expected to sit next to the application binaries.
string modelFilePath = Path.Combine(applicationBase, modelName, "model.onnx");
string configFilePath = Path.Combine(applicationBase, modelName, "vad.yaml");
string mvnFilePath = Path.Combine(applicationBase, modelName, "vad.mvn");
int batchSize = 2; // batch decoding
AliFsmnVad aliFsmnVad = new AliFsmnVad(modelFilePath, configFilePath, mvnFilePath, batchSize);
3. Call

Method 1 (suitable for small files):

SegmentEntity[] segments_duration = aliFsmnVad.GetSegments(samples);

Method 2 (suitable for large files):

SegmentEntity[] segments_duration = aliFsmnVad.GetSegmentsByStep(samples);
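4. Get the results
// Iterate over segments_duration
foreach (SegmentEntity segment in segments_duration)
{
    // segment.Waveform holds the audio samples of the segment produced by VAD
    // segment.Segment holds the timestamps for that segment
}

Sample output with the corresponding timestamps: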

load model and init config elapsed_milliseconds:463.5390625
vad infer result:
[[70,2340][2620,6200][6480,23670][23950,26250][26780,28990][29950,31430][31750,37600][38210,46900][47310,49630][49910,56460][56740,59540][59820,70450]]
elapsed_milliseconds:662.796875
total_duration:70470.625
rtf:0.009405292985552491

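Each pair in the output, for example [70,2340], gives the start and end time of a segment in milliseconds, and can be used to split the audio. Silence and noise have already been removed.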
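If you work from the raw timestamps rather than from segment.Waveform, the millisecond values have to be converted to sample indices: at 16 kHz, one millisecond is 16 samples. A minimal sketch, assuming mono float samples; SliceExample and Slice are hypothetical helpers, not part of the library:

```csharp
using System;

public static class SliceExample
{
    // Copies the samples between startMs and endMs, clamped to the buffer length.
    public static float[] Slice(float[] samples, int startMs, int endMs, int sampleRate = 16000)
    {
        if (startMs < 0 || endMs < startMs)
            throw new ArgumentException("Invalid segment bounds.");
        int start = (int)Math.Min((long)startMs * sampleRate / 1000, samples.Length);
        int end = (int)Math.Min((long)endMs * sampleRate / 1000, samples.Length);
        float[] result = new float[end - start];
        Array.Copy(samples, start, result, 0, result.Length);
        return result;
    }

    public static void Main()
    {
        float[] audio = new float[16000 * 3]; // 3 s of silence as a stand-in for real audio
        float[] first = Slice(audio, 70, 2340); // first segment from the sample output
        Console.WriteLine(first.Length); // prints 36320, i.e. 2270 ms * 16 samples/ms
    }
}
```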

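Speech recognition

Pass segment.Waveform (see step 4 above) as the argument to the corresponding method of AliParaformerAsr, K2TransducerAsr, or the offlineRecognizer in SherpaOnnxSharp to run speech recognition. See the respective examples for details.

Other notes:

Test cases: AliFsmnVad.Examples.
Supported platforms: Windows 7 SP1 or later; macOS 10.13 (High Sierra) or later; iOS and similar; Linux distributions (specific dependencies are required, see the list of Linux distributions supported by .NET 6); Android (Android 5.0 (API 21) or later).
The examples use the NAudio library to compute the audio samples.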

Model download

https://huggingface.co/manyeyes/speech_fsmn_vad_zh-cn-16k-common-onnx
https://www.modelscope.cn/models/manyeyes/alifsmnvad-onnx

Official model description: https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-onnx

References:

https://github.com/modelscope/FunASR

Product: Compatible and additional computed target framework versions
.NET: net5.0 was computed. net5.0-windows was computed. net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-android34.0 is compatible. net8.0-browser was computed. net8.0-ios was computed. net8.0-ios18.0 is compatible. net8.0-maccatalyst was computed. net8.0-maccatalyst18.0 is compatible. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net8.0-windows10.0.19041 is compatible. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.
.NET Core: netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 is compatible.
.NET Standard: netstandard2.0 is compatible. netstandard2.1 is compatible.
.NET Framework: net461 is compatible. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 is compatible. net48 is compatible. net481 was computed.
MonoAndroid: monoandroid was computed.
MonoMac: monomac was computed.
MonoTouch: monotouch was computed.
Tizen: tizen40 was computed. tizen60 was computed.
Xamarin.iOS: xamarinios was computed.
Xamarin.Mac: xamarinmac was computed.
Xamarin.TVOS: xamarintvos was computed.
Xamarin.WatchOS: xamarinwatchos was computed.

NuGet packages (1)

Showing the top 1 NuGet packages that depend on ManySpeech.AliFsmnVad:

Package Downloads
ManySpeech.MoonshineAsr

MoonshineAsr is a C# library for decoding Moonshine's tiny and base models, used for automatic speech recognition (ASR). It uses Microsoft.ML.OnnxRuntime underneath to decode the ONNX model. The library has broad framework compatibility, supporting environments such as net461+, net60+, netcoreapp3.1, and netstandard2.0+. It supports cross-platform compilation as well as AOT compilation, and is simple and convenient to use.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
1.1.3 190 8/24/2025
1.1.2 169 8/24/2025
1.1.1 143 8/14/2025
1.1.0 148 8/14/2025
1.0.9 153 8/11/2025
1.0.8 135 8/11/2025
1.0.7 136 8/11/2025
1.0.6 126 8/9/2025
1.0.5 300 6/10/2025
1.0.4 268 5/13/2025