Tokenizers.DotNet
0.9.2
Additional Details
This version of the library is compatible with runtime 0.6.1, but not with later runtime versions.
Use 1.0.1 from now on.
There is a newer version of this package available.
See the version list below for details.
dotnet add package Tokenizers.DotNet --version 0.9.2
NuGet\Install-Package Tokenizers.DotNet -Version 0.9.2
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.
<PackageReference Include="Tokenizers.DotNet" Version="0.9.2" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.
<PackageVersion Include="Tokenizers.DotNet" Version="0.9.2" />
<PackageReference Include="Tokenizers.DotNet" />
For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package.
paket add Tokenizers.DotNet --version 0.9.2
The NuGet Team does not provide support for this client. Please contact its maintainers for support.
#r "nuget: Tokenizers.DotNet, 0.9.2"
#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.
#:package Tokenizers.DotNet@0.9.2
#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.
#addin nuget:?package=Tokenizers.DotNet&version=0.9.2
#tool nuget:?package=Tokenizers.DotNet&version=0.9.2
The NuGet Team does not provide support for this client. Please contact its maintainers for support.
Tokenizers.DotNet
.NET wrapper of HuggingFace Tokenizers library
NuGet package list
Package | Description |
---|---|
Tokenizers.DotNet | Core library |
Tokenizers.DotNet.runtime.win | Native bindings for Windows x64 |
Requirements
- .NET 6 or above
Supported functionalities
- Download tokenizer files from the Hugging Face Hub
- Load a tokenizer file (.json) from local storage
- Decode token IDs (embeddings) to a string
How to use
(1) Install the packages
- From NuGet, install the Tokenizers.DotNet package
- Then install the Tokenizers.DotNet.runtime.win package as well
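The two installs above can also be expressed as a project-file fragment. This is a minimal sketch, not an official configuration: the core version comes from this page, and the runtime version 0.6.1 is an assumption based on the compatibility note at the top of the page (check the version list for what is actually published).

```xml
<ItemGroup>
  <!-- Core library; version as listed on this page -->
  <PackageReference Include="Tokenizers.DotNet" Version="0.9.2" />
  <!-- Native Windows x64 bindings; 0.6.1 is assumed from the compatibility note, not verified -->
  <PackageReference Include="Tokenizers.DotNet.runtime.win" Version="0.6.1" />
</ItemGroup>
```

Pinning both versions explicitly keeps a later restore from pulling a runtime release that this library version does not support.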
(2) Write the code
See the following example code:
using Tokenizers.DotNet;
// Download skt/kogpt2-base-v2/tokenizer.json from the hub
var hubName = "skt/kogpt2-base-v2";
var filePath = "tokenizer.json";
var fileFullPath = await HuggingFace.GetFileFromHub(hubName, filePath, "deps");
Console.WriteLine($"Downloaded {fileFullPath}");
// Create a tokenizer instance
var tokenizer = new Tokenizer(vocabPath: fileFullPath);
var tokens = new uint[] { 9330, 387, 12857, 9376, 18649, 9098, 7656, 6969, 8084, 1 };
var decoded = tokenizer.Decode(tokens);
Console.WriteLine($"Decoded: {decoded}");
Console.WriteLine($"Version of Tokenizers.DotNet.runtime.win: {tokenizer.GetVersion()}");
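The example above downloads the file before loading it. For the "load from local" case, a minimal sketch might look like the following. It uses only the `Tokenizer` constructor and `Decode` call shown above; the command-line argument handling and the default file name are illustrative assumptions, not part of the library.

```csharp
using System;
using System.IO;
using Tokenizers.DotNet;

// Hypothetical: take the tokenizer path from the first argument,
// falling back to "tokenizer.json" in the working directory.
var localPath = args.Length > 0 ? args[0] : "tokenizer.json";

// Check for the file up front so a missing file is reported clearly
// before the native runtime is asked to load it.
if (!File.Exists(localPath))
{
    Console.Error.WriteLine($"Tokenizer file not found: {localPath}");
    return;
}

// Same constructor and Decode call as in the example above.
var tokenizer = new Tokenizer(vocabPath: localPath);
var tokens = new uint[] { 9330, 387, 12857, 9376, 18649, 9098, 7656, 6969, 8084, 1 };
var decoded = tokenizer.Decode(tokens);
Console.WriteLine($"Decoded: {decoded}");
```

The token IDs are the ones from the example above and only decode meaningfully with the skt/kogpt2-base-v2 tokenizer file.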
How to build
- Prepare the following:
  - Rust build system (cargo)
  - .NET build system (dotnet 6.0)
  - PowerShell (7.4.2 or above recommended)
- Run build_all_clean.ps1
  - To build Tokenizers.DotNet.runtime.win only, run build_rust.ps1
  - To build Tokenizers.DotNet only, run build_dotnet.ps1
Each build artifact will be placed in the nuget directory.
Product | Versions (compatible and additional computed target framework versions) |
---|---|
.NET | net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 is compatible. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net6.0
  - No dependencies.
- net7.0
  - No dependencies.
- net8.0
  - No dependencies.
NuGet packages (2)
Showing the top 2 NuGet packages that depend on Tokenizers.DotNet:
Package | Downloads |
---|---|
EDMTranslator: Text translator library based on LLM models, especially EncoderDecoderModel in HuggingFace | |
RAGamuffin: Package Description | |
GitHub repositories
This package is not used by any popular GitHub repositories.