EDMTranslator 0.9.0

Additional Details

This package version is a pre-alpha release. Please use 0.9.1 or above.

There is a newer version of this package available.
.NET CLI:
dotnet add package EDMTranslator --version 0.9.0

Package Manager:
NuGet\Install-Package EDMTranslator -Version 0.9.0
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

PackageReference:
<PackageReference Include="EDMTranslator" Version="0.9.0" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

Paket CLI:
paket add EDMTranslator --version 0.9.0

Script & Interactive:
#r "nuget: EDMTranslator, 0.9.0"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or the source code of the script to reference the package.

Cake:
// Install EDMTranslator as a Cake Addin
#addin nuget:?package=EDMTranslator&version=0.9.0

// Install EDMTranslator as a Cake Tool
#tool nuget:?package=EDMTranslator&version=0.9.0

EDMTranslator

A text translation library based on LLMs, in particular the EncoderDecoderModel architecture from Hugging Face.

NuGet package list

Package | Repo | Description
EDMTranslator | NuGet | Main library

Requirements

  • .NET 6 or above

Supported models

Quickstart

The following guide assumes that you are using the FF14Translator mentioned above.

Install the packages

  1. From NuGet, install the EDMTranslator package.
  2. Then install the Tokenizers.DotNet.runtime.win package as well.

Prepare the required data

Japanese dictionary

Fine-tuned translator model

Implement the driver code

Write code like the following and you are good to go 🫡 Note that you need to set encoderDictDir and modelDir to the correct paths on your machine.

// Console application that translates Japanese sentences into Korean using FF14Translator

using EDMTranslator.Tokenization;
using EDMTranslator.Translation;

// Prepare the tokenizer
var encoderVocabPath = await BertJapaneseTokenizer.HuggingFace.GetVocabFromHub("tohoku-nlp/bert-base-japanese-v2");
var hubName = "skt/kogpt2-base-v2";
var decoderVocabFilename = "tokenizer.json";
var decoderVocabPath = await Tokenizers.DotNet.HuggingFace.GetFileFromHub(hubName, decoderVocabFilename, "deps");

string encoderDictDir = @"D:\DATASET\unidic-mecab-2.1.2_bin";
var tokenizer = new BertJa2KoGPTTokenizer(
    encoderDictDir: encoderDictDir, encoderVocabPath: encoderVocabPath,
    decoderVocabPath: decoderVocabPath);

void TestTokenizer(ITokenizer tokenizer)
{
    Console.WriteLine("--Tokenizer test--");
    Console.WriteLine("[Encode]");
    var sentenceJa = "打ち合わせが終わった後にご飯を食べましょう。";
    Console.WriteLine($"Input: {sentenceJa}");
    var (embeddingsJa, attentionMask) = tokenizer.Encode(sentenceJa);
    Console.WriteLine($"Encoded: {string.Join(", ", embeddingsJa)}");

    Console.WriteLine("[Decode]");
    // Tokens of "음, 이제 식사도 해볼까요"
    var tokens = new uint[] { 9330, 387, 12857, 9376, 18649, 9098, 7656, 6969, 8084, 1 };
    Console.WriteLine($"Input: {string.Join(", ", tokens)}");
    var decoded = tokenizer.Decode(tokens);
    Console.WriteLine($"Decoded: {decoded}");
}
TestTokenizer(tokenizer);

// Prepare the translator
string modelDir = @"D:\MODEL\ffxiv-ja-ko-translator\onnx"; // Contains encoder_model.onnx and decoder_model_merged.onnx
var translator = new FF14Translator(tokenizer, modelDir);
void TestTranslator(FF14Translator translator)
{
    Console.WriteLine("--Translator test--");
    Translate(translator, "打ち合わせが終わった後にご飯を食べましょう。");
    Translate(translator, "試験前に緊張したあまり、熱がでてしまった。");
    Translate(translator, "山田は英語にかけてはクラスの誰にも負けない。");
    Translate(translator, "この本によれば、最初の人工橋梁は新石器時代にさかのぼるという。");
}
TestTranslator(translator);

static void Translate(FF14Translator translator, string sentence)
{
    Console.WriteLine($"SourceText: {sentence}");
    string translated = translator.Translate(sentence);
    Console.WriteLine($"Translated: {translated}");
}
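
Because the driver code depends on encoderDictDir and modelDir pointing at real data, it can help to check both paths before constructing the tokenizer and the translator. The following is a minimal standalone sketch, not part of EDMTranslator: FindMissingFiles is a hypothetical helper that uses only System.IO, the two model file names come from the comment on modelDir above, and treating encoderDictDir as a plain directory is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of EDMTranslator): returns every required
// path that does not exist. If the directory itself is missing, only the
// directory is reported.
static List<string> FindMissingFiles(string dir, params string[] requiredFiles)
{
    if (!Directory.Exists(dir))
    {
        return new List<string> { dir };
    }
    return requiredFiles
        .Select(file => Path.Combine(dir, file))
        .Where(path => !File.Exists(path))
        .ToList();
}

// Same example paths as in the driver code; adjust them to your machine.
string encoderDictDir = @"D:\DATASET\unidic-mecab-2.1.2_bin";
string modelDir = @"D:\MODEL\ffxiv-ja-ko-translator\onnx";

// File names taken from the comment on modelDir in the driver code.
var missing = FindMissingFiles(modelDir, "encoder_model.onnx", "decoder_model_merged.onnx");

// Assumption: the dictionary is a directory, so only its existence is checked.
if (!Directory.Exists(encoderDictDir))
{
    missing.Add(encoderDictDir);
}

foreach (var path in missing)
{
    Console.Error.WriteLine($"Missing: {path}");
}
```

If anything is reported as missing, fix the paths before creating BertJa2KoGPTTokenizer or FF14Translator. To use the check inside the driver program, keep only the helper and the checks, and reuse the existing encoderDictDir and modelDir variables.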

How to build

  1. Prepare the following:
    1. .NET build system (dotnet 6.0)
    2. PowerShell (7.4.2 or above recommended)
  2. Run cbuild.ps1

The build artifacts are saved in the nuget directory.

Compatible and additional computed target framework versions

.NET
  • Compatible: net6.0
  • Computed: net6.0-android, net6.0-ios, net6.0-maccatalyst, net6.0-macos, net6.0-tvos, net6.0-windows, net7.0, net7.0-android, net7.0-ios, net7.0-maccatalyst, net7.0-macos, net7.0-tvos, net7.0-windows, net8.0, net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.