PragmaticSegmenterNet 1.0.5
dotnet add package PragmaticSegmenterNet --version 1.0.5
This project is a direct port of Pragmatic Segmenter, which provides rule-based sentence boundary detection.
Usage
The Segmenter class provides the Segment method, which in its simplest usage takes a string:
using System.Collections.Generic;
using PragmaticSegmenterNet;
// Default language (English).
IReadOnlyList<string> result = Segmenter.Segment("One Sentence. And another sentence.");
// ["One Sentence.", "And another sentence."]
// Explicit language.
IReadOnlyList<string> result2 = Segmenter.Segment("Anything.", Language.Italian);
// ["Anything."]
The Segment method has a number of optional parameters:
IReadOnlyList<string> Segment(string text, Language language = Language.English, bool cleanText = true, DocumentType documentType = DocumentType.Any)
- Language - An enum selecting the language of the input text. Defaults to English; see the Languages list below for the currently supported values.
- CleanText - A boolean indicating whether the input text should be cleaned prior to segmentation. Cleaning removes extra newlines and whitespace. Defaults to true.
- DocumentType - Used by the text cleaning process to determine which reformatting to apply. For PDFs this handles newlines in the middle of a sentence, whereas for HTML documents it handles HTML tags. Defaults to Any, which does not apply any special formatting.
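As a minimal sketch of overriding the optional parameters, the call below uses C# named arguments with the parameter names from the signature above. The sample text and the `SegmentExample` class are hypothetical, and only enum members that appear in this document (`Language.German`, `DocumentType.Any`) are used. No expected output is shown because it depends on the segmentation rules.

```csharp
using System;
using System.Collections.Generic;
using PragmaticSegmenterNet;

public static class SegmentExample
{
    public static void Main()
    {
        // Hypothetical sample input; any German text would do.
        const string text = "Das ist ein Satz. Das ist noch ein Satz.";

        // Named arguments make it explicit which optional parameters are overridden.
        IReadOnlyList<string> sentences = Segmenter.Segment(
            text,
            language: Language.German,
            cleanText: false,                 // skip the cleaning pass
            documentType: DocumentType.Any);  // the documented default

        // Print one segment per line.
        foreach (string sentence in sentences)
        {
            Console.WriteLine(sentence);
        }
    }
}
```

Passing `cleanText: false` is mainly useful when the input has already been normalised and its whitespace should be left untouched.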
Languages
- English = 0 (default)
- Amharic = 1
- Arabic = 2
- Armenian = 3
- Bulgarian = 4
- Burmese = 5
- Chinese = 6
- Danish = 7
- Dutch = 8
- French = 9
- German = 10
- Greek = 11
- Hindi = 12
- Italian = 13
- Japanese = 14
- Kazakh = 15 (partial support, potentially only for the Cyrillic form of the alphabet)
- Persian = 16
- Polish = 17
- Russian = 18
- Spanish = 19
- Urdu = 20
Releases
1.0.5
- Fixes an issue with non-breaking spaces in numbered lists.
1.0.3
- Fixes an issue with text containing regex replacement groups, e.g. $0, $1, etc.
1.0.2
- Fixes an issue with periods following abbreviations.
1.0.1
- Fixes an issue with single character inputs.
Credit
This project wouldn't be possible without the work done by the Pragmatic Segmenter team. Any bugs in the code are entirely my fault.
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net5.0 was computed. net5.0-windows was computed. net6.0 was computed. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
| .NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 was computed. |
| .NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
| .NET Framework | net461 was computed. net462 was computed. net463 was computed. net47 was computed. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
| MonoAndroid | monoandroid was computed. |
| MonoMac | monomac was computed. |
| MonoTouch | monotouch was computed. |
| Tizen | tizen40 was computed. tizen60 was computed. |
| Xamarin.iOS | xamarinios was computed. |
| Xamarin.Mac | xamarinmac was computed. |
| Xamarin.TVOS | xamarintvos was computed. |
| Xamarin.WatchOS | xamarinwatchos was computed. |
- .NETStandard 2.0: No dependencies.