Tabula 0.1.4-alpha001
This is a prerelease version of Tabula, and a newer release (0.1.4) is available. See the version list below for details.
dotnet add package Tabula --version 0.1.4-alpha001
NuGet\Install-Package Tabula -Version 0.1.4-alpha001
<PackageReference Include="Tabula" Version="0.1.4-alpha001" />
paket add Tabula --version 0.1.4-alpha001
#r "nuget: Tabula, 0.1.4-alpha001"
// Install Tabula as a Cake Addin
#addin nuget:?package=Tabula&version=0.1.4-alpha001&prerelease
// Install Tabula as a Cake Tool
#tool nuget:?package=Tabula&version=0.1.4-alpha001&prerelease
tabula-sharp
tabula-sharp is a library for extracting tables from PDF files. It is a port of tabula-java.
- Supports .NET 6, .NET Core 3.1, .NET Standard 2.0, .NET Framework 4.5.2, 4.6, 4.6.1, 4.6.2, 4.7
- No Java bindings
NuGet packages are available on the releases page and on www.nuget.org.
Differences with tabula-java
- Uses PdfPig, and not PdfBox.
- Coordinate system starts from the bottom left point (going up) of the page, and not from the top left point (going down).
- The NurminenDetectionAlgorithm is replaced by SimpleNurminenDetectionAlgorithm, because the original requires an image management library.
- Table results might be different because of the way PdfPig builds the bounding boxes of Letters.
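The coordinate difference matters most when reusing area definitions written for tabula-java. The following is a minimal sketch of the conversion, not part of the library: `CoordinateSketch` and `ToBottomLeftOrigin` are hypothetical names, and it assumes the page origin sits at (0, 0) with no crop box offset or rotation.

```csharp
using System;

public static class CoordinateSketch
{
    // Hypothetical helper (not part of Tabula or PdfPig).
    // Input: a rectangle in a top-left-origin system, where top and bottom
    // are distances measured downward from the top edge of the page.
    // Output: the same rectangle in a bottom-left-origin system, where
    // y values are measured upward from the bottom edge of the page.
    public static (double Left, double Bottom, double Right, double Top) ToBottomLeftOrigin(
        double top, double left, double bottom, double right, double pageHeight)
    {
        // x values are unchanged; each y value is flipped against the page height.
        return (left, pageHeight - bottom, right, pageHeight - top);
    }
}
```

For a page 792 points high, an area with top = 100 and bottom = 200 in the top-left system spans y = 592 to y = 692 in the bottom-left system. The resulting values could then be used to build the rectangle passed to `page.GetArea(...)`.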
Usage
Stream mode - BasicExtractionAlgorithm
using System.Collections.Generic;
using Tabula;
using Tabula.Detectors;
using Tabula.Extractors;
using UglyToad.PdfPig;

using (PdfDocument document = PdfDocument.Open("doc.pdf", new ParsingOptions() { ClipPaths = true }))
{
    ObjectExtractor oe = new ObjectExtractor(document);
    PageArea page = oe.Extract(1); // extract the first page

    // detect candidate table zones
    SimpleNurminenDetectionAlgorithm detector = new SimpleNurminenDetectionAlgorithm();
    var regions = detector.Detect(page);

    IExtractionAlgorithm ea = new BasicExtractionAlgorithm();
    List<Table> tables = ea.Extract(page.GetArea(regions[0].BoundingBox)); // take first candidate area
    var table = tables[0];
    var rows = table.Rows;
}
Lattice mode - SpreadsheetExtractionAlgorithm
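using System.Collections.Generic;
using Tabula;
using Tabula.Extractors;
using UglyToad.PdfPig;

using (PdfDocument document = PdfDocument.Open("doc.pdf", new ParsingOptions() { ClipPaths = true }))
{
    ObjectExtractor oe = new ObjectExtractor(document);
    PageArea page = oe.Extract(1); // extract the first page

    // use the ruling lines on the whole page to find table cells
    IExtractionAlgorithm ea = new SpreadsheetExtractionAlgorithm();
    List<Table> tables = ea.Extract(page);
    var table = tables[0];
    var rows = table.Rows;
}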
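In both modes the snippet ends with `table.Rows`, so a natural next step is turning those rows into text. The sketch below is one way to do that under stated assumptions: `RowDumpSketch` and `FormatRows` are hypothetical names, and the helper is deliberately generic so that it does not depend on the exact cell type; the caller supplies the function that reads a cell's text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RowDumpSketch
{
    // Hypothetical helper (not part of Tabula): produces one line per row,
    // with the trimmed text of each cell separated by " | ".
    // textOf is supplied by the caller and returns the text of a single cell;
    // a null result is treated as an empty string.
    public static List<string> FormatRows<TCell>(
        IEnumerable<IEnumerable<TCell>> rows, Func<TCell, string> textOf)
    {
        return rows
            .Select(row => string.Join(" | ",
                row.Select(cell => (textOf(cell) ?? string.Empty).Trim())))
            .ToList();
    }
}
```

Assuming each cell exposes a `GetText()` method, a call might look like `RowDumpSketch.FormatRows(table.Rows, cell => cell.GetText())`; check the cell type in the version you install before relying on that.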
Results
Stream mode - BasicExtractionAlgorithm
Lattice mode - SpreadsheetExtractionAlgorithm
Product | Versions (compatible and additional computed target framework versions) |
---|---|
.NET | net5.0 was computed. net5.0-windows was computed. net6.0 is compatible. net6.0-android was computed. net6.0-ios was computed. net6.0-maccatalyst was computed. net6.0-macos was computed. net6.0-tvos was computed. net6.0-windows was computed. net7.0 was computed. net7.0-android was computed. net7.0-ios was computed. net7.0-maccatalyst was computed. net7.0-macos was computed. net7.0-tvos was computed. net7.0-windows was computed. net8.0 was computed. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. |
.NET Core | netcoreapp2.0 was computed. netcoreapp2.1 was computed. netcoreapp2.2 was computed. netcoreapp3.0 was computed. netcoreapp3.1 is compatible. |
.NET Standard | netstandard2.0 is compatible. netstandard2.1 was computed. |
.NET Framework | net452 is compatible. net46 is compatible. net461 is compatible. net462 is compatible. net463 was computed. net47 is compatible. net471 was computed. net472 was computed. net48 was computed. net481 was computed. |
MonoAndroid | monoandroid was computed. |
MonoMac | monomac was computed. |
MonoTouch | monotouch was computed. |
Tizen | tizen40 was computed. tizen60 was computed. |
Xamarin.iOS | xamarinios was computed. |
Xamarin.Mac | xamarinmac was computed. |
Xamarin.TVOS | xamarintvos was computed. |
Xamarin.WatchOS | xamarinwatchos was computed. |
- .NETCoreApp 3.1
  - PdfPig (>= 0.1.9-alpha-20231019-c6e2d)
- .NETFramework 4.5.2
  - PdfPig (>= 0.1.9-alpha-20231019-c6e2d)
- .NETFramework 4.6
  - PdfPig (>= 0.1.9-alpha-20231019-c6e2d)
- .NETFramework 4.6.1
  - PdfPig (>= 0.1.9-alpha-20231019-c6e2d)
- .NETFramework 4.6.2
  - PdfPig (>= 0.1.9-alpha-20231019-c6e2d)
- .NETFramework 4.7
  - PdfPig (>= 0.1.9-alpha-20231019-c6e2d)
- .NETStandard 2.0
  - PdfPig (>= 0.1.9-alpha-20231019-c6e2d)
- net6.0
  - PdfPig (>= 0.1.9-alpha-20231019-c6e2d)
NuGet packages (3)
Showing the top 3 NuGet packages that depend on Tabula:
Package | Downloads |
---|---|
Tabula.Json: Extract tables from PDF files (port of tabula-java using PdfPig). Json writer. | |
Tabula.Csv: Extract tables from PDF files (port of tabula-java using PdfPig). Csv and Tsv writers. | |
DocumentAtom.Pdf: DocumentAtom provides a light, fast library for breaking input PDF documents into constituent parts (atoms), useful for AI, machine learning, processing, analytics, and general analysis. | |
GitHub repositories
This package is not used by any popular GitHub repositories.
Version | Downloads | Last updated |
---|---|---|
0.1.4 | 26,368 | 10/6/2024 |
0.1.4-alpha001 | 8,911 | 10/19/2023 |
0.1.3 | 214,451 | 6/1/2022 |
0.1.2 | 19,039 | 1/29/2022 |
0.1.1 | 10,574 | 7/18/2021 |
0.1.1-alpha001 | 523 | 3/6/2021 |
0.1.0 | 17,749 | 1/17/2021 |
0.1.0-alpha002 | 515 | 10/26/2020 |