Concordance.Dat
0.5.0
dotnet add package Concordance.Dat --version 0.5.0
Concordance.DAT
Concordance.DAT is a high-performance, asynchronous streaming reader for Concordance DAT files, designed for .NET 8. It efficiently parses large legal discovery exports, supports robust encoding detection, and provides flexible handling of empty fields and multiline data. Built for reliability and speed in e-discovery and document management workflows.
Author: GPT5
Key behaviors
- Separator: 0x14 (DC4) between fields
- Quote / Qualifier: 0xFE around every field
- Escaped quote: doubled 0xFE within a quoted field
- Multiline values: CR, LF, or CRLF inside quotes are treated as data
- Record terminators: LF or CRLF when not in quotes; a trailing CR at EOF is accepted
- Header handling: the first record is the header and is not yielded; subsequent rows are dictionaries keyed by header names
- Validation: each data record must have the same number of fields as the header; otherwise a FormatException is thrown
- Encoding: BOM-aware detection for UTF-8, UTF-16 LE, UTF-16 BE; BOM-less files must begin with U+00FE
- Stream positioning: after detection, if a BOM exists, the stream is positioned to the first character after the BOM
- Empty field handling: configurable via EmptyFieldMode in DatFileOptions:
  - Null (default): empty fields are included as null values
  - Keep: empty fields are included as empty strings
  - Omit: empty fields are omitted from the output dictionary
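To make the separator, qualifier, and escaping rules concrete, the following sketch builds a small sample file by hand. FormatRecord is a hypothetical helper written for this illustration and is not part of Concordance.Dat; the file name and field values are likewise made up.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

const char Separator = '\u0014'; // DC4, placed between fields
const char Quote = '\u00FE';     // thorn, wrapped around every field

// Hypothetical helper (not a library API): formats one record by quoting
// every field and doubling any quote character found inside a value.
string FormatRecord(IEnumerable<string> fields) =>
    string.Join(Separator, fields.Select(field =>
        Quote + field.Replace(Quote.ToString(), new string(Quote, 2)) + Quote));

var lines = new[]
{
    FormatRecord(new[] { "DOCID", "NOTE" }),                 // header record
    FormatRecord(new[] { "DOC001", "line one\nline two" })   // multiline value stays inside quotes
};

// Records are terminated with CRLF; the file is written as UTF-8 with a BOM.
string sample = string.Join("\r\n", lines) + "\r\n";
string path = Path.Combine(Path.GetTempPath(), "sample.dat");
File.WriteAllText(path, sample, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
Console.WriteLine($"Wrote {sample.Length} chars to {path}");
```

A file produced this way should satisfy the rules listed above, so it can serve as a small fixture when trying out DatFile.ReadAsync.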
Performance notes
- Uses ArrayPool<char> for chunked decoding and reuses StringBuilder/List instances across records to minimize allocations.
- Prefer opening files with FileOptions.Asynchronous | FileOptions.SequentialScan for best throughput.
- Buffer sizes are specified in characters, not bytes (UTF-16 typically uses 2 bytes per char).
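The pooled, chunked decoding pattern mentioned above can be sketched with standard .NET types only. This is an illustration of the general technique, not the library's actual implementation; CountCharsAsync and its default chunk size are assumptions chosen for the example.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: decodes a stream in fixed-size char chunks using a
// buffer rented from the shared pool, and returns the number of chars read.
async Task<long> CountCharsAsync(Stream stream, int chunkChars = 128 * 1024, CancellationToken ct = default)
{
    using var reader = new StreamReader(
        stream, Encoding.UTF8, detectEncodingFromByteOrderMarks: true,
        bufferSize: chunkChars, leaveOpen: true);

    // Rent may return a larger array than requested, so only the first
    // chunkChars elements are used.
    char[] buffer = ArrayPool<char>.Shared.Rent(chunkChars);
    try
    {
        long total = 0;
        int read;
        while ((read = await reader.ReadAsync(buffer.AsMemory(0, chunkChars), ct)) > 0)
        {
            total += read; // a real parser would scan buffer[0..read] here
        }
        return total;
    }
    finally
    {
        // Always hand the buffer back, even if decoding throws.
        ArrayPool<char>.Shared.Return(buffer);
    }
}

byte[] bytes = Encoding.UTF8.GetBytes("\u00FEDOCID\u00FE\r\n\u00FE1\u00FE\r\n");
using var ms = new MemoryStream(bytes);
Console.WriteLine(await CountCharsAsync(ms));
```

Renting the buffer once per file, rather than allocating one per chunk, is what keeps garbage collection pressure low on multi-gigabyte exports.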
Quick start
Add a reference to the package (or to the source project), then:
using Concordance.Dat;
await foreach (var row in DatFile.ReadAsync("c:\\data\\export.dat"))
{
    var docId = row["DOCID"];
    // process...
}
Configuration with DatFileOptions
DatFileOptions centralizes buffer sizes, file-open behavior, and empty field handling. Defaults are tuned for good throughput; override as needed.
using Concordance.Dat;
var options = DatFileOptions.Default with
{
    ReaderBufferChars = 256 * 1024, // StreamReader decode buffer (chars)
    ParseChunkChars = 128 * 1024,   // Parser working buffer (chars)
    File = new FileStreamOptions
    {
        Mode = FileMode.Open,
        Access = FileAccess.Read,
        Share = FileShare.Read,
        Options = FileOptions.Asynchronous | FileOptions.SequentialScan,
        BufferSize = 1 << 20 // 1 MiB
    },
    EmptyFieldMode = EmptyField.Omit // Omit empty fields from output dictionary
};
var cancellationToken = CancellationToken.None;
await foreach (var row in DatFile.ReadAsync("c:\\data\\export.dat", options, cancellationToken))
{
    // ...
}
You can also supply a Stream directly:
await using var fs = File.Open("c:\\data\\export.dat", new FileStreamOptions
{
    Mode = FileMode.Open,
    Access = FileAccess.Read,
    Share = FileShare.Read,
    Options = FileOptions.Asynchronous | FileOptions.SequentialScan,
    BufferSize = 1 << 20
});
var options = DatFileOptions.Default with { EmptyFieldMode = EmptyField.Keep };
await foreach (var row in DatFile.ReadAsync(fs, options))
{
    // ...
}
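The effect of the three empty-field modes on the row dictionary can be illustrated without the library. The sketch below mirrors the documented behavior under assumed names: SketchEmptyField and BuildRow are hypothetical stand-ins, not the library's EmptyField enum or its internal row builder.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the row builder: maps header names to values and
// applies the chosen mode to fields whose value is the empty string.
Dictionary<string, string?> BuildRow(
    IReadOnlyList<string> header, IReadOnlyList<string> values, SketchEmptyField mode)
{
    var row = new Dictionary<string, string?>(header.Count);
    for (int i = 0; i < header.Count; i++)
    {
        string value = values[i];
        if (value.Length > 0)
        {
            row[header[i]] = value;
            continue;
        }
        switch (mode)
        {
            case SketchEmptyField.Null: row[header[i]] = null; break;          // key present, null value
            case SketchEmptyField.Keep: row[header[i]] = string.Empty; break;  // key present, "" value
            case SketchEmptyField.Omit: break;                                 // key absent
        }
    }
    return row;
}

var header = new[] { "DOCID", "TITLE" };
var values = new[] { "DOC001", "" };
foreach (var mode in Enum.GetValues<SketchEmptyField>())
{
    var row = BuildRow(header, values, mode);
    Console.WriteLine($"{mode}: has TITLE = {row.ContainsKey("TITLE")}, count = {row.Count}");
}

// Illustrative enum mirroring the documented modes (an assumption, not the library type).
enum SketchEmptyField { Null, Keep, Omit }
```

One practical consequence: if rows behave as sketched here, then with Omit an indexer lookup such as row["TITLE"] would throw KeyNotFoundException for an empty field, so TryGetValue is the safer way to read optional columns in that mode.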
Reading only the header
To read just the header (field names) from a Concordance DAT file without streaming all records:
using Concordance.Dat;
// From a file path:
var headerFromPath = await DatFile.GetHeaderAsync("c:\\data\\export.dat");

// From a stream (distinct variable name so both calls compile in one scope):
await using var fs = File.OpenRead("c:\\data\\export.dat");
var headerFromStream = await DatFile.GetHeaderAsync(fs);
Returns a read-only list of field names in file order. Throws FormatException if the file is empty or invalid.
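A common use of the header is a quick pre-flight check that the columns a pipeline needs are present before streaming any rows. The sketch below shows one way to do that; MissingColumns is a hypothetical helper, the column names are invented, and the case-sensitive comparison is an assumption about how header names should be matched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns the required column names that do not
// appear in the header (ordinal, case-sensitive comparison assumed).
IReadOnlyList<string> MissingColumns(IReadOnlyList<string> header, IEnumerable<string> required)
{
    var present = new HashSet<string>(header, StringComparer.Ordinal);
    return required.Where(name => !present.Contains(name)).ToList();
}

// In real use this list would come from DatFile.GetHeaderAsync(...);
// a literal stands in here so the sketch runs on its own.
IReadOnlyList<string> header = new[] { "DOCID", "CUSTODIAN", "DATE" };

var missing = MissingColumns(header, new[] { "DOCID", "TEXTPATH" });
if (missing.Count > 0)
{
    Console.WriteLine($"Missing columns: {string.Join(", ", missing)}");
}
```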
Memory-Efficient File Validation
The GetCountAsync method provides a memory-efficient way to validate DAT files:
- Parses the entire file using the same exact rules as ReadAsync
- Returns both header fields and total row count in one pass
- Validates that every row has the correct number of fields
- Uses minimal memory since it doesn't create dictionaries for rows
Example usage for validation:
using Concordance.Dat;
try
{
    var (header, rowCount) = await DatFile.GetCountAsync("large.dat");
    Console.WriteLine($"File is valid with {header.Count} columns and {rowCount} rows");
}
catch (FormatException ex)
{
    Console.WriteLine($"File validation failed: {ex.Message}");
}
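To show why counting needs so little memory, here is a minimal record counter that follows the quoting and terminator rules described under Key behaviors. It is a simplified sketch and not the library's parser: CountRecords is a hypothetical name, it works on an in-memory string, skips blank lines, and does not validate field counts.

```csharp
using System;

const char Q = '\u00FE'; // quote / qualifier
const char S = '\u0014'; // field separator (DC4)

// Hypothetical helper: counts records by tracking only whether the scanner
// is inside a quoted field. A doubled quote toggles the flag twice, so an
// escaped quote leaves the scanner inside the field.
long CountRecords(string text)
{
    long records = 0;
    bool inQuotes = false;
    bool pending = false; // true once the current record has any content

    foreach (char c in text)
    {
        if (c == Q)
        {
            inQuotes = !inQuotes;
            pending = true;
        }
        else if (c == '\n' && !inQuotes)
        {
            if (pending) records++; // LF (or the LF of CRLF) ends a record
            pending = false;
        }
        else if (c == '\r' && !inQuotes)
        {
            // Part of CRLF, or a trailing CR at EOF: not record content.
        }
        else
        {
            pending = true;
        }
    }

    if (pending) records++; // final record without a terminator
    return records;
}

string sample =
    $"{Q}DOCID{Q}{S}{Q}NOTE{Q}\r\n" +
    $"{Q}1{Q}{S}{Q}line one\nline two{Q}\r\n" +
    $"{Q}2{Q}{S}{Q}says {Q}{Q}hi{Q}{Q}{Q}\r";

long total = CountRecords(sample);
Console.WriteLine($"Records: {total}, data rows: {total - 1}");
```

The only state carried between characters is two booleans and a counter, which is the reason a counting pass can stay cheap regardless of file size.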
API surface
public enum EmptyField
{
    Null, // empty fields as null (default)
    Keep, // empty fields as empty string
    Omit  // empty fields omitted from dictionary
}

public static class DatFile
{
    // Path-based
    public static IAsyncEnumerable<Dictionary<string, string>> ReadAsync(
        string path,
        DatFileOptions options = null,
        CancellationToken cancellationToken = default);

    // Stream-based
    public static IAsyncEnumerable<Dictionary<string, string>> ReadAsync(
        Stream stream,
        DatFileOptions options = null,
        CancellationToken cancellationToken = default);

    // Header-only (path)
    public static Task<IReadOnlyList<string>> GetHeaderAsync(
        string path,
        CancellationToken cancellationToken = default);

    // Header-only (stream)
    public static Task<IReadOnlyList<string>> GetHeaderAsync(
        Stream stream,
        CancellationToken cancellationToken = default);

    // Count and validate (path)
    public static Task<(IReadOnlyList<string> Header, long RowCount)> GetCountAsync(
        string path,
        CancellationToken cancellationToken = default);

    // Count and validate (stream)
    public static Task<(IReadOnlyList<string> Header, long RowCount)> GetCountAsync(
        Stream stream,
        CancellationToken cancellationToken = default);
}

public sealed record DatFileOptions
{
    public int ReaderBufferChars { get; init; } = 128 * 1024;
    public int ParseChunkChars { get; init; } = 128 * 1024;
    public FileStreamOptions File { get; init; } = /* defaults to async + sequential scan, 1 MiB */;
    public EmptyField EmptyFieldMode { get; init; } = EmptyField.Null;
    public static DatFileOptions Default { get; }
}
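Because ReadAsync returns an IAsyncEnumerable, rows can be regrouped on the consumer side without buffering the whole file. The sketch below batches any async sequence into fixed-size lists, which is a common shape for bulk inserts. BatchAsync and FakeRowsAsync are hypothetical helpers for illustration; FakeRowsAsync merely stands in for DatFile.ReadAsync so the example runs without a DAT file.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: groups an async sequence into lists of at most `size` items.
async IAsyncEnumerable<IReadOnlyList<T>> BatchAsync<T>(IAsyncEnumerable<T> source, int size)
{
    if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

    var batch = new List<T>(size);
    await foreach (var item in source)
    {
        batch.Add(item);
        if (batch.Count == size)
        {
            yield return batch;
            batch = new List<T>(size); // start a fresh list; the caller owns the old one
        }
    }
    if (batch.Count > 0) yield return batch; // final partial batch
}

// Stand-in for DatFile.ReadAsync(...): yields rows shaped like the documented return type.
async IAsyncEnumerable<Dictionary<string, string>> FakeRowsAsync(int count)
{
    for (int i = 1; i <= count; i++)
    {
        await Task.Yield();
        yield return new Dictionary<string, string> { ["DOCID"] = $"DOC{i:D3}" };
    }
}

await foreach (var batch in BatchAsync(FakeRowsAsync(5), size: 2))
{
    Console.WriteLine($"Batch of {batch.Count}, first DOCID = {batch[0]["DOCID"]}");
}
```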
Error handling
- Throws FormatException if a record's field count does not match the header.
- Throws FormatException if the file does not begin (after optional BOM) with the required U+00FE quote character.
- Honors CancellationToken during async enumeration and header reading.
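The leading-quote rule can be checked up front on the first few bytes of a file. The sketch below is one possible pre-check and not the library's detection code: StartsWithQuote is a hypothetical helper, and treating BOM-less input as UTF-8 is an assumption made for this example.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper: skips a UTF-8 / UTF-16 LE / UTF-16 BE BOM if present,
// then reports whether the first decoded character is U+00FE.
bool StartsWithQuote(byte[] bytes)
{
    Encoding encoding = Encoding.UTF8; // assumption: BOM-less input is UTF-8
    int offset = 0;

    if (bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF)
    {
        offset = 3; // UTF-8 BOM
    }
    else if (bytes.Length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE)
    {
        encoding = Encoding.Unicode; // UTF-16 LE BOM
        offset = 2;
    }
    else if (bytes.Length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF)
    {
        encoding = Encoding.BigEndianUnicode; // UTF-16 BE BOM
        offset = 2;
    }

    // Four bytes are enough to decode the first character in any of the three encodings.
    string text = encoding.GetString(bytes, offset, Math.Min(bytes.Length - offset, 4));
    return text.Length > 0 && text[0] == '\u00FE';
}

byte[] utf8NoBom = Encoding.UTF8.GetBytes("\u00FEDOCID\u00FE");
byte[] utf16WithBom = Encoding.Unicode.GetPreamble()
    .Concat(Encoding.Unicode.GetBytes("\u00FEDOCID\u00FE"))
    .ToArray();
byte[] notADat = Encoding.UTF8.GetBytes("DOCID,TITLE");

Console.WriteLine(StartsWithQuote(utf8NoBom));
Console.WriteLine(StartsWithQuote(utf16WithBom));
Console.WriteLine(StartsWithQuote(notADat));
```

A check like this lets a caller reject obviously wrong inputs (for example a CSV file with a .dat extension) with a friendly message before handing the file to the reader.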
Created from JandaBox | Icon created by Freepik - Flaticon
| Product | Versions (compatible and additional computed target framework versions) |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net8.0
  - No dependencies.
| Version | Downloads | Last Updated |
|---|---|---|
| 0.5.0 | 308 | 2/9/2026 |
| 0.4.1-main.1 | 45 | 2/9/2026 |
| 0.4.1-count-callback.3 | 140 | 11/15/2025 |
| 0.4.1-count-callback.2 | 1,384 | 10/29/2025 |
| 0.4.0 | 320 | 10/22/2025 |
| 0.3.1-row-count.4 | 218 | 10/22/2025 |
| 0.3.1-main.1 | 151 | 10/22/2025 |
| 0.3.0 | 208 | 9/30/2025 |
| 0.2.0 | 206 | 9/29/2025 |
| 0.1.0 | 302 | 9/24/2025 |
| 0.1.0-main.21 | 151 | 9/24/2025 |