Concordance.Dat 0.5.0

dotnet add package Concordance.Dat --version 0.5.0
                    

Concordance.DAT


Concordance.DAT is a high-performance, asynchronous streaming reader for Concordance DAT files, targeting .NET 8. It parses large legal discovery exports record by record, detects the encoding from the byte order mark (UTF-8, UTF-16 LE, UTF-16 BE), and offers configurable handling of empty fields and multiline values. It is intended for e-discovery and document management workflows.

Author: GPT5


Key behaviors

  • Separator: 0x14 (DC4) between fields
  • Quote / Qualifier: 0xFE around every field
  • Escaped quote: doubled 0xFE within a quoted field
  • Multiline values: CR, LF, or CRLF inside quotes are treated as data
  • Record terminators: LF or CRLF when not in quotes; a trailing CR at EOF is accepted
  • Header handling: the first record is the header and is not yielded; subsequent rows are dictionaries keyed by header names
  • Validation: each data record must have the same number of fields as the header; otherwise a FormatException is thrown
  • Encoding: BOM-aware detection for UTF-8, UTF-16 LE, UTF-16 BE; BOM-less files must begin with U+00FE
  • Stream positioning: after detection, if a BOM exists, the stream is positioned to the first character after the BOM
  • Empty field handling: configurable via EmptyFieldMode in DatFileOptions:
    • Null (default): empty fields are included as null values
    • Keep: empty fields are included as empty strings
    • Omit: empty fields are omitted from the output dictionary
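To make the delimiter rules above concrete, here is a minimal sketch that writes one record in this layout. DatRecordSketch and FormatRecord are hypothetical names used only for illustration; they are not part of Concordance.Dat, which is a reader and, as far as this document shows, does not expose a writer.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of Concordance.Dat): formats one record
// using the separator, qualifier, and escaping rules listed above.
public static class DatRecordSketch
{
    private const char Separator = '\u0014'; // DC4 between fields
    private const char Quote = '\u00FE';     // qualifier around every field

    public static string FormatRecord(IEnumerable<string?> fields)
    {
        var sb = new StringBuilder();
        var first = true;
        foreach (var field in fields)
        {
            if (!first) sb.Append(Separator);
            first = false;

            sb.Append(Quote);
            // A qualifier inside the value is escaped by doubling it.
            sb.Append((field ?? string.Empty).Replace("\u00FE", "\u00FE\u00FE"));
            sb.Append(Quote);
        }
        sb.Append('\n'); // LF outside quotes terminates the record
        return sb.ToString();
    }
}
```

A helper like this can be handy for generating small test files: call FormatRecord once for the header and once per data row, then write the result with an encoding that emits a BOM.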

Performance notes

  • Uses ArrayPool<char> for chunked decoding and reuses StringBuilder/List instances across records to minimize allocations.
  • Prefer opening files with FileOptions.Asynchronous | FileOptions.SequentialScan for best throughput.
  • Buffer sizes are specified in characters, not bytes (UTF-16 typically uses 2 bytes per char).
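The pooled-buffer pattern mentioned above can be sketched as follows. This is not the library's internal code, only an illustration of renting a char buffer from ArrayPool<char> and reusing it for every chunk; ChunkedReadSketch and CountCharAsync are hypothetical names, and the chunk size is given in characters, mirroring the note on buffer sizes.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration (not the library's implementation):
// reads text in fixed-size chunks using one pooled buffer.
public static class ChunkedReadSketch
{
    public static async Task<long> CountCharAsync(
        TextReader reader,
        char target,
        int chunkChars = 128 * 1024,
        CancellationToken cancellationToken = default)
    {
        // Rent may hand back a larger array, so only the first chunkChars are used.
        char[] buffer = ArrayPool<char>.Shared.Rent(chunkChars);
        try
        {
            long count = 0;
            int read;
            while ((read = await reader.ReadAsync(buffer.AsMemory(0, chunkChars), cancellationToken)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == target) count++;
                }
            }
            return count;
        }
        finally
        {
            // Always return the buffer so other callers can reuse it.
            ArrayPool<char>.Shared.Return(buffer);
        }
    }
}
```

Because the buffer is returned in a finally block, an exception or cancellation during reading does not leak the rented array.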

Quick start

Add a reference to the library (source project or NuGet package), then:

using Concordance.Dat;

await foreach (var row in DatFile.ReadAsync("c:\\data\\export.dat"))
{
    var docId = row["DOCID"];
    // process...
}

Configuration with DatFileOptions

DatFileOptions centralizes buffer sizes, file-open behavior, and empty field handling. Defaults are tuned for good throughput; override as needed.

using Concordance.Dat;

var options = DatFileOptions.Default with
{
    ReaderBufferChars = 256 * 1024,   // StreamReader decode buffer (chars)
    ParseChunkChars   = 128 * 1024,   // Parser working buffer (chars)
    File = new FileStreamOptions
    {
        Mode = FileMode.Open,
        Access = FileAccess.Read,
        Share = FileShare.Read,
        Options = FileOptions.Asynchronous | FileOptions.SequentialScan,
        BufferSize = 1 << 20          // 1 MiB
    },
    EmptyFieldMode = EmptyField.Omit // Omit empty fields from output dictionary
};

var cancellationToken = CancellationToken.None;

await foreach (var row in DatFile.ReadAsync("c:\\data\\export.dat", options, cancellationToken))
{
    // ...
}

You can also supply a Stream directly:

await using var fs = File.Open("c:\\data\\export.dat", new FileStreamOptions
{
    Mode = FileMode.Open,
    Access = FileAccess.Read,
    Share = FileShare.Read,
    Options = FileOptions.Asynchronous | FileOptions.SequentialScan,
    BufferSize = 1 << 20
});

var options = DatFileOptions.Default with { EmptyFieldMode = EmptyField.Keep };

await foreach (var row in DatFile.ReadAsync(fs, options))
{
    // ...
}
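Because the three EmptyFieldMode settings produce differently shaped rows (a null value, an empty string, or a missing key), downstream code may want a single lookup that treats all three alike. The following is a minimal sketch under that assumption; RowAccessSketch and GetOrDefault are hypothetical names, not part of Concordance.Dat.

```csharp
using System.Collections.Generic;

// Hypothetical helper (not part of Concordance.Dat): reads a field from a row
// regardless of which empty-field mode produced the dictionary.
public static class RowAccessSketch
{
    public static string GetOrDefault(
        IReadOnlyDictionary<string, string> row,
        string field,
        string fallback = "")
    {
        // Missing key (Omit), null value (Null), and "" (Keep) all yield the fallback.
        return row.TryGetValue(field, out var value) && !string.IsNullOrEmpty(value)
            ? value
            : fallback;
    }
}
```

Inside the await foreach loop, RowAccessSketch.GetOrDefault(row, "CUSTODIAN", "unknown") would then avoid a KeyNotFoundException when the field was omitted.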

Reading only the header

To read just the header (field names) from a Concordance DAT file without streaming all records:

using Concordance.Dat;

// From a file path:
var headerFromPath = await DatFile.GetHeaderAsync("c:\\data\\export.dat");

// From a stream (distinct variable names so both calls compile in one scope):
await using var fs = File.OpenRead("c:\\data\\export.dat");
var headerFromStream = await DatFile.GetHeaderAsync(fs);

Returns a read-only list of field names in file order. Throws FormatException if the file is empty or invalid.
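One common use of the header is to check that the columns a workflow depends on exist before processing starts. The sketch below assumes an exact, case-sensitive match on field names, which this document does not specify; HeaderCheckSketch and MissingColumns are hypothetical names, not part of the package.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of Concordance.Dat): reports which required
// column names are absent from a header list.
public static class HeaderCheckSketch
{
    public static IReadOnlyList<string> MissingColumns(
        IReadOnlyList<string> header,
        params string[] required)
    {
        // Assumption: names are compared ordinally (case-sensitive).
        var present = new HashSet<string>(header, StringComparer.Ordinal);
        return required.Where(name => !present.Contains(name)).ToList();
    }
}
```

Passing the list returned by GetHeaderAsync as the first argument gives an empty result when every required column is present.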


Memory-efficient file validation

The GetCountAsync method provides a memory-efficient way to validate DAT files:

  • Parses the entire file using exactly the same rules as ReadAsync
  • Returns both header fields and total row count in one pass
  • Validates that every row has the correct number of fields
  • Uses minimal memory since it doesn't create dictionaries for rows

Example usage for validation:

using Concordance.Dat;

try 
{
    var (header, rowCount) = await DatFile.GetCountAsync("large.dat");
    Console.WriteLine($"File is valid with {header.Count} columns and {rowCount} rows");
}
catch (FormatException ex)
{
    Console.WriteLine($"File validation failed: {ex.Message}");
}
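The same try/catch pattern extends to a batch of files. The sketch below takes the counting function as a delegate so it does not depend on the package directly; BatchValidationSketch and ValidateAllAsync are hypothetical names, and the delegate shape is chosen to match the GetCountAsync signature listed under API surface.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper (not part of Concordance.Dat): validates several files
// and collects the error message for each one that fails.
public static class BatchValidationSketch
{
    public static async Task<Dictionary<string, string>> ValidateAllAsync(
        IEnumerable<string> paths,
        Func<string, Task<(IReadOnlyList<string> Header, long RowCount)>> count)
    {
        var failures = new Dictionary<string, string>();
        foreach (var path in paths)
        {
            try
            {
                // The result is discarded; only success or failure matters here.
                await count(path);
            }
            catch (FormatException ex)
            {
                failures[path] = ex.Message;
            }
        }
        return failures;
    }
}
```

With the package referenced, the delegate would be supplied as path => DatFile.GetCountAsync(path). Only FormatException is caught, so I/O errors such as a missing file still propagate to the caller.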

API surface

public enum EmptyField
{
    Null, // empty fields as null (default)
    Keep, // empty fields as empty string
    Omit  // empty fields omitted from dictionary
}

public static class DatFile
{
    // Path-based
    public static IAsyncEnumerable<Dictionary<string, string>> ReadAsync(
        string path,
        DatFileOptions options = null,
        CancellationToken cancellationToken = default);

    // Stream-based
    public static IAsyncEnumerable<Dictionary<string, string>> ReadAsync(
        Stream stream,
        DatFileOptions options = null,
        CancellationToken cancellationToken = default);

    // Header-only (path)
    public static Task<IReadOnlyList<string>> GetHeaderAsync(
        string path,
        CancellationToken cancellationToken = default);

    // Header-only (stream)
    public static Task<IReadOnlyList<string>> GetHeaderAsync(
        Stream stream,
        CancellationToken cancellationToken = default);

    // Count and validate (path)
    public static Task<(IReadOnlyList<string> Header, long RowCount)> GetCountAsync(
        string path,
        CancellationToken cancellationToken = default);

    // Count and validate (stream)
    public static Task<(IReadOnlyList<string> Header, long RowCount)> GetCountAsync(
        Stream stream,
        CancellationToken cancellationToken = default);
}

public sealed record DatFileOptions
{
    public int ReaderBufferChars { get; init; } = 128 * 1024;
    public int ParseChunkChars   { get; init; } = 128 * 1024;
    public FileStreamOptions File { get; init; } = /* defaults to async + sequential scan, 1 MiB */;
    public EmptyField EmptyFieldMode { get; init; } = EmptyField.Null;
    public static DatFileOptions Default { get; }
}

Error handling

  • Throws FormatException if a record's field count does not match the header.
  • Throws FormatException if the file does not begin (after optional BOM) with the required U+00FE quote character.
  • Honors CancellationToken during async enumeration and header reading.
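For readers who want to pre-check a file before handing it to the reader, BOM sniffing for the three encodings named in this document can be sketched as below. This is an illustration only, not the library's detection code; BomSketch and DetectBom are hypothetical names, and the BOM-less fallback (a file starting directly with U+00FE) is deliberately left out because its byte-level rules are not described here.

```csharp
using System;
using System.Text;

// Hypothetical illustration (not the library's implementation): maps a
// leading byte order mark to an encoding and reports the BOM length.
public static class BomSketch
{
    public static Encoding DetectBom(ReadOnlySpan<byte> prefix, out int bomLength)
    {
        if (prefix.Length >= 3 && prefix[0] == 0xEF && prefix[1] == 0xBB && prefix[2] == 0xBF)
        {
            bomLength = 3;
            return Encoding.UTF8;
        }
        if (prefix.Length >= 2 && prefix[0] == 0xFF && prefix[1] == 0xFE)
        {
            bomLength = 2;
            return Encoding.Unicode; // UTF-16 LE
        }
        if (prefix.Length >= 2 && prefix[0] == 0xFE && prefix[1] == 0xFF)
        {
            bomLength = 2;
            return Encoding.BigEndianUnicode; // UTF-16 BE
        }
        bomLength = 0;
        return null; // no BOM recognized
    }
}
```

Note that a UTF-32 LE BOM also begins with FF FE; this sketch ignores UTF-32 because the document lists only UTF-8 and UTF-16.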

Created from JandaBox | Icon created by Freepik - Flaticon

Target frameworks

  • net8.0 is included in the package and has no dependencies.
  • The net8.0, net9.0, and net10.0 platform targets (android, browser, ios, maccatalyst, macos, tvos, windows), as well as net9.0 and net10.0 themselves, are computed as compatible.

Version                  Downloads  Last Updated
0.5.0                    308        2/9/2026
0.4.1-main.1             45         2/9/2026
0.4.1-count-callback.3   140        11/15/2025
0.4.1-count-callback.2   1,384      10/29/2025
0.4.0                    320        10/22/2025
0.3.1-row-count.4        218        10/22/2025
0.3.1-main.1             151        10/22/2025
0.3.0                    208        9/30/2025
0.2.0                    206        9/29/2025
0.1.0                    302        9/24/2025
0.1.0-main.21            151        9/24/2025