SpiseMisu.Text.Dstring 0.11.23

dotnet add package SpiseMisu.Text.Dstring --version 0.11.23
                    

SpiseMisu.Text.Dstring

A Danish string (dstring) is a German-string-like implementation for .NET, optimized for managed memory.

A dstring consists of 16 bytes (128 bits) of contiguous memory, where:

  • The first byte stores a bitmask describing the next seven bytes as well as the byte [] pointer

  • The first byte uses a 4-bit bitmask to store the length of the dstring prefix and another 4-bit bitmask to store the encoding-and-format flags. Once the upper-bound length of the dstring prefix is reached, a 3-bit bitmask with compression flags becomes available:

    # Upperbound length of eight (compression flags are available)
    +--------+
    |▭▭▭▭■□□□|
    +--------+
    # Length of five (compression flags are NOT available)
    +--------+
    |▭▭▭▭□■□■|
    +--------+
    

    and

    # Encoding. Default is multiple single-byte UTF8 for optimal storage
    +--------+
    |□□□□▭▭▭▭| UTF8.......: Encoded bytes as multiple UTF8 single-bytes
    +--------+
    |□□□■▭▭▭▭| ASCII......: Encoded bytes in [0x00 - 0x7F]
    +--------+
    |□□■□▭▭▭▭| ExtASCII...: Encoded bytes in [0x00 - 0xFF]
    +--------+
    # Encoding and Format placeholders
    +--------+
    |□□■■▭▭▭▭| PlaceholderF03 (placeholder for future formats/encodings)
    +--------+
    |□■□□▭▭▭▭| PlaceholderF04 (placeholder for future formats/encodings)
    +--------+
    |□■□■▭▭▭▭| PlaceholderF05 (placeholder for future formats/encodings)
    +--------+
    |□■■□▭▭▭▭| PlaceholderF06 (placeholder for future formats/encodings)
    +--------+
    |□■■■▭▭▭▭| PlaceholderF07 (placeholder for future formats/encodings)
    +--------+
    |■□□□▭▭▭▭| PlaceholderF08 (placeholder for future formats/encodings)
    +--------+
    |■□□■▭▭▭▭| PlaceholderF09 (placeholder for future formats/encodings)
    +--------+
    |■□■□▭▭▭▭| PlaceholderF10 (placeholder for future formats/encodings)
    +--------+
    |■□■■▭▭▭▭| PlaceholderF11 (placeholder for future formats/encodings)
    +--------+
    |■■□□▭▭▭▭| PlaceholderF12 (placeholder for future formats/encodings)
    +--------+
    |■■□■▭▭▭▭| PlaceholderF13 (placeholder for future formats/encodings)
    +--------+
    |■■■□▭▭▭▭| PlaceholderF14 (placeholder for future formats/encodings)
    +--------+
    # Format
    +--------+
    |■■■■▭▭▭▭| JSON.......: Ex: [{"foo":42}]
    +--------+
    bit-mask
    

    and

    # Default is uncompressed
    +--------+
    |▭▭▭▭■□□□| Uncompressed
    +--------+
    # Compression algorithms, with streaming support
    +--------+
    |▭▭▭▭■□□■| Deflate
    +--------+
    |▭▭▭▭■□■□| GZip
    +--------+
    |▭▭▭▭■□■■| ZLib
    +--------+
    |▭▭▭▭■■□□| Brotli
    +--------+
    # Compression algorithms placeholders
    +--------+
    |▭▭▭▭■■□■| PlaceholderF05
    +--------+
    |▭▭▭▭■■■□| PlaceholderF06
    +--------+
    |▭▭▭▭■■■■| PlaceholderF07
    +--------+
    bit-mask
    
  • The next seven bytes store the first seven bytes of the dstring. If the dstring is shorter than seven bytes, the remaining bytes are initialized to the default value of zero

  • Finally, the last eight bytes contain an x64 pointer to a byte [] (on the heap) that holds the rest of the bytes in the dstring. If the dstring is shorter than eight bytes, the byte [] is not instantiated (null value)
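
The header-byte diagrams above can be read as a small decoder. The following is only a sketch in Python, not the library's F# code: the function and table names are hypothetical, and it assumes that the leftmost bit in the diagrams is the most significant bit, so the high nibble carries encoding-and-format and the low nibble carries either the prefix length or, once its top bit is set, the compression flags.

```python
# Hypothetical model of the dstring header byte (names are illustrative,
# not the library's API). Assumes leftmost diagram bit = most significant bit.
ENCODINGS = {0b0000: "UTF8", 0b0001: "ASCII", 0b0010: "ExtASCII", 0b1111: "JSON"}
COMPRESSIONS = {0b000: "Uncompressed", 0b001: "Deflate", 0b010: "GZip",
                0b011: "ZLib", 0b100: "Brotli"}


def decode_header(header: int) -> dict:
    """Split one header byte into encoding, prefix length and compression."""
    high, low = (header >> 4) & 0x0F, header & 0x0F
    # Unassigned encoding values map to the PlaceholderFnn names of the diagram.
    encoding = ENCODINGS.get(high, f"PlaceholderF{high:02d}")
    if low & 0b1000:
        # Upper bound reached: the remaining three bits are compression flags.
        flags = low & 0b0111
        compression = COMPRESSIONS.get(flags, f"PlaceholderF{flags:02d}")
        return {"encoding": encoding, "prefix_length": 8, "compression": compression}
    # Below the upper bound: the low nibble is the length, no compression flags.
    return {"encoding": encoding, "prefix_length": low, "compression": None}


print(decode_header(0b0000_0100))  # UTF8, length four, no compression flags
print(decode_header(0b1111_1100))  # JSON, upper bound reached, Brotli
```

Under these assumptions, a single mask-and-shift is enough to tell whether the heap pointer and the compression flags are relevant for a given element.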

  1. Example of a 4-byte dstring ("test"). No heap allocation:
+--------+----+----+----+----+----+----+----+----------+
|□□□□□■□□|0x74|0x65|0x73|0x74|0x00|0x00|0x00|  <NULL>  |
+--------+----+----+----+----+----+----+----+----------+
 bit-mask  b0   b1   b2   b3   b4   b5   b6    pointer
           ——   ——   ——   ——
  2. Example of a +8-byte dstring ("Danish string") with heap allocation:
                                         0x551A4290 (byte[] on heap)
                                              |
                                              v
+--------+----+----+---+----+----------+      +----+----+---+----+
|□□□□■□□□|0x44|0x61| … |0x20|0x551A4290| ---> |0x73|0x74| … |0x67|
+--------+----+----+---+----+----------+      +----+----+---+----+
 bit-mask  b0   b1   …   b6    pointer          b7   b8   …   bn
           ——   ——       ——    ———————          ——   ——       ——
  3. Example of an array of nine dstrings:
extra allocated byte arrays on heap ----+------------+------------+
                                        |            |            |
                                        v            |            |
                                   0x6796EE96        |            |
+-+----+-----------------------+        |            |            |
|i|memo|   continuous memory   |        v            |            |
+-+----+--------+---+----------+        +---+        v            |
|0|0x00|□□□□■□□□| … |0x6796EE96| -----> | … |   0x53EB31F6        |
+-+----+--------+---+----------+        +---+        |            |
|1|0x10|□□□□□□■□| … |  <NULL>  |                     v            |
+-+----+--------+---+----------+                     +---+        v
|2|0x20|□□□□■□□□| … |0x53EB31F6| ------------------> | … |   0x4A424B5E
+-+----+--------+---+----------+                     +---+        |
|…|0x…0|□□□□□■□■| … |  <NULL>  |                                  v
+-+----+--------+---+----------+                                  +---+
|8|0x80|□□□□■□□□| … |0x4A424B5E| -------------------------------> | … |
+-+----+--------+---+----------+                                  +---+

Project structure

├── SpiseMisu.Text.Dstring
│   ├── lib
│   │   └── utils.fs
│   ├── SpiseMisu.Text.Dstring.fsproj
│   └── dstring.fs
├── SpiseMisu.Text.Dstring.Perfs
│   ├── SpiseMisu.Text.Dstring.Perfs.fsproj
│   └── program.fs
├── SpiseMisu.Text.Dstring.Tests
│   ├── SpiseMisu.Text.Dstring.Tests.fsproj
│   ├── program.fs
│   └── tests.fs
├── demo
│   └── dstring.fsx
├── imgs
│   ├── docs
│   ├── licenses
│   └── nuget
├── SpiseMisu.Text.Dstring.sln
├── global.json
├── license.txt
├── license_cil-bytecode_agpl-3.0-only.txt
├── license_knowhow_cc-by-nc-nd-40.txt
├── readme.md
└── todo.org

Memory layout

Figure: dstring[] hex-dump

Heap dump with dotnet-dump mini-guide

  1. In ./SpiseMisu.Text.Dstring.Perfs/program.fs > x.GlobalCleanup () = uncomment System.Threading.Thread.Sleep(15_000 (* 15 secs *)) so that the process stays alive long enough to be dumped

  2. Execute ./dotnet-cli-pidof.sh and you will see all the dotnet apps running. Look for the ones ending with SpiseMisu.Text.Dstring.Perfs-Job-OVERNF-1/bin/Release/net10.0.

  3. Now wait until the job you want to dump reaches the clean-up section: // AfterActualRun

  4. Execute dotnet-dump collect --type Heap --process-id 2456129 and you will see:

// AfterActualRun
WorkloadResult   1: 2 op, 507459083.00 ns, 253.7295 ms/op
// GC:  8 7 0 207217488 2
// Threading:  0 0 2
 
[createdump] Gathering state for process 2456129 dotnet
[createdump] Writing minidump with heap to file ~/…/SpiseMisu.Text.Dstring/core_20251004_170724
[createdump] Written 596156416 bytes (145546 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written in 306ms
  5. Investigate by typing: dotnet-dump analyze core_20251004_170724

  6. In the tool, type: dumpheap -stat and you will see:

…
561d22bacde0    13,565     539,936 Free
7f54cec830c0         1   8,000,024 System.Int64[]
7f54cec82ee8         1  16,000,024 SpiseMisu.Text+Dstring[]
7f54cec82010         2  16,000,048 System.Byte[][]
7f54ce9aeb48        34  24,004,640 System.String[]
7f54ce90d7c8 3,000,708 158,772,680 System.String
7f54ceb75950 5,000,005 209,002,292 System.Byte[]
Total 8,015,865 objects, 432,486,422 bytes
  7. See the instances of a given method table (MT): dumpheap -mt 7f54cec82ee8
         Address               MT           Size
    7f14ce800048     7f54cec82ee8     16,000,024
  8. You can now drill further by typing: dumparray -length 5 7f14ce800048
Name:        SpiseMisu.Text+Dstring[]
MethodTable: 00007f54cec82ee8
EEClass:     00007f54cec82e60
Size:        16000024(0xf42418) bytes
Array:       Rank 1, Number of elements 1000000, Type VALUETYPE
Element Methodtable: 00007f54cec82db0
[0] 00007f14ce800058
[1] 00007f14ce800068
[2] 00007f14ce800078
[3] 00007f14ce800088
[4] 00007f14ce800098
  9. And now we can see the contents of some of the (struct) elements in our array by typing: db -c 80 00007f14ce800058 (five 16-byte elements = 80 bytes):
00007f14ce800058: 30 6b 22 ce 14 7f 00 00 08 73 9a ac 37 c9 be ba  0k"......s..7...
00007f14ce800068: 58 6b 22 ce 14 7f 00 00 08 53 d1 20 a4 46 a1 86  Xk"......S. .F..
00007f14ce800078: 80 6b 22 ce 14 7f 00 00 08 44 8f d6 ea 76 37 34  .k"......D...v74
00007f14ce800088: a8 6b 22 ce 14 7f 00 00 08 5b c1 41 f8 f9 bd 58  .k"......[.A...X
00007f14ce800098: d0 6b 22 ce 14 7f 00 00 08 50 72 ef 42 a5 6a 2a  .k"......Pr.B.j*

which shows a pattern similar to the output of the hex dumper (Dstring.Memory.dump):

0112748739DB99|00001000|↔|00007F536E755118|459055102CAE09F54B
01E606DBB4F6FA|00001000|↔|00007F536E754DD8|4BBC8ED0A25F0B8755
07BDEDF50B83AC|00001000|↔|00007F536E754DB0|43A0DFEEA191AEA2A3
0C5FB78013D42F|00001000|↔|00007F536E754CC0|41854A8815FE6E6A3C
1F3A8D9CC33F5E|00001000|↔|00007F536E7550F0|4BA36307910E82AB70

NOTE: In the performance benchmark, the Guids are byte[]-reversed.

> 0112748739DB99|08|↔|00007F536E755118
  (byte reversed becomes)
> 18 51 75 6E 53 7F 00 00|08|99 DB 39 87 74 12 01
  (and compared to `dotnet-dump`)
< 30 6b 22 ce 14 7f 00 00 08 73 9a ac 37 c9 be ba
  10. Once you are done, clean up the core_[DATESTAMP]_[TIMESTAMP] files
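
The byte reversal used in the NOTE above to compare a hex-dumper line with a `db` row can be sketched as follows. This is an illustrative Python helper, not part of the library or of dotnet-dump: the function names are hypothetical, and the input format is assumed to be the shortened `prefix|header|↔|pointer` form from the NOTE, with the header given as two hex digits.

```python
def dumper_to_memory(line: str) -> bytes:
    """Turn 'PREFIX|HEADER|↔|POINTER' (all hex) into the 16 bytes in memory order."""
    prefix_hex, header_hex, _arrow, pointer_hex = line.split("|")[:4]
    logical = bytes.fromhex(prefix_hex + header_hex + pointer_hex)
    # Reversing all 16 bytes gives the order shown by `db`, as in the NOTE.
    return logical[::-1]


def split_element(raw: bytes) -> tuple:
    """Split one 16-byte `db` row into (pointer, header byte, remaining seven bytes)."""
    pointer = int.from_bytes(raw[:8], "little")  # x64 pointers are little-endian
    return pointer, raw[8], raw[9:]


raw = dumper_to_memory("0112748739DB99|08|↔|00007F536E755118")
print(raw.hex(" ").upper())  # 18 51 75 6E 53 7F 00 00 08 99 DB 39 87 74 12 01

pointer, header, tail = split_element(
    bytes.fromhex("30 6b 22 ce 14 7f 00 00 08 73 9a ac 37 c9 be ba"))
print(hex(pointer), hex(header))  # 0x7f14ce226b30 0x8
```

Both rows end up with the same shape: an eight-byte pointer, the header byte 0x08, and seven further bytes, which is the pattern the comparison in the NOTE relies on.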

Benchmarks

// * Summary *

BenchmarkDotNet v0.15.6, Linux NixOS 25.05 (Warbler)
12th Gen Intel Core i7-12800H 0.40GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 10.0.100
  [Host]     : .NET 10.0.0 (10.0.0, 10.0.25.52411), X64 RyuJIT x86-64-v3 DEBUG
  Job-NTWEWU : .NET 10.0.0 (10.0.0, 10.0.25.52411), X64 RyuJIT x86-64-v3

Job=Job-NTWEWU  Runtime=.NET 10.0  IterationCount=1  
LaunchCount=0  WarmupCount=0  Error=NA  

| Method                                            | N       | Mean         | Ratio  | Allocated | Alloc Ratio |
|-------------------------------------------------- |-------- |-------------:|-------:|----------:|------------:|
| 'Array.zeroCreate<string> x.N'                    | 1000000 |     494.9 us |   1.00 |   7.63 MB |        1.00 |
| 'Array.zeroCreate<dstring> x.N'                   | 1000000 |     672.2 us |   1.36 |  15.26 MB |        2.00 |
| 'x.guids |> Array.map Encoding.ASCII.GetString'   | 1000000 |  54,606.6 us | 110.33 |  61.04 MB |        8.00 |
| 'x.guids |> Array.map Dstring.Bytes.toDstring'    | 1000000 |  51,448.0 us | 103.95 |  53.41 MB |        7.00 |
| 'x.sha256s |> Array.map Encoding.ASCII.GetString' | 1000000 |  72,909.0 us | 147.31 |  91.55 MB |       12.00 |
| 'x.sha256s |> Array.map Dstring.Bytes.toDstring'  | 1000000 |  60,189.2 us | 121.61 |  68.66 MB |        9.00 |
| 'x.int64s |> Array.map Encoding.ASCII.GetString'  | 1000000 |  75,732.9 us | 153.02 |  45.78 MB |        6.00 |
| 'x.int64s |> Array.map Dstring.Bytes.toDstring'   | 1000000 |   8,107.6 us |  16.38 |  15.26 MB |        2.00 |
| 'x.strings |> Array.sort'                         | 1000000 | 209,608.7 us | 423.51 |   7.63 MB |        1.00 |
| 'x.strings |> Array.sortDescending'               | 1000000 | 238,639.4 us | 482.16 |   7.63 MB |        1.00 |
| 'x.strings |> Array.map Dstring.UTF8.fromString'  | 1000000 | 130,051.8 us | 262.77 |  53.39 MB |        7.00 |
| 'x.dstrings |> Array.map Dstring.UTF8.toString'   | 1000000 | 135,886.5 us | 274.55 |  98.69 MB |       12.94 |
| 'x.dstrings |> Dstrings.sort'                     | 1000000 | 168,288.2 us | 340.02 |  15.26 MB |        2.00 |
| 'x.dstrings |> Dstrings.sortDescending'           | 1000000 | 168,100.8 us | 339.64 |  15.26 MB |        2.00 |
| 'x.dstrings |> Dstrings.sortPrefix'               | 1000000 | 147,110.7 us | 297.23 |  15.26 MB |        2.00 |
| 'x.dstrings |> Dstrings.sortPrefixDescending'     | 1000000 | 149,646.4 us | 302.36 |  15.26 MB |        2.00 |

// * Hints *
HideColumnsAnalyser
  Summary -> Hidden columns: Error

// * Legends *
  N           : Value of the 'N' parameter
  Mean        : Arithmetic mean of all measurements
  Ratio       : Mean of the ratio distribution ([Current]/[Baseline])
  Allocated   : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
  Alloc Ratio : Allocated memory ratio distribution ([Current]/[Baseline])
  1 us        : 1 Microsecond (0.000001 sec)
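
The Allocated column for the two `Array.zeroCreate` rows can be cross-checked against the `dumpheap -stat` sizes from the heap-dump guide. The Python sketch below is only arithmetic under stated assumptions: the 24-byte per-array overhead is inferred from the reported sizes (16,000,024 bytes for 1,000,000 16-byte elements), and the helper names are hypothetical.

```python
ARRAY_OVERHEAD = 24  # bytes; inferred from the dumpheap sizes, not a library constant


def array_bytes(n: int, element_size: int) -> int:
    """Size of an n-element array with fixed-size elements, including overhead."""
    return n * element_size + ARRAY_OVERHEAD


def to_mb(size: int) -> float:
    """Convert bytes to MB using the legend's convention (1KB = 1024B)."""
    return round(size / 1024 / 1024, 2)


n = 1_000_000
print(array_bytes(n, 8), to_mb(array_bytes(n, 8)))    # string[]: 8-byte references
print(array_bytes(n, 16), to_mb(array_bytes(n, 16)))  # dstring[]: 16-byte structs
```

With these assumptions the results are 7.63 MB and 15.26 MB, which is where the Alloc Ratio of 2.00 for the empty dstring array comes from: each element is a 16-byte struct instead of an 8-byte reference.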

NOTE: By adding pinnedLastByte for dstrings that are exactly 8 bytes in size, we minimize the number of instantiated byte[]. Compare with the previous approach (see below, before and after):

| Method                                            | N       | Mean       | Ratio  | Allocated | Alloc Ratio |
|-------------------------------------------------- |-------- |-----------:|-------:|----------:|------------:|
| 'Array.zeroCreate<string> x.N'                    | 1000000 |   2.419 ms |   1.00 |   7.63 MB |        1.00 |
| …                                                 |      …  |          … |      … |         … |           … |
| 'x.int64s |> Array.map Encoding.ASCII.GetString'  | 1000000 |  84.291 ms |  34.85 |  45.78 MB |        6.00 |
| 'x.int64s |> Array.map Dstring.Bytes.toDstring'   | 1000000 |  51.341 ms |  21.23 |  45.78 MB |        6.00 |
| …                                                 |      …  |          … |      … |         … |           … |
| …                                                 |      …  |          … |      … |         … |           … |
| 'x.int64s |> Array.map Encoding.ASCII.GetString'  | 1000000 |  83.702 ms |  28.04 |  45.78 MB |        6.00 |
| 'x.int64s |> Array.map Dstring.Bytes.toDstring'   | 1000000 |  11.347 ms |   3.80 |  15.26 MB |        2.00 |
| …                                                 |      …  |          … |      … |         … |           … |

That is a reduction of roughly x4.5 in computation time (51.341 ms to 11.347 ms, or x5.6 when comparing the Ratio columns, 21.23 to 3.80) and x3 in (heap) memory allocation (45.78 MB to 15.26 MB).

This is especially useful when storing basic types such as DateTime, float64, int64/uint64, …, as dstrings.

Licenses

Source code in this repository is ONLY covered by the Server Side Public License, v 1, while the rest (knowhow, text, media, …) is covered by the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.

Figure: CC BY-NC-ND 4.0

However, it is not permitted to publish a nuget package under a license that is approved by neither the OSI nor the FSF:

Pushing SpiseMisu.Text.Dstring.0.11.0.nupkg to 'https://www.nuget.org/api/v2/package'...
  PUT https://www.nuget.org/api/v2/package/
  BadRequest https://www.nuget.org/api/v2/package/ 846ms
error: Response status code does not indicate success: 400 (License expression must only contain licenses that are approved by Open Source Initiative or Free Software Foundation. Unsupported licenses: SSPL-1.0.).

The CIL-bytecode content of the nuget package is therefore dual-licensed under the GNU Affero General Public License v3.0 only, while the rest (knowhow, text, media, …) is covered by the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.

For more info on compatible nuget packages licenses, see SPDX License List.

Product Compatible and additional computed target framework versions.
.NET net10.0 is compatible.  net10.0-android was computed.  net10.0-browser was computed.  net10.0-ios was computed.  net10.0-maccatalyst was computed.  net10.0-macos was computed.  net10.0-tvos was computed.  net10.0-windows was computed. 

Version Downloads Last Updated
0.11.23 418 11/18/2025
0.11.22 127 10/25/2025
0.11.21 120 10/25/2025
0.11.20 181 10/22/2025
0.11.19 161 10/17/2025
0.11.18 185 10/8/2025
0.11.17 195 10/7/2025
0.11.16 180 10/7/2025
0.11.15 179 10/5/2025