SpiseMisu.Text.Dstring
0.11.23
dotnet add package SpiseMisu.Text.Dstring --version 0.11.23
A Danish string (dstring) is a German-string-like implementation for .NET, optimized for managed memory.
A dstring consists of 16 bytes (128 bits) of contiguous memory, where:
- The first byte stores a bitmask for the next seven bytes as well as the byte[] pointer.
- The first byte uses a 4-bit bitmask to store the length of the dstring prefix, and another 4-bit bitmask to store flags for encoding-and-format. Once the upper bound of the prefix length is reached, a 3-bit bitmask with compression flags becomes available:

# Upper-bound length of eight (compression flags are available)
+--------+
|▭▭▭▭■□□□|
+--------+
# Length of five (compression flags are NOT available)
+--------+
|▭▭▭▭□■□■|
+--------+

and

# Encoding. Default is multiple single-byte UTF8 for optimal storage
+--------+
|□□□□▭▭▭▭| UTF8.......: Encoded bytes as multiple UTF8 single-bytes
+--------+
|□□□■▭▭▭▭| ASCII......: Encoded bytes in [0x00 - 0x7F]
+--------+
|□□■□▭▭▭▭| ExtASCII...: Encoded bytes in [0x00 - 0xFF]
+--------+
# Encoding and Format placeholders
+--------+
|□□■■▭▭▭▭| PlaceholderF03 (placeholder for future formats/encodings)
+--------+
|□■□□▭▭▭▭| PlaceholderF04 (placeholder for future formats/encodings)
+--------+
|□■□■▭▭▭▭| PlaceholderF05 (placeholder for future formats/encodings)
+--------+
|□■■□▭▭▭▭| PlaceholderF06 (placeholder for future formats/encodings)
+--------+
|□■■■▭▭▭▭| PlaceholderF07 (placeholder for future formats/encodings)
+--------+
|■□□□▭▭▭▭| PlaceholderF08 (placeholder for future formats/encodings)
+--------+
|■□□■▭▭▭▭| PlaceholderF09 (placeholder for future formats/encodings)
+--------+
|■□■□▭▭▭▭| PlaceholderF10 (placeholder for future formats/encodings)
+--------+
|■□■■▭▭▭▭| PlaceholderF11 (placeholder for future formats/encodings)
+--------+
|■■□□▭▭▭▭| PlaceholderF12 (placeholder for future formats/encodings)
+--------+
|■■□■▭▭▭▭| PlaceholderF13 (placeholder for future formats/encodings)
+--------+
|■■■□▭▭▭▭| PlaceholderF14 (placeholder for future formats/encodings)
+--------+
# Format
+--------+
|■■■■▭▭▭▭| JSON.......: Ex: [{"foo":42}]
+--------+
 bit-mask

and

# Default is uncompressed
+--------+
|▭▭▭▭■□□□| Uncompressed
+--------+
# Compression algorithms, with streaming support
+--------+
|▭▭▭▭■□□■| Deflate
+--------+
|▭▭▭▭■□■□| GZip
+--------+
|▭▭▭▭■□■■| ZLib
+--------+
|▭▭▭▭■■□□| Brotli
+--------+
# Compression algorithms placeholders
+--------+
|▭▭▭▭■■□■| PlaceholderF05
+--------+
|▭▭▭▭■■■□| PlaceholderF06
+--------+
|▭▭▭▭■■■■| PlaceholderF07
+--------+
 bit-mask

- The next seven bytes store the first seven bytes of the dstring. If the dstring is shorter than seven bytes, the remaining bytes are initialized to the default value of zero.
- Finally, the last eight bytes contain an x64 pointer to a byte[] (on the heap) holding the rest of the bytes in the dstring. If the dstring is shorter than eight bytes, the byte[] is not instantiated (null value).
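The bit-mask tables for the first byte can be read back mechanically. The following is a minimal Python sketch for illustration only, not the library's API: the name `decode_header` and the returned dictionary are hypothetical, and it assumes that the leftmost cell of each diagram is the most significant bit.

```python
# Hypothetical helper for illustration; not part of SpiseMisu.Text.Dstring.
# Assumption: the leftmost cell of each diagram is the most significant bit.

ENCODINGS = {0b0000: "UTF8", 0b0001: "ASCII", 0b0010: "ExtASCII", 0b1111: "JSON"}
COMPRESSIONS = {
    0b000: "Uncompressed",
    0b001: "Deflate",
    0b010: "GZip",
    0b011: "ZLib",
    0b100: "Brotli",
}


def decode_header(header: int) -> dict:
    """Split the first byte of a dstring into its documented bit fields."""
    if not 0 <= header <= 0xFF:
        raise ValueError("header must fit in one byte")
    fmt = header >> 4      # upper nibble: encoding-and-format flags
    low = header & 0x0F    # lower nibble: prefix length (and compression)
    at_upper_bound = bool(low & 0b1000)
    compression = None
    if at_upper_bound:
        bits = low & 0b0111  # only meaningful once the upper bound is reached
        compression = COMPRESSIONS.get(bits, f"PlaceholderF{bits:02d}")
    return {
        "encoding": ENCODINGS.get(fmt, f"PlaceholderF{fmt:02d}"),
        "prefix_length": 8 if at_upper_bound else low,
        "compression": compression,
    }


print(decode_header(0b0000_0100))  # a 4-byte UTF8 value, no compression flags
print(decode_header(0b0000_1000))  # upper bound reached, uncompressed
```

Unknown flag combinations fall back to the placeholder names used in the tables, so the sketch never rejects a byte that a future format might define.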
- Example of a 4-byte dstring ("test"). No heap allocation:
+--------+----+----+----+----+----+----+----+----------+
|□□□□□■□□|0x74|0x65|0x73|0x74|0x00|0x00|0x00|  <NULL>  |
+--------+----+----+----+----+----+----+----+----------+
 bit-mask  b0   b1   b2   b3   b4   b5   b6   pointer
           ——   ——   ——   ——
- Example of a +8-byte dstring ("Danish string") + heap allocation:
                                              0x551A4290 (byte[] on heap)
                                              |
                                              v
+--------+----+----+---+----+----------+      +----+----+---+----+
|□□□□■□□□|0x44|0x61| … |0x20|0x551A4290| ---> |0x73|0x74| … |0x67|
+--------+----+----+---+----+----------+      +----+----+---+----+
 bit-mask  b0   b1   …   b6   pointer           b7   b8   …   bn
           ——   ——       ——   ———————           ——   ——       ——
- Example of an array of nine dstrings:
  extra allocated byte arrays on heap ----+------------+------------+
                                          |            |            |
                                          v            |            |
                                     0x6796EE96        |            |
+-+----+-----------------------+          |            |            |
|i|memo|   contiguous memory   |          v            |            |
+-+----+--------+---+----------+        +---+          v            |
|0|0x00|□□□□■□□□| … |0x6796EE96| -----> | … |     0x53EB31F6        |
+-+----+--------+---+----------+        +---+          |            |
|1|0x10|□□□□□□■□| … |  <NULL>  |                       v            |
+-+----+--------+---+----------+                     +---+          v
|2|0x20|□□□□■□□□| … |0x53EB31F6| ------------------> | … |     0x4A424B5E
+-+----+--------+---+----------+                     +---+          |
|…|0x…0|□□□□□■□■| … |  <NULL>  |                                    v
+-+----+--------+---+----------+                                  +---+
|8|0x80|□□□□■□□□| … |0x4A424B5E| -------------------------------> | … |
+-+----+--------+---+----------+                                  +---+
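The "test" and "Danish string" examples above can be reproduced with a small model of the layout. This is a hedged Python sketch, not the library's implementation: `pack` is a hypothetical name, the heap tail is returned as a separate bytes object instead of a real pointer, the header assumes the leftmost diagram cell is the most significant bit, and the pinnedLastByte handling of values that are exactly 8 bytes long is not modelled.

```python
from typing import Optional, Tuple

# Hypothetical model for illustration; not the library's implementation.
# The heap tail is returned as bytes rather than stored behind a pointer.

PREFIX_SIZE = 7


def pack(data: bytes, encoding_flags: int = 0b0000) -> Tuple[int, bytes, Optional[bytes]]:
    """Return (header byte, zero-padded 7-byte prefix, heap tail or None)."""
    prefix = data[:PREFIX_SIZE].ljust(PREFIX_SIZE, b"\x00")
    if len(data) < 8:
        # Short value: the length lives in the lower nibble, nothing on the heap.
        header = (encoding_flags << 4) | len(data)
        return header, prefix, None
    # Upper bound reached: lower nibble is 1000 (uncompressed), tail goes to the heap.
    header = (encoding_flags << 4) | 0b1000
    return header, prefix, data[PREFIX_SIZE:]


for text in ("test", "Danish string"):
    header, prefix, tail = pack(text.encode("utf-8"))
    print(f"{header:08b} {prefix.hex(' ')} {tail!r}")
```

For "test" this yields the header `00000100` with the prefix `74 65 73 74 00 00 00` and no tail. For "Danish string" it yields the header `00001000`, the prefix `44 61 6e 69 73 68 20` and the tail `b'string'`, matching the two diagrams.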
Project structure
├── SpiseMisu.Text.Dstring
│ ├── lib
│ │ └── utils.fs
│ ├── SpiseMisu.Text.Dstring.fsproj
│ └── dstring.fs
├── SpiseMisu.Text.Dstring.Perfs
│ ├── SpiseMisu.Text.Dstring.Perfs.fsproj
│ └── program.fs
├── SpiseMisu.Text.Dstring.Tests
│ ├── SpiseMisu.Text.Dstring.Tests.fsproj
│ ├── program.fs
│ └── tests.fs
├── demo
│ └── dstring.fsx
├── imgs
│ ├── docs
│ ├── licenses
│ └── nuget
├── SpiseMisu.Text.Dstring.sln
├── global.json
├── license.txt
├── license_cil-bytecode_agpl-3.0-only.txt
├── license_knowhow_cc-by-nc-nd-40.txt
├── readme.md
└── todo.org
Memory layout
Heap dump with dotnet-dump mini-guide
- In ./SpiseMisu.Text.Dstring.Perfs/program.fs > x.GlobalCleanup () =, uncomment System.Threading.Thread.Sleep(15_000 (* 15 secs *)) so that the process stays alive long enough to be dumped.
- Execute ./dotnet-cli-pidof.sh to list all running dotnet apps. Look for the ones ending with SpiseMisu.Text.Dstring.Perfs-Job-OVERNF-1/bin/Release/net10.0.
- Now wait until the job you want to dump reaches the clean-up section:
// AfterActualRun
- Execute dotnet-dump collect --type Heap --process-id 2456129 and you will see:
// AfterActualRun
WorkloadResult 1: 2 op, 507459083.00 ns, 253.7295 ms/op
// GC: 8 7 0 207217488 2
// Threading: 0 0 2
[createdump] Gathering state for process 2456129 dotnet
[createdump] Writing minidump with heap to file ~/…/SpiseMisu.Text.Dstring/core_20251004_170724
[createdump] Written 596156416 bytes (145546 pages) to core file
[createdump] Target process is alive
[createdump] Dump successfully written in 306ms
- Investigate by typing:
dotnet-dump analyze core_20251004_170724
- In the tool, type:
dumpheap -stat
and you will see:
…
561d22bacde0 13,565 539,936 Free
7f54cec830c0 1 8,000,024 System.Int64[]
7f54cec82ee8 1 16,000,024 SpiseMisu.Text+Dstring[]
7f54cec82010 2 16,000,048 System.Byte[][]
7f54ce9aeb48 34 24,004,640 System.String[]
7f54ce90d7c8 3,000,708 158,772,680 System.String
7f54ceb75950 5,000,005 209,002,292 System.Byte[]
Total 8,015,865 objects, 432,486,422 bytes
- See details for a given method table (MT) address:
dumpheap -mt 7f54cec82ee8
Address MT Size
7f14ce800048 7f54cec82ee8 16,000,024
- You can now drill further by typing:
dumparray -length 5 7f14ce800048
Name: SpiseMisu.Text+Dstring[]
MethodTable: 00007f54cec82ee8
EEClass: 00007f54cec82e60
Size: 16000024(0xf42418) bytes
Array: Rank 1, Number of elements 1000000, Type VALUETYPE
Element Methodtable: 00007f54cec82db0
[0] 00007f14ce800058
[1] 00007f14ce800068
[2] 00007f14ce800078
[3] 00007f14ce800088
[4] 00007f14ce800098
- And now we can see the contents of some of the (struct) elements in our array by typing:
db -c 80 00007f14ce800058
(16-byte element x 5 = 80 bytes):
00007f14ce800058: 30 6b 22 ce 14 7f 00 00 08 73 9a ac 37 c9 be ba 0k"......s..7...
00007f14ce800068: 58 6b 22 ce 14 7f 00 00 08 53 d1 20 a4 46 a1 86 Xk"......S. .F..
00007f14ce800078: 80 6b 22 ce 14 7f 00 00 08 44 8f d6 ea 76 37 34 .k"......D...v74
00007f14ce800088: a8 6b 22 ce 14 7f 00 00 08 5b c1 41 f8 f9 bd 58 .k"......[.A...X
00007f14ce800098: d0 6b 22 ce 14 7f 00 00 08 50 72 ef 42 a5 6a 2a .k"......Pr.B.j*
which shows a pattern similar to that of the hex dumper (Dstring.Memory.dump):
0112748739DB99|00001000|↔|00007F536E755118|459055102CAE09F54B
01E606DBB4F6FA|00001000|↔|00007F536E754DD8|4BBC8ED0A25F0B8755
07BDEDF50B83AC|00001000|↔|00007F536E754DB0|43A0DFEEA191AEA2A3
0C5FB78013D42F|00001000|↔|00007F536E754CC0|41854A8815FE6E6A3C
1F3A8D9CC33F5E|00001000|↔|00007F536E7550F0|4BA36307910E82AB70
NOTE: In the performance benchmark, Guids are byte[]-reversed.
> 0112748739DB99|08|↔|00007F536E755118
(byte reversed becomes)
> 18 51 75 6E 53 7F 00 00|08|99 DB 39 87 74 12 01
(and compared to `dotnet-dump`)
< 30 6b 22 ce 14 7f 00 00 08 73 9a ac 37 c9 be ba
- Once you are done, clean up the core_[DATESTAMP]_[TIMESTAMP] files
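The byte-reversed comparison in the note above suggests how a raw 16-byte row from the dump maps back to the fields printed by the hex dumper. The following Python sketch makes that mapping explicit. It is an illustration under assumptions: `parse_element` is a hypothetical helper (not part of dotnet-dump or the library), and the field order (8-byte little-endian pointer, one header byte, seven reversed prefix bytes) is inferred from that comparison only.

```python
# Hypothetical helper for illustration; not part of dotnet-dump or the library.
# Assumed in-memory order, inferred from the byte-reversed comparison:
#   bytes 0..7  : pointer (little-endian)
#   byte  8     : header (bit-mask)
#   bytes 9..15 : the seven prefix bytes, in reversed order


def parse_element(row_hex: str) -> tuple:
    raw = bytes.fromhex(row_hex)  # whitespace between byte pairs is ignored
    if len(raw) != 16:
        raise ValueError("expected exactly 16 bytes")
    pointer = int.from_bytes(raw[0:8], "little")
    header = raw[8]
    prefix = raw[9:16][::-1]
    return pointer, header, prefix


pointer, header, prefix = parse_element(
    "18 51 75 6E 53 7F 00 00 08 99 DB 39 87 74 12 01"
)
print(f"{prefix.hex().upper()}|{header:08b}|{pointer:016X}")
# 0112748739DB99|00001000|00007F536E755118
```

The printed line reproduces the prefix, bit-mask and pointer columns of the first hex dumper row.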
Benchmarks
// * Summary *
BenchmarkDotNet v0.15.6, Linux NixOS 25.05 (Warbler)
12th Gen Intel Core i7-12800H 0.40GHz, 1 CPU, 20 logical and 14 physical cores
.NET SDK 10.0.100
[Host] : .NET 10.0.0 (10.0.0, 10.0.25.52411), X64 RyuJIT x86-64-v3 DEBUG
Job-NTWEWU : .NET 10.0.0 (10.0.0, 10.0.25.52411), X64 RyuJIT x86-64-v3
Job=Job-NTWEWU Runtime=.NET 10.0 IterationCount=1
LaunchCount=0 WarmupCount=0 Error=NA
| Method | N | Mean | Ratio | Allocated | Alloc Ratio |
|-------------------------------------------------- |-------- |-------------:|-------:|----------:|------------:|
| 'Array.zeroCreate<string> x.N' | 1000000 | 494.9 us | 1.00 | 7.63 MB | 1.00 |
| 'Array.zeroCreate<dstring> x.N' | 1000000 | 672.2 us | 1.36 | 15.26 MB | 2.00 |
| 'x.guids |> Array.map Encoding.ASCII.GetString' | 1000000 | 54,606.6 us | 110.33 | 61.04 MB | 8.00 |
| 'x.guids |> Array.map Dstring.Bytes.toDstring' | 1000000 | 51,448.0 us | 103.95 | 53.41 MB | 7.00 |
| 'x.sha256s |> Array.map Encoding.ASCII.GetString' | 1000000 | 72,909.0 us | 147.31 | 91.55 MB | 12.00 |
| 'x.sha256s |> Array.map Dstring.Bytes.toDstring' | 1000000 | 60,189.2 us | 121.61 | 68.66 MB | 9.00 |
| 'x.int64s |> Array.map Encoding.ASCII.GetString' | 1000000 | 75,732.9 us | 153.02 | 45.78 MB | 6.00 |
| 'x.int64s |> Array.map Dstring.Bytes.toDstring' | 1000000 | 8,107.6 us | 16.38 | 15.26 MB | 2.00 |
| 'x.strings |> Array.sort' | 1000000 | 209,608.7 us | 423.51 | 7.63 MB | 1.00 |
| 'x.strings |> Array.sortDescending' | 1000000 | 238,639.4 us | 482.16 | 7.63 MB | 1.00 |
| 'x.strings |> Array.map Dstring.UTF8.fromString' | 1000000 | 130,051.8 us | 262.77 | 53.39 MB | 7.00 |
| 'x.dstrings |> Array.map Dstring.UTF8.toString' | 1000000 | 135,886.5 us | 274.55 | 98.69 MB | 12.94 |
| 'x.dstrings |> Dstrings.sort' | 1000000 | 168,288.2 us | 340.02 | 15.26 MB | 2.00 |
| 'x.dstrings |> Dstrings.sortDescending' | 1000000 | 168,100.8 us | 339.64 | 15.26 MB | 2.00 |
| 'x.dstrings |> Dstrings.sortPrefix' | 1000000 | 147,110.7 us | 297.23 | 15.26 MB | 2.00 |
| 'x.dstrings |> Dstrings.sortPrefixDescending' | 1000000 | 149,646.4 us | 302.36 | 15.26 MB | 2.00 |
// * Hints *
HideColumnsAnalyser
Summary -> Hidden columns: Error
// * Legends *
N : Value of the 'N' parameter
Mean : Arithmetic mean of all measurements
Ratio : Mean of the ratio distribution ([Current]/[Baseline])
Allocated : Allocated memory per single operation (managed only, inclusive, 1KB = 1024B)
Alloc Ratio : Allocated memory ratio distribution ([Current]/[Baseline])
1 us : 1 Microsecond (0.000001 sec)
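The Allocated figures of the two Array.zeroCreate rows follow directly from the element sizes. The short Python check below is a back-of-the-envelope sketch that ignores array object headers and assumes an 8-byte reference per string[] slot on x64 and a 16-byte inline struct per dstring[] slot (the heap dump above reports 16,000,024 bytes for the SpiseMisu.Text+Dstring[] of one million elements).

```python
# Back-of-the-envelope check; array object headers are ignored.
N = 1_000_000
MIB = 1024 * 1024      # BenchmarkDotNet reports sizes with 1KB = 1024B

REFERENCE_SIZE = 8     # assumed: a string[] slot holds an 8-byte reference on x64
DSTRING_SIZE = 16      # a dstring[] slot holds the 16-byte struct inline

string_array_mb = N * REFERENCE_SIZE / MIB
dstring_array_mb = N * DSTRING_SIZE / MIB

print(f"{string_array_mb:.2f} MB")   # 7.63 MB
print(f"{dstring_array_mb:.2f} MB")  # 15.26 MB
```

This accounts for the Alloc Ratio of 2.00 for the empty dstring array: the cost is paid up front in the array itself, while a string[] pays for each string object separately once the slots are filled.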
NOTE: By adding pinnedLastByte for dstrings that are exactly 8 bytes in size, we minimize the number of instantiated byte[]. Compare to the previous approach (see below, before and after):
| Method | N | Mean | Ratio | Allocated | Alloc Ratio |
|-------------------------------------------------- |-------- |-----------:|-------:|----------:|------------:|
| 'Array.zeroCreate<string> x.N' | 1000000 | 2.419 ms | 1.00 | 7.63 MB | 1.00 |
| … | … | … | … | … | … |
| 'x.int64s |> Array.map Encoding.ASCII.GetString' | 1000000 | 84.291 ms | 34.85 | 45.78 MB | 6.00 |
| 'x.int64s |> Array.map Dstring.Bytes.toDstring' | 1000000 | 51.341 ms | 21.23 | 45.78 MB | 6.00 |
| … | … | … | … | … | … |
| … | … | … | … | … | … |
| 'x.int64s |> Array.map Encoding.ASCII.GetString' | 1000000 | 83.702 ms | 28.04 | 45.78 MB | 6.00 |
| 'x.int64s |> Array.map Dstring.Bytes.toDstring' | 1000000 | 11.347 ms | 3.80 | 15.26 MB | 2.00 |
| … | … | … | … | … | … |
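That is a reduction of roughly x5.5 in computation time by the Ratio column (21.23 down to 3.80), or about x4.5 by mean time (51.341 ms down to 11.347 ms), and of x3 in (heap) memory allocation (45.78 MB down to 15.26 MB).
This is particularly useful when storing basic types, for example DateTime, float64, int64/uint64, …, as dstrings.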
Licenses
Source code in this repository is ONLY covered by the Server Side Public License, v 1, while the rest (knowhow, text, media, …) is covered by the
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.
However, nuget.org does not permit publishing a package under a license that is approved by neither OSI nor FSF:
Pushing SpiseMisu.Text.Dstring.0.11.0.nupkg to 'https://www.nuget.org/api/v2/package'...
PUT https://www.nuget.org/api/v2/package/
BadRequest https://www.nuget.org/api/v2/package/ 846ms
error: Response status code does not indicate success: 400 (License expression must only contain licenses that are approved by Open Source Initiative or Free Software Foundation. Unsupported licenses: SSPL-1.0.).
The CIL-bytecode content of the nuget package is therefore dual-licensed
under the GNU Affero General Public License v3.0 only and the
rest (knowhow, text, media, …), is covered by the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
license.
For more info on licenses compatible with nuget packages, see the SPDX License List.
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net10.0
  - FSharp.Core (>= 10.0.100)