Fuuga.cpu 1.0.1

Install via the .NET CLI:

dotnet add package Fuuga.cpu --version 1.0.1

Or reference it from an F# script:

#r "nuget: Fuuga.cpu, 1.0.1"
Fuuga
Tired of paying for tokens? Think you could train a better model? Now you can try.
An LLM built from scratch in F# and .NET. Fuuga implements a complete language model pipeline: tokenization, data ingestion, model training, fine-tuning, and text generation -- with no Python dependencies.
Built on TorchSharp for tensor operations and Microsoft.ML.Tokenizers for BPE, Fuuga uses idiomatic F# (discriminated unions, pipelines, immutability) throughout. Works on GPU or CPU.
Features
Core Pipeline:
- BPE Tokenizer -- Train a byte-pair encoding tokenizer on your own corpus with configurable vocabulary size
- Data Ingestion -- Discover and tokenize epub, markdown, Parquet, and plain text files into a binary corpus
- Parquet I/O -- Read and write HuggingFace-compatible Parquet datasets for SFT, DPO, and document data
- Corpus Compression -- Zstd compression/decompression for .fugecorpus files
- GPT-2 Transformer -- Decoder-only causal transformer with rotary position embeddings (RoPE), grouped-query attention (GQA), RMSNorm, and SwiGLU activation
- Multi-Head Latent Attention (MLA) -- DeepSeek-V2 style compressed KV cache with query/KV compression, decoupled RoPE keys, and optional weight absorption for reduced memory during inference
- Vision Encoder -- Vision model support for multimodal inputs
- Vision Bridge -- Q-Former cross-attention bridge that compresses vision patch tokens into learned query vectors for multimodal (image+text) inputs
- Paged Attention -- Paged KV-cache attention for efficient memory usage during long-context generation
- Memory Hierarchy -- Compressed memory with external retrieval for extended context
- Multi-Resolution Attention -- Chunk pooling with global tokens for efficient long-context processing
- FlashAttention Config -- SDPA backend selection and benchmarking for attention kernels
- Auto Config -- Hardware-aware auto-resolution of DU configuration cases (norm, activation, precision, offloading, communication) at startup
- Early Exit -- Adaptive depth inference for faster generation when confidence is high
- Training -- AdamW optimizer with cosine learning rate scheduling, warmup, gradient clipping, mixed precision support, and gradient accumulation
- INT8 Optimizer Moments -- Optional INT8 quantization of AdamW M/V moment tensors with per-row symmetric quantization, reducing optimizer memory ~4× (--moment-quant int8)
- Optimizer Variants -- Stochastic Weight Averaging (SWA) and Lookahead optimizer support with checkpointable optimizer state
- Gradient Checkpointing -- Memory-efficient training via activation recomputation
- GPU Offloading -- Layer-wise CPU/GPU offloading for reduced VRAM usage
- Optimizer Offloading -- Offload optimizer states to CPU memory
- NVMe Paging -- ZeRO-Infinity 3-tier GPU/CPU/NVMe memory management for training models larger than available VRAM
- Per-Tensor Gradient Offloading -- Bulk-copy gradients to CPU after the backward pass and restore before the optimizer step, freeing GPU VRAM during the optimizer phase (--grad-offload)
- Per-Parameter CUDA Flush -- Aggressive CUDA cache cleanup after the optimizer step to reclaim transient VRAM spikes from M/V update temporaries (--flush-each-param)
- VRAM Guard -- In-process background thread that polls GPU memory via nvidia-smi and signals the training loop to warn, skip batches, or abort when usage exceeds a configurable threshold (--vram-guard-gb <float>)
- Memory Strategy Presets -- MemoryStrategyConfigs.none, .constrained (grad-offload + flush), and .full (all three with VRAM guard at 95%), with automatic ConfigWizard recommendations based on the model-to-VRAM ratio
- Model Parallelism -- Tensor and pipeline parallelism configuration for 70B+ parameter models, with automatic DataParallel recommendation for multi-GPU setups
- Inference -- Greedy, top-k, top-p (nucleus), and temperature sampling with repetition penalty
- Fill-in-the-Middle -- FIM support with prefix/suffix tokens for code completion
- Checkpoints -- Save, load, resume training from checkpoints with full metadata; safetensors format support
- Memory-Mapped Loading -- mmap-based model loading for fast startup
- Streaming Inference -- Token-by-token generation with configurable stop conditions
- Confidence Signals -- Entropy, repetition rate, hedging detection, calibrated confidence with Platt scaling, and stop reason reporting
- Drift Detection -- Statistical drift monitoring (Kolmogorov-Smirnov, Population Stability Index) over confidence signals with ring-buffered accumulation
- Drift Alerting -- Dual-threshold alerts with adaptive sigma-based thresholds, OpenTelemetry metrics, and retraining triggers
- ONNX Export -- Export to ONNX format with fp16/int8 quantization, validation, and benchmarking
- ONNX Inference -- ONNX Runtime backend for optimized inference (
--backend onnx) - Benchmark Evaluation -- Built-in benchmark runner for MMLU, HellaSwag, ARC-Challenge, cached dataset downloads, and checkpoint-attached benchmark results
- FP8 Dequantization -- FP8 format support for quantized weight loading with GPU-accelerated LUT path (256-entry cached lookup table using
torch.index_select) auto-selected when CUDA is available - Validation Pipeline -- Input validation framework with composable validators
- Scaling Heuristics -- Auto-scaling configuration from corpus and hardware stats
- Config Wizard -- Corpus analysis and hardware-aware config generation using Chinchilla scaling laws, activation memory estimates, NTK-aware RoPE, multi-GPU detection (nvidia-smi), and memory strategy recommendations
- CLI -- Subcommands for the full pipeline (tokenize, ingest, train, infer, info, sft, dpo, rl, merge, transfer, distill, merge-models, fisher, distributed, export onnx, compress, decompress, prune, eval, config, wordnet, serve, orchestrate, agent)
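The sampling options listed above (greedy is the temperature → 0 limit) compose in a standard order: repetition penalty, temperature, top-k, then top-p. A minimal NumPy sketch of that composition -- illustrative only, not Fuuga's F# implementation, and all parameter defaults here are hypothetical:

```python
import numpy as np

def sample_next(logits, generated, temperature=0.8, top_k=50, top_p=0.9,
                repetition_penalty=1.2, rng=np.random.default_rng(0)):
    logits = logits.astype(np.float64).copy()
    # Repetition penalty: push down logits of tokens already generated.
    for t in set(generated):
        logits[t] = logits[t] / repetition_penalty if logits[t] > 0 \
            else logits[t] * repetition_penalty
    logits /= temperature
    # Top-k: keep only the k highest-scoring tokens.
    if top_k:
        kth = np.sort(logits)[-top_k]
        logits[logits < kth] = -np.inf
    # Softmax (shifted for numerical stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Top-p (nucleus): smallest set of tokens whose cumulative mass >= top_p.
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    keep = order[:cutoff]
    mask = np.zeros_like(probs)
    mask[keep] = probs[keep]
    mask /= mask.sum()
    return int(rng.choice(len(probs), p=mask))

logits = np.array([0.1, 5.0, 0.2, 4.9])
token = sample_next(logits, generated=[1], top_k=2, top_p=1.0)
```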
Fine-Tuning:
- Supervised Fine-Tuning (SFT) -- LoRA-based fine-tuning on instruction/chat JSONL data with configurable rank, alpha, and target modules
- Prompt Tuning -- Soft-prompt / virtual-token fine-tuning with frozen base weights for lightweight PEFT workflows
- Direct Preference Optimization (DPO) -- Preference learning from chosen/rejected pairs with LoRA
- Reinforcement Learning (RL) -- REINFORCE++ / GRPO fine-tuning with pluggable reward functions
- Reward Functions -- Composable reward functions for RL training (correctness, formatting, safety)
- LoRA Adapter Merging -- Merge trained LoRA adapters back into the base model weights
- QLoRA -- NF4-quantized base weights with LoRA adapters for memory-efficient fine-tuning on consumer GPUs
- Data Validation -- JSONL format validation for SFT and DPO datasets with honesty pattern classification
- Data Augmentation -- Synonym replacement, rule-based paraphrasing, token-level noise injection, and SFT/DPO oversampling for training data diversity
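LoRA, which underpins the SFT and DPO features above, trains a low-rank update B·A beside a frozen weight W; the merge step folds the adapter back into W. A NumPy sketch of the arithmetic (dimensions and the alpha/rank scaling are the conventional choices, not values taken from Fuuga):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 16, 16, 4, 8.0

W = rng.normal(size=(d_out, d_in))              # frozen base weight
A = rng.normal(scale=0.01, size=(rank, d_in))   # trainable down-projection
B = np.zeros((d_out, rank))                     # trainable up-projection, zero-init

def lora_forward(x):
    # Base path plus low-rank update, scaled by alpha / rank.
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

def merge():
    # Fold the adapter back into the base weights (the "merge" step).
    return W + (alpha / rank) * (B @ A)

x = rng.normal(size=(2, d_in))
# With B zero-initialized, the adapter starts as a no-op.
assert np.allclose(lora_forward(x), x @ W.T)
# Merged weights reproduce the adapted forward pass exactly.
assert np.allclose(x @ merge().T, lora_forward(x))
```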
Weight Transfer and Model Merging:
- Weight Transfer -- Transfer weights from donor models with architecture-aware mapping (Phi-3, LLaMA3, DeepSeek-V3 dense FFN) and dimension adaptation for mismatched tensors
- Knowledge Distillation -- Token-level, sequence-level, and reverse-KLD distillation from a teacher model
- Attention Distillation -- MLA-to-GQA distillation that trains grouped-query attention layers to reproduce frozen Multi-head Latent Attention teacher outputs (DeepSeek-V3/Kimi K2) with KL-divergence + MSE loss
- N-ary Model Merging -- Merge multiple models with configurable strategies (TIES, DARE, Karcher mean, ModelSoups, ModelStock) and EWC protection
- Fisher Information -- Compute diagonal Fisher information matrices for Elastic Weight Consolidation
- N-ary Data Mixing -- Weighted data source mixing for diverse training corpora
Inference Capabilities:
- Chain-of-Thought -- Thinking mode with ThinkStart/ThinkEnd token handling and dimmed thinking display
- Constrained Decoding -- Grammar-guided JSON structured output generation
- Self-Verification -- Draft/refine verification passes with learned verifier scoring for higher-confidence answers
- Tool Calling -- MCP (Model Context Protocol) client for tool discovery and invocation during generation
- Tool Policy -- Confidence-aware tool routing policy for deciding when external tools should be invoked
- Web Search -- Web search integration for grounded generation with citations
- Image Routing -- Route image-related queries to Fuuga.Image for generation or captioning
- A2A Protocol -- Agent-to-Agent protocol client for multi-agent communication
- Tree-Structured Speculative Decoding -- Speculative decoding with tree-structured candidates for faster generation
- Draft-and-Refine -- Multi-pass reasoning pipeline for improved output quality
- Autonomous Agent -- Web search agent loop for autonomous information gathering
- Advanced Reasoning -- Consensus voting, verifier-scored selection, and tree-of-thoughts for improved answer quality
- Backend Client -- HTTP client for calling external OpenAI-compatible LLM endpoints with structured response types
- Context Awareness -- Convention file discovery (AGENTS.md, CLAUDE.md, .cursorrules), language/framework detection, git context
- Semantic Knowledge -- WordNet WNDB parser with token-to-synset mapping and multi-lingual support
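A confidence-aware tool policy like the one above can be reduced to its core decision: invoke an external tool when the model's next-token distribution is high-entropy (low confidence). A toy sketch of that routing rule -- the threshold value is hypothetical, not Fuuga's default:

```python
import math

def entropy(probs):
    # Shannon entropy in nats of a next-token distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_invoke_tool(next_token_probs, threshold=1.5):
    # Route to an external tool (e.g. web search) when the model is
    # uncertain; answer directly when the distribution is peaked.
    return entropy(next_token_probs) > threshold

assert not should_invoke_tool([0.97, 0.01, 0.01, 0.01])            # confident
assert should_invoke_tool([0.25, 0.25, 0.25, 0.25], threshold=1.3)  # uncertain
```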
Orchestration:
- Multi-Model Orchestrator -- Route tasks to appropriate models based on capability
- Cost-Aware Routing -- Budget-tracked model routing with cost optimization
- Fan-Out Orchestration -- Decompose tasks into subtasks, run in parallel, and aggregate results
- Resumable Orchestration -- Checkpoint and resume fan-out plans across sessions
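The fan-out pattern above is decompose → run in parallel → aggregate. A generic sketch of that control flow (Fuuga's actual orchestrator types are F# and not shown here):

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(task, decompose, run_subtask, aggregate, max_workers=4):
    # Fan-out/fan-in: split a task into subtasks, run them in parallel,
    # then merge the ordered results.
    subtasks = decompose(task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_subtask, subtasks))  # map preserves order
    return aggregate(results)

# Toy usage: process document chunks in parallel, then join.
doc = ["alpha", "beta", "gamma"]
out = fan_out(doc,
              decompose=lambda d: d,
              run_subtask=lambda chunk: chunk.upper(),
              aggregate=lambda rs: " ".join(rs))
assert out == "ALPHA BETA GAMMA"
```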
Agentic Persistence:
- Experience Store -- Append-only JSON Lines log of attempt outcomes with thread-safe managed access
- Strategy Lessons -- Persist and load distilled lessons from past experience for self-improvement
- Persistent Retrieval Store -- Disk-backed IRetrievalStore for cross-session document retrieval
- Agent Session Management -- Session lifecycle (init → active → completed/failed), save/load state, step-level experience recording
- Orchestration Checkpoints -- Save and resume fan-out orchestration plans with per-subtask completion tracking
- Cost Outcome Tracking -- Persist cost-aware routing outcomes for budget optimization across sessions
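An append-only JSON Lines experience store with lock-guarded access, as described above, is small enough to sketch end to end. This Python version is illustrative only; field names are hypothetical:

```python
import json, os, tempfile, threading

class ExperienceStore:
    """Append-only JSON Lines log of attempt outcomes, guarded by a lock."""
    def __init__(self, path):
        self.path = path
        self._lock = threading.Lock()

    def record(self, outcome: dict):
        # Serialize outside the lock; append one line per outcome.
        line = json.dumps(outcome)
        with self._lock, open(self.path, "a", encoding="utf-8") as f:
            f.write(line + "\n")

    def load(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

path = os.path.join(tempfile.mkdtemp(), "experience.jsonl")
store = ExperienceStore(path)
store.record({"task": "search", "success": True})
store.record({"task": "summarize", "success": False})
assert [e["task"] for e in store.load()] == ["search", "summarize"]
```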
Model Compression:
- Structured Pruning -- Attention head removal and layer removal with importance scoring
- NF4 Quantization -- 4-bit NormalFloat quantization for weight compression
- Quantization-Aware Training (STE) -- Straight-Through Estimator for NF4 weights: the forward pass sees quantized values, the backward pass flows gradients through identity (--ste)
- Compression Pipeline -- Orchestrated prune → fine-tune → quantize workflow for production deployment
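The Straight-Through Estimator above is the trick that makes quantization-aware training possible: the forward pass sees quantized weights, but the gradient is applied to the full-precision weights as if quantization were the identity. A NumPy sketch with a toy 16-level codebook standing in for the NF4 lookup table (learning rate and codebook are illustrative):

```python
import numpy as np

def fake_quant(w, levels):
    # Round each weight to the nearest codebook entry
    # (stand-in for the NF4 lookup table).
    idx = np.abs(w[..., None] - levels).argmin(axis=-1)
    return levels[idx]

levels = np.linspace(-1.0, 1.0, 16)   # 4-bit: 16 representable values
w = np.array([0.33, -0.71])
x = np.array([1.0, 1.0])
target = 0.0

for _ in range(50):
    wq = fake_quant(w, levels)        # forward pass sees quantized weights
    y = wq @ x
    grad_y = 2.0 * (y - target)       # d/dy of squared error
    grad_w = grad_y * x               # STE: treat quantization as identity
    w -= 0.05 * grad_w                # update the full-precision weights

# Training drove the quantized output toward the target despite
# gradients never flowing "through" the rounding step.
assert abs(fake_quant(w, levels) @ x - target) < 0.2
```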
Server:
- OpenAI-Compatible API -- Separate fuuga-serve project with /v1/chat/completions, /v1/completions, /v1/models, and /v1/embeddings endpoints
- SSE Streaming -- Server-Sent Events for real-time token streaming
- Continuous Batching -- Iteration-level continuous batching for high-throughput serving
- Dynamic Batching -- HTTP-integrated dynamic batch scheduling with SSE support
- Bearer Token Auth -- Optional API key authentication middleware
- Guard Rails -- Prompt injection detection, PII masking, and content filtering middleware
- MCP Tool Routing -- Server-side MCP tool integration for function calling
- A2A Server -- Agent-to-Agent protocol server endpoint for multi-agent workflows
Distributed Training:
- PyTorch/DeepSpeed Integration -- Export model weights for distributed training, import trained weights back, and auto-generate launch scripts
Image Generation (Fuuga.Image):
- Text-to-Image -- Stable Diffusion image generation from text prompts with configurable samplers, steps, and guidance
- Image-to-Image -- Transform existing images guided by text prompts with denoising strength control
- Image Captioning -- Phi-3.5-vision captioning with brief/standard/detailed output modes
- MCP Server Mode -- Run as an MCP tool server over stdio for integration with Fuuga LLM
Observability:
- OpenTelemetry -- OTLP trace and metrics export with Serilog integration
- Spectre.Console -- Rich terminal output for training progress and diagnostics
Prerequisites
- .NET 10 SDK (v10.0.103 or later)
- GPU is optional -- CPU works for the dev configuration (small model); a CUDA-capable GPU is recommended for larger models
- ~500 MB disk space for dependencies, plus space for training data and checkpoints
Quick Start
See examples/complete-pipeline.fsx for the full runnable pipeline, or use the CLI:
# Build (CPU):
dotnet build
# Build (GPU, ~2 GB dependency):
dotnet build -p:TorchBackend=cuda
# Train a tokenizer, ingest a corpus, train, and generate text
dotnet run -- tokenize --input data/raw --vocab-size 8000 --output data/tokenizer
dotnet run -- ingest --input data/raw --output data/corpus.bin --tokenizer data/tokenizer
dotnet run -- train --corpus data/corpus.bin --tokenizer data/tokenizer --checkpoint-dir checkpoints/
dotnet run -- infer --checkpoint checkpoints/step-100 --tokenizer data/tokenizer --prompt "Once upon a time"
See the Getting Started Tutorial for a complete end-to-end walkthrough.
F# Script Examples
Prefer the F# API over the CLI? Run the complete pipeline as an F# script:
dotnet build
dotnet fsi examples/complete-pipeline.fsx
This trains a tokenizer, ingests data, trains a model, and generates text -- all using the Fuuga modules directly. See examples/complete-pipeline.fsx for the full source.
For advanced workflows -- weight transfer from Phi-3/LLaMA3/DeepSeek, LoRA and prompt tuning, benchmark evaluation, ONNX export, verifier-assisted inference, constrained decoding, chain-of-thought, streaming inference, and serving via the OpenAI-compatible API -- see examples/advanced-scenarios.fsx.
For image generation and captioning, see examples/image-demo.fsx.
Image Generation
Fuuga.Image is a standalone CLI for image generation and captioning. See the Fuuga.Image README for full command reference and MCP server mode, or run examples/image-demo.fsx.
Project Structure
Fuuga.fsproj # Project file with layered compilation order
Types.fs # All shared types (ModelConfig, TrainingConfig, GenerationConfig, etc.)
Logging.fs # ActivitySource/Meter definitions, ILoggerFactory
Observability.fs # OpenTelemetry providers, Spectre.Console, --observe flag
DriftDetection.fs # Statistical drift monitoring (KS, PSI) over confidence signals
DriftAlerting.fs # Dual-threshold alerts, adaptive thresholds, OTel metrics, retraining triggers
Config.fs # JSON config loading, CLI arg parsing, MCP config, LoRA target parsing
Validation.fs # Input validation pipeline with composable validators
Scaling.fs # Scaling heuristics from corpus and hardware stats
ConfigWizard.fs # Corpus analysis + hardware-aware config generation (Chinchilla scaling)
Tokenizer.fs # BPE tokenizer training and loading
ParquetIO.fs # HuggingFace Parquet dataset read/write (Document, SFT, DPO)
TextCleanup.fs # Ingestion/preparation text cleanup
RagCleanup.fs # RAG (Retrieval-Augmented Generation) cleanup algorithms
Ingest.fs # Document discovery and binary corpus writing
CorpusCompression.fs # Zstd compression/decompression for .fuge files
Tensor.fs # Device selection (CPU/CUDA), DisposeScope
MultiResolutionAttention.fs # Chunk pooling, global tokens for long context
Model.fs # GPT-2 transformer with RoPE, GQA, RMSNorm, SwiGLU, MLA
AttentionConfig.fs # FlashAttention verification, SDPA backend selection
AutoConfig.fs # Auto-resolution of DU Auto* config cases from hardware probing
Vision.fs # Vision encoder for multimodal inputs
VisionBridge.fs # Q-Former cross-attention bridge for vision-to-language compression
PagedAttention.fs # Paged KV-cache attention
MemoryHierarchy.fs # Compressed memory, external retrieval
PersistentRetrievalStore.fs # Disk-backed IRetrievalStore for cross-session retrieval
ConfidenceHead.fs # Calibrated confidence MLP, Platt scaling, bucket assignment
EarlyExit.fs # Early exit / adaptive depth inference
Optimizer.fs # AdamW, SWA, Lookahead, and INT8 moment-quantized optimizers
Checkpoint.fs # Checkpoint save/load/metadata, safetensors
MmapLoading.fs # Memory-mapped model loading
GradientCheckpointing.fs # Gradient checkpointing for memory-efficient training
DistributedTraining.fs # Distributed training (PyTorch/DeepSpeed export/import)
ModelParallelism.fs # Tensor/pipeline parallelism config for 70B+ models
GpuOffloading.fs # Layer-wise CPU/GPU offloading
OptimizerOffload.fs # Optimizer state offloading
NvmePaging.fs # ZeRO-Infinity 3-tier GPU/CPU/NVMe memory management
OnnxExport.fs # ONNX export with quantization and validation
DataMixture.fs # N-ary weighted data source mixing
VramGuard.fs # In-process VRAM monitoring with nvidia-smi polling, signal-based training loop integration
Training.fs # Training loop with AdamW/cosine LR, gradient offloading, per-param flush
FineTuningData.fs # SFT/DPO JSONL parsing, chat templates, tokenization, batching
FineTuning.fs # LoRA (LoraLinear), SFT training, DPO loss/training, adapter save/load
RewardFunctions.fs # Composable reward functions for RL training
DataValidation.fs # SFT/DPO JSONL validation, honesty pattern classification
Fp8Dequantization.fs # FP8 format dequantization with GPU LUT acceleration
Nf4Quantizer.fs # NF4/FP4 4-bit weight quantization, STE for QAT
QLoraTraining.fs # QLoRA training (NF4 base + LoRA adapters)
WeightTransfer.fs # Weight transfer from donor models (Phi-3, LLaMA3, DeepSeek mappings)
ModelMerge.fs # N-ary model merging (TIES, DARE, Karcher) with EWC protection
AttentionDistillation.fs # MLA-to-GQA attention distillation (KL + MSE loss)
Pruning.fs # Structured pruning (attention heads, layers)
CompressionPipeline.fs # Prune → finetune → quantize orchestration
ConstrainedDecoding.fs # Grammar-guided JSON constrained decoding
ChainOfThought.fs # ThinkStart/ThinkEnd token handling, phase tracking
Verifier.fs # Rule-based and learned verifier strategies
ToolPolicy.fs # Confidence-aware tool invocation policy
McpClient.fs # MCP client: connection, tool discovery, tool invocation
WebSearch.fs # Web search integration for grounded generation
ImageRouting.fs # Image query routing to Fuuga.Image
A2AClient.fs # A2A protocol client for agent-to-agent communication
TreeSpeculation.fs # Tree-structured speculative decoding
ContinuousBatching.fs # Iteration-level continuous batching for serving
Inference.fs # Text generation with sampling, tool-augmented generation, structured output, CoT
DraftAndRefine.fs # Draft-and-refine multi-pass reasoning pipeline
AdvancedReasoning.fs # Consensus voting, verifier-scored, tree-of-thoughts
Orchestrator.fs # Multi-model orchestrator with capability-based routing
BackendClient.fs # HTTP client for external OpenAI-compatible LLM endpoints
ExperienceStore.fs # Append-only experience log, strategy lessons, managed store
CostAwareRouting.fs # Cost-aware routing with budget tracking
AgentSession.fs # Agent session lifecycle, save/load, orchestration checkpoints
FanOutOrchestration.fs # Fan-out/fan-in task decomposition and aggregation
ContextAwareness.fs # Convention file discovery, language/framework detection, git context
SemanticKnowledge.fs # WordNet WNDB parser, token-to-synset mapping, multi-lingual
DataAugmentation.fs # Synonym replacement, paraphrasing, token noise, oversampling
VerifierRuntime.fs # Verifier loading and runtime integration
Eval.fs # Benchmark datasets, runners, result serialization
Program.fs # CLI entry point with subcommand routing
Fuuga.Server/ # OpenAI-compatible HTTP server (separate project)
ApiTypes.fs # Request/response types (OpenAI-compatible)
McpToolRouting.fs # Server-side MCP tool routing for function calling
FuugaChatClient.fs # IChatClient adapter for Microsoft AI ecosystem
FuugaEmbeddingGenerator.fs # IEmbeddingGenerator adapter for /v1/embeddings
A2AServer.fs # A2A protocol server endpoint
GuardRails.fs # Prompt injection detection, PII masking, content filtering
DynamicBatchingServer.fs # Dynamic batching with HTTP/SSE integration
Server.fs # Oxpecker HTTP server with SSE streaming, auth middleware
Program.fs # Server entry point
Fuuga.Image/ # Standalone image generation and captioning CLI
Types.fs # Domain types, error handling (ImageError DU)
Config.fs # CLI argument parsing
ImageIO.fs # Image load/save, format conversion, validation
Diffusion.fs # Stable Diffusion model wrapper (txt2img, img2img)
Caption.fs # Phi-3.5-vision captioning (ONNX Runtime GenAI)
McpServer.fs # MCP JSON-RPC 2.0 server over stdio
Program.fs # Entry point, subcommand routing
Fuuga.Tests/ # Unit and integration tests (xUnit + FsUnit)
Fuuga.Image/Fuuga.Image.Tests/ # Image module tests
docs/ # User-facing documentation
examples/ # Runnable F# script examples
scripts/ # Training data generation and validation scripts
Documentation
- Getting Started Tutorial -- End-to-end walkthrough from build to text generation
- CLI Reference -- All subcommands, flags, defaults, and exit codes
- Configuration Reference -- Model, training, generation, and fine-tuning parameters explained
- Ecosystem Comparison -- How Fuuga compares to other tools in the .NET ecosystem
- Language Centralization -- Adding new natural languages via the KnownNaturalLanguages registry
- Fuuga.Image README -- Image generation CLI reference and MCP server mode
- Complete Pipeline Example -- F# script demonstrating the full API
- Advanced Scenarios Example -- Weight transfer, prompt/LoRA tuning, RL, evaluation, ONNX export, serving
- Image Demo Example -- F# script demonstrating image generation and captioning
Architecture
Fuuga uses a layered module architecture with strict dependency ordering enforced by F#'s compilation model:
Layer 0: Types, Logging, Observability (foundation, no dependencies)
DriftDetection, DriftAlerting (statistical drift monitoring, OTel alerts)
Layer 1: Config, Validation, Scaling (configuration, validation, heuristics)
ConfigWizard (hardware-aware config generation)
Layer 2: Tokenizer, ParquetIO (BPE training/loading, Parquet dataset I/O)
Layer 3: Ingest, CorpusCompression (document discovery, corpus writing, Zstd compression)
Layer 4: Tensor, MultiResolutionAttention (device selection, chunk pooling + global tokens)
Model, AttentionConfig, AutoConfig (transformer with MLA, FlashAttention, auto-resolution)
Vision, VisionBridge (vision encoder, Q-Former bridge)
PagedAttention, MemoryHierarchy (paged KV-cache, compressed memory)
PersistentRetrievalStore (disk-backed retrieval for cross-session use)
ConfidenceHead, EarlyExit (calibration, adaptive depth)
Layer 5: Optimizer, Checkpoint, MmapLoading (optimizer variants incl. INT8 moments, save/load/metadata, mmap loading)
GradientCheckpointing (activation recomputation)
DistributedTraining, ModelParallelism (PyTorch/DeepSpeed, tensor/pipeline parallel)
GpuOffloading, OptimizerOffload (CPU/GPU memory management)
NvmePaging, OnnxExport (NVMe paging, ONNX export with quantization)
DataMixture, VramGuard (data mixing, in-process VRAM monitoring)
Layer 6: Training, FineTuningData, FineTuning (training loop, SFT/DPO data, LoRA training)
RewardFunctions, DataValidation (composable RL rewards, JSONL validation)
Fp8Dequantization, Nf4Quantizer (FP8/NF4 quantization support, GPU LUT, STE for QAT)
QLoraTraining (QLoRA: NF4 base + LoRA adapters)
WeightTransfer, ModelMerge (donor model transfer, N-ary merging)
AttentionDistillation (MLA-to-GQA distillation)
Pruning, CompressionPipeline (structured pruning, prune→finetune→quantize)
Layer 7: ConstrainedDecoding, ChainOfThought (generation extensions)
Verifier, ToolPolicy, McpClient (verifier strategies, tool policy, tool calling)
WebSearch, ImageRouting, A2AClient (search, image routing, A2A protocol)
TreeSpeculation, ContinuousBatching (speculation, batch scheduling)
Inference (text generation with sampling, tools, structured output)
DraftAndRefine, AdvancedReasoning (multi-pass reasoning, consensus/tree-of-thoughts)
Orchestrator, BackendClient (multi-model routing, external LLM client)
ExperienceStore, CostAwareRouting (cross-session persistence, cost optimization)
AgentSession, FanOutOrchestration (agent lifecycle, parallel task decomposition)
Layer 8: ContextAwareness, SemanticKnowledge (project context, WordNet)
DataAugmentation (synonym replacement, paraphrasing, token noise)
VerifierRuntime (verifier loading and runtime integration)
Eval (benchmark datasets, runners, result serialization)
Layer 9: Program (CLI entry point, subcommand routing)
See the architecture document for more on design decisions.
Running Tests
dotnet test Fuuga.Tests
Tests cover tokenization, ingestion, Parquet I/O, corpus compression, model architecture, MLA attention, vision encoder, vision bridge, paged attention, multi-resolution attention, attention configuration, early exit, training, optimizer variants, INT8 optimizer moments, gradient checkpointing, GPU offloading, optimizer offloading, ONNX export, inference, verifier-guided generation, speculative decoding, checkpoints, memory-mapped loading, fine-tuning, prompt tuning, QLoRA, reinforcement learning, reward functions, data validation, data augmentation, benchmark evaluation, FP8 dequantization, FP8 GPU LUT dequantization, NF4 quantization, STE quantization-aware training, structured pruning, compression pipeline, constrained decoding, chain-of-thought, MCP client, A2A client/server, web search, image routing, draft-and-refine, advanced reasoning, orchestrator, backend client, cost-aware routing, fan-out orchestration, weight transfer, attention distillation, model merging, distributed training, model parallelism, scaling, validation, continuous batching, dynamic batching server, experience persistence, agent session management, persistent retrieval, drift detection, drift alerting, context awareness, semantic knowledge, observability, guard rails, memory strategy (gradient offloading, per-param flush, VRAM guard lifecycle, ConfigWizard recommendations, multi-GPU parallelism), and CLI integration. Tests use xUnit with FsUnit assertions and include both unit tests and end-to-end integration tests.
Technology Stack
| Component | Library | Purpose |
|---|---|---|
| Tensors & GPU | TorchSharp 0.106.0 | Tensor operations, CUDA support |
| LibTorch | libtorch-cpu 2.10.0 | LibTorch CPU backend |
| Tokenization | Microsoft.ML.Tokenizers 2.0.0 | BPE tokenizer training |
| ONNX Runtime | Microsoft.ML.OnnxRuntime 1.24.3 | ONNX model inference |
| ONNX Export | OnnxSharp 0.3.2 | ONNX model construction and manipulation |
| Protobuf | Google.Protobuf 3.34.0 | Protobuf serialization for ONNX |
| Epub parsing | VersOne.Epub 3.3.4 | Extract text from epub files |
| Markdown | Markdig 1.1.1 | Parse markdown to plain text |
| Parquet | Parquet.Net 5.5.0 | HuggingFace-compatible dataset I/O |
| Compression | ZstdSharp.Port 0.8.7 | Zstandard corpus compression |
| Statistics | MathNet.Numerics.FSharp 5.0.0 | Drift detection (KS test, PSI) |
| Logging | Serilog 4.3.0 | Structured logging |
| Logging sinks | Serilog.Sinks.Console 6.0.0, .File 6.0.0, .OpenTelemetry 4.2.0 | Console, file, and OTel log sinks |
| Logging bridge | Serilog.Extensions.Logging 9.0.0 | Serilog/Microsoft.Extensions.Logging bridge |
| JSON | FSharp.SystemTextJson 1.4.36 | F# DU-aware serialization |
| Telemetry | OpenTelemetry 1.11.2 | Distributed tracing and metrics |
| Telemetry export | OpenTelemetry.Exporter.OpenTelemetryProtocol 1.11.2 | OTLP protocol export |
| Telemetry hosting | OpenTelemetry.Extensions.Hosting 1.11.2 | OpenTelemetry hosting integration |
| Terminal UI | Spectre.Console 0.49.1 | Rich terminal output |
| HuggingFace | TorchSharp.PyBridge 1.4.3 | HuggingFace weight loading |
| SIMD | System.Numerics.Tensors 10.0.5 | Preprocessing acceleration |
| AI abstractions | Microsoft.Extensions.AI 10.4.0 | IChatClient adapter |
| AI evaluation | Microsoft.Extensions.AI.Evaluation 10.4.0 | Benchmark evaluation framework |
| MCP | ModelContextProtocol 1.1.0 | MCP client SDK for tool calling |
| A2A | A2A 0.3.3-preview | Agent-to-Agent protocol client |
| Testing | xUnit 2.9.3 + FsUnit.xUnit 6.0.1 | Unit and integration tests |
| Server: HTTP | Oxpecker 2.0.0 | F# HTTP server framework |
| Server: A2A | A2A.AspNetCore 0.3.3-preview | A2A protocol server |
| Image: Diffusion | StableDiffusion.NET 5.0.0 | Stable Diffusion model wrapper |
| Image: Captioning | Microsoft.ML.OnnxRuntimeGenAI 0.12.1 | Phi-3.5-vision captioning |
| Image: Processing | HPPH.SkiaSharp 1.0.0 | Image load/save, format conversion |
License
See LICENSE for details.
Compatible target frameworks

| Product | Versions |
|---|---|
| .NET | net10.0 is compatible; net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, and net10.0-windows are computed. |
Dependencies (net10.0)
- A2A (>= 0.3.3-preview)
- FSharp.Core (>= 10.1.201)
- FSharp.SystemTextJson (>= 1.4.36)
- Google.Protobuf (>= 3.34.1)
- libtorch-cpu (>= 2.10.0)
- Markdig (>= 1.1.1)
- MathNet.Numerics.FSharp (>= 5.0.0)
- Microsoft.Extensions.AI (>= 10.4.1)
- Microsoft.Extensions.AI.Evaluation (>= 10.4.0)
- Microsoft.Extensions.AI.Evaluation.Quality (>= 10.4.0)
- Microsoft.Extensions.AI.Evaluation.Reporting (>= 10.4.0)
- Microsoft.ML.OnnxRuntime (>= 1.24.4)
- Microsoft.ML.Tokenizers (>= 2.0.0)
- ModelContextProtocol (>= 1.1.0)
- OnnxSharp (>= 0.3.2)
- OpenTelemetry (>= 1.15.0)
- OpenTelemetry.Exporter.OpenTelemetryProtocol (>= 1.15.0)
- OpenTelemetry.Extensions.Hosting (>= 1.15.0)
- Parquet.Net (>= 5.5.0)
- Serilog (>= 4.3.1)
- Serilog.Extensions.Logging (>= 10.0.0)
- Serilog.Sinks.Console (>= 6.1.1)
- Serilog.Sinks.File (>= 7.0.0)
- Serilog.Sinks.OpenTelemetry (>= 4.2.0)
- Spectre.Console (>= 0.54.0)
- System.Numerics.Tensors (>= 10.0.5)
- TorchSharp (>= 0.106.0)
- TorchSharp.PyBridge (>= 1.4.3)
- VersOne.Epub (>= 3.3.6)
- ZstdSharp.Port (>= 0.8.7)