Fuuga 1.0.0

There is a newer version of this package available.
See the version list below for details.

.NET CLI:
dotnet add package Fuuga --version 1.0.0

Package Manager (run within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package):
NuGet\Install-Package Fuuga -Version 1.0.0

PackageReference (for projects that support PackageReference, copy this XML node into the project file):
<PackageReference Include="Fuuga" Version="1.0.0" />

Central Package Management (for projects that support CPM, copy this XML node into the solution Directory.Packages.props file to version the package):
<PackageVersion Include="Fuuga" Version="1.0.0" />
and this node into the project file:
<PackageReference Include="Fuuga" />

Paket CLI:
paket add Fuuga --version 1.0.0

Script & Interactive (the #r directive can be used in F# Interactive and Polyglot Notebooks; copy this into the interactive tool or the source code of the script to reference the package):
#r "nuget: Fuuga, 1.0.0"

File-based apps (the #:package directive can be used in C# file-based apps starting in .NET 10 preview 4; copy this into a .cs file before any lines of code):
#:package Fuuga@1.0.0

Cake Addin:
#addin nuget:?package=Fuuga&version=1.0.0

Cake Tool:
#tool nuget:?package=Fuuga&version=1.0.0

Fuuga

Tired of paying for tokens? Think you could train a better model? Well, now you can try.

An LLM built from scratch in F# and .NET. Fuuga implements a complete language model pipeline: tokenization, data ingestion, model training, fine-tuning, and text generation -- with no Python dependencies.

Built on TorchSharp for tensor operations and Microsoft.ML.Tokenizers for BPE, Fuuga uses idiomatic F# (discriminated unions, pipelines, immutability) throughout. Works on GPU or CPU.

Features

Core Pipeline:

  • BPE Tokenizer -- Train a byte-pair encoding tokenizer on your own corpus with configurable vocabulary size
  • Data Ingestion -- Discover and tokenize epub, markdown, Parquet, and plain text files into a binary corpus
  • Parquet I/O -- Read and write HuggingFace-compatible Parquet datasets for SFT, DPO, and document data
  • Corpus Compression -- Zstd compression/decompression for .fuge corpus files
  • GPT-2 Transformer -- Decoder-only causal transformer with rotary position embeddings (RoPE), grouped-query attention (GQA), RMSNorm, and SwiGLU activation
  • Multi-Head Latent Attention (MLA) -- DeepSeek-V2 style compressed KV cache with query/KV compression, decoupled RoPE keys, and optional weight absorption for reduced memory during inference
  • Vision Encoder -- Vision model support for multimodal inputs
  • Vision Bridge -- Q-Former cross-attention bridge that compresses vision patch tokens into learned query vectors for multimodal (image+text) inputs
  • Paged Attention -- Paged KV-cache attention for efficient memory usage during long-context generation
  • Memory Hierarchy -- Compressed memory with external retrieval for extended context
  • Multi-Resolution Attention -- Chunk pooling with global tokens for efficient long-context processing
  • FlashAttention Config -- SDPA backend selection and benchmarking for attention kernels
  • Auto Config -- Hardware-aware auto-resolution of DU configuration cases (norm, activation, precision, offloading, communication) at startup
  • Early Exit -- Adaptive depth inference for faster generation when confidence is high
  • Training -- AdamW optimizer with cosine learning rate scheduling, warmup, gradient clipping, mixed precision support, and gradient accumulation
  • INT8 Optimizer Moments -- Optional INT8 quantization of AdamW M/V moment tensors with per-row symmetric quantization, reducing optimizer memory ~4× (--moment-quant int8)
  • Optimizer Variants -- Stochastic Weight Averaging (SWA) and Lookahead optimizer support with checkpointable optimizer state
  • Gradient Checkpointing -- Memory-efficient training via activation recomputation
  • GPU Offloading -- Layer-wise CPU/GPU offloading for reduced VRAM usage
  • Optimizer Offloading -- Offload optimizer states to CPU memory
  • NVMe Paging -- ZeRO-Infinity 3-tier GPU/CPU/NVMe memory management for training models larger than available VRAM
  • Per-Tensor Gradient Offloading -- Bulk-copy gradients to CPU after backward pass and restore before optimizer step, freeing GPU VRAM during the optimizer phase (--grad-offload)
  • Per-Parameter CUDA Flush -- Aggressive CUDA cache cleanup after optimizer step to reclaim transient VRAM spikes from M/V update temporaries (--flush-each-param)
  • VRAM Guard -- In-process background thread that polls GPU memory via nvidia-smi and signals the training loop to warn, skip batches, or abort when usage exceeds a configurable threshold (--vram-guard-gb <float>)
  • Memory Strategy Presets -- MemoryStrategyConfigs.none, .constrained (grad-offload + flush), and .full (all three with VRAM guard at 95%) with automatic ConfigWizard recommendations based on model-to-VRAM ratio
  • Model Parallelism -- Tensor and pipeline parallelism configuration for 70B+ parameter models, with automatic DataParallel recommendation for multi-GPU setups
  • Inference -- Greedy, top-k, top-p (nucleus), and temperature sampling with repetition penalty
  • Fill-in-the-Middle -- FIM support with prefix/suffix tokens for code completion
  • Checkpoints -- Save, load, resume training from checkpoints with full metadata; safetensors format support
  • Memory-Mapped Loading -- mmap-based model loading for fast startup
  • Streaming Inference -- Token-by-token generation with configurable stop conditions
  • Confidence Signals -- Entropy, repetition rate, hedging detection, calibrated confidence with Platt scaling, and stop reason reporting
  • Drift Detection -- Statistical drift monitoring (Kolmogorov-Smirnov, Population Stability Index) over confidence signals with ring-buffered accumulation
  • Drift Alerting -- Dual-threshold alerts with adaptive sigma-based thresholds, OpenTelemetry metrics, and retraining triggers
  • ONNX Export -- Export to ONNX format with fp16/int8 quantization, validation, and benchmarking
  • ONNX Inference -- ONNX Runtime backend for optimized inference (--backend onnx)
  • Benchmark Evaluation -- Built-in benchmark runner for MMLU, HellaSwag, ARC-Challenge, cached dataset downloads, and checkpoint-attached benchmark results
  • FP8 Dequantization -- FP8 format support for quantized weight loading with GPU-accelerated LUT path (256-entry cached lookup table using torch.index_select) auto-selected when CUDA is available
  • Validation Pipeline -- Input validation framework with composable validators
  • Scaling Heuristics -- Auto-scaling configuration from corpus and hardware stats
  • Config Wizard -- Corpus analysis and hardware-aware config generation using Chinchilla scaling laws, activation memory estimates, NTK-aware RoPE, multi-GPU detection (nvidia-smi), and memory strategy recommendations
  • CLI -- Subcommands for the full pipeline (tokenize, ingest, train, infer, info, sft, dpo, rl, merge, transfer, distill, merge-models, fisher, distributed, export onnx, compress, decompress, prune, eval, config, wordnet, serve, orchestrate, agent)
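
Top-p (nucleus) sampling, listed above among the decoding strategies, is a standard technique; the snippet below is a minimal, self-contained F# sketch of the filtering step -- an illustration of the idea, not Fuuga's actual Inference API.

```fsharp
// Top-p (nucleus) filtering over a probability distribution (generic sketch,
// not Fuuga's Inference module). Keep the smallest set of tokens whose
// cumulative probability reaches p, zero out the rest, and renormalize.
let topPFilter (p: float) (probs: float[]) =
    // Token indices ordered by descending probability.
    let order = Array.init probs.Length id |> Array.sortByDescending (fun i -> probs.[i])
    let kept = System.Collections.Generic.HashSet<int>()
    let mutable cum = 0.0
    for i in order do
        if cum < p then          // always keeps at least one token
            kept.Add i |> ignore
            cum <- cum + probs.[i]
    let mass = order |> Array.sumBy (fun i -> if kept.Contains i then probs.[i] else 0.0)
    probs |> Array.mapi (fun i q -> if kept.Contains i then q / mass else 0.0)
```

Sampling then draws a token from the renormalized distribution; with p = 0.5 and probabilities [0.5; 0.3; 0.2], only the first token survives.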

Fine-Tuning:

  • Supervised Fine-Tuning (SFT) -- LoRA-based fine-tuning on instruction/chat JSONL data with configurable rank, alpha, and target modules
  • Prompt Tuning -- Soft-prompt / virtual-token fine-tuning with frozen base weights for lightweight PEFT workflows
  • Direct Preference Optimization (DPO) -- Preference learning from chosen/rejected pairs with LoRA
  • Reinforcement Learning (RL) -- REINFORCE++ / GRPO fine-tuning with pluggable reward functions
  • Reward Functions -- Composable reward functions for RL training (correctness, formatting, safety)
  • LoRA Adapter Merging -- Merge trained LoRA adapters back into the base model weights
  • QLoRA -- NF4-quantized base weights with LoRA adapters for memory-efficient fine-tuning on consumer GPUs
  • Data Validation -- JSONL format validation for SFT and DPO datasets with honesty pattern classification
  • Data Augmentation -- Synonym replacement, rule-based paraphrasing, token-level noise injection, and SFT/DPO oversampling for training data diversity
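
The DPO objective above can be written as a scalar loss per preference pair: -log sigmoid(beta * margin), where the margin compares policy-vs-reference log-probabilities of the chosen and rejected responses. The function below is an illustrative sketch of that formula, not Fuuga's FineTuning module.

```fsharp
// Scalar DPO loss for one chosen/rejected pair (illustrative sketch, not
// Fuuga's actual API). Inputs are the summed token log-probabilities of
// each response under the trained policy and the frozen reference model.
let dpoLoss (beta: float) (policyChosen: float) (policyRejected: float)
            (refChosen: float) (refRejected: float) =
    // Implicit reward margin: how much more the policy prefers the chosen
    // response than the reference does, relative to the rejected response.
    let margin = (policyChosen - refChosen) - (policyRejected - refRejected)
    let sigmoid x = 1.0 / (1.0 + exp (-x))
    -(log (sigmoid (beta * margin)))
```

A zero margin yields the maximum-uncertainty loss ln 2; a large positive margin (policy strongly prefers the chosen response) drives the loss toward zero.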

Weight Transfer and Model Merging:

  • Weight Transfer -- Transfer weights from donor models with architecture-aware mapping (Phi-3, LLaMA3, DeepSeek-V3 dense FFN) and dimension adaptation for mismatched tensors
  • Knowledge Distillation -- Token-level, sequence-level, and reverse-KLD distillation from a teacher model
  • Attention Distillation -- MLA-to-GQA distillation that trains grouped-query attention layers to reproduce frozen Multi-head Latent Attention teacher outputs (DeepSeek-V3/Kimi K2) with KL-divergence + MSE loss
  • N-ary Model Merging -- Merge multiple models with configurable strategies (TIES, DARE, Karcher mean, ModelSoups, ModelStock) and EWC protection
  • Fisher Information -- Compute diagonal Fisher information matrices for Elastic Weight Consolidation
  • N-ary Data Mixing -- Weighted data source mixing for diverse training corpora
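
The simplest of the merging strategies above is a weighted "model soup" average of parameter tensors; TIES and DARE build on this with sign resolution and sparsification. A minimal sketch over flat parameter vectors (not Fuuga's ModelMerge API):

```fsharp
// Weighted model-soup average of N parameter vectors (generic sketch).
// All models must share the same parameter layout.
let mergeSoup (weights: float[]) (models: float[][]) =
    let total = Array.sum weights
    Array.init models.[0].Length (fun i ->
        // Weighted sum of parameter i across all models, normalized.
        (Array.fold2 (fun acc w m -> acc + w * m.[i]) 0.0 weights models) / total)
```

For example, merging [0; 4] and [4; 0] with weights 1 and 3 gives [3; 1].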

Inference Capabilities:

  • Chain-of-Thought -- Thinking mode with ThinkStart/ThinkEnd token handling and dimmed thinking display
  • Constrained Decoding -- Grammar-guided JSON structured output generation
  • Self-Verification -- Draft/refine verification passes with learned verifier scoring for higher-confidence answers
  • Tool Calling -- MCP (Model Context Protocol) client for tool discovery and invocation during generation
  • Tool Policy -- Confidence-aware tool routing policy for deciding when external tools should be invoked
  • Web Search -- Web search integration for grounded generation with citations
  • Image Routing -- Route image-related queries to Fuuga.Image for generation or captioning
  • A2A Protocol -- Agent-to-Agent protocol client for multi-agent communication
  • Tree-Structured Speculative Decoding -- Speculative decoding with tree-structured candidates for faster generation
  • Draft-and-Refine -- Multi-pass reasoning pipeline for improved output quality
  • Autonomous Agent -- Web search agent loop for autonomous information gathering
  • Advanced Reasoning -- Consensus voting, verifier-scored selection, and tree-of-thoughts for improved answer quality
  • Backend Client -- HTTP client for calling external OpenAI-compatible LLM endpoints with structured response types
  • Context Awareness -- Convention file discovery (AGENTS.md, CLAUDE.md, .cursorrules), language/framework detection, git context
  • Semantic Knowledge -- WordNet WNDB parser with token-to-synset mapping and multi-lingual support
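
The confidence-aware tool routing described above comes down to a small decision function: answer directly when calibrated confidence is high, otherwise fall back to an available tool. The sketch below uses hypothetical names and is not Fuuga's ToolPolicy API.

```fsharp
// Confidence-aware tool routing (illustrative sketch, hypothetical types).
type Route =
    | AnswerDirectly        // model is confident enough to answer alone
    | InvokeTool of string  // defer to an external tool (e.g. web search)
    | Abstain               // low confidence and no tool can help

let decide (threshold: float) (confidence: float) (matchingTools: string list) =
    if confidence >= threshold then AnswerDirectly
    else
        match matchingTools with
        | tool :: _ -> InvokeTool tool
        | [] -> Abstain
```

The threshold would typically come from a calibrated confidence head (Platt scaling), so the same cutoff means the same empirical accuracy across prompts.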

Orchestration:

  • Multi-Model Orchestrator -- Route tasks to appropriate models based on capability
  • Cost-Aware Routing -- Budget-tracked model routing with cost optimization
  • Fan-Out Orchestration -- Decompose tasks into subtasks, run in parallel, and aggregate results
  • Resumable Orchestration -- Checkpoint and resume fan-out plans across sessions
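
Capability-based routing with cost awareness, as described above, can be reduced to "cheapest registered model that advertises the required capability". The record shape and model names below are illustrative, not Fuuga's Orchestrator types.

```fsharp
// Cost-aware capability routing sketch (hypothetical types and names).
type Model = { Name: string; Capabilities: Set<string>; CostPer1kTokens: float }

let route (capability: string) (models: Model list) =
    match models |> List.filter (fun m -> m.Capabilities.Contains capability) with
    | [] -> None  // no registered model can handle this task
    | candidates -> Some (List.minBy (fun m -> m.CostPer1kTokens) candidates)
```

A budget tracker would sit on top of this, deducting estimated cost per call and tightening the candidate set as the budget drains.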

Agentic Persistence:

  • Experience Store -- Append-only JSON Lines log of attempt outcomes with thread-safe managed access
  • Strategy Lessons -- Persist and load distilled lessons from past experience for self-improvement
  • Persistent Retrieval Store -- Disk-backed IRetrievalStore for cross-session document retrieval
  • Agent Session Management -- Session lifecycle (init → active → completed/failed), save/load state, step-level experience recording
  • Orchestration Checkpoints -- Save and resume fan-out orchestration plans with per-subtask completion tracking
  • Cost Outcome Tracking -- Persist cost-aware routing outcomes for budget optimization across sessions
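
An append-only JSON Lines log, as used by the experience store above, is one JSON object per line: appends are cheap and the file can be replayed later. The record shape below is hypothetical, not Fuuga's ExperienceStore schema.

```fsharp
// Append-only JSONL experience log sketch (hypothetical record shape).
open System.IO
open System.Text.Json

type Experience = { Task: string; Success: bool; Steps: int }

// Append one outcome as a single JSON line.
let append (path: string) (e: Experience) =
    File.AppendAllText(path, JsonSerializer.Serialize(e) + "\n")

// Replay the whole log (returns [] when no log exists yet).
let load (path: string) : Experience list =
    if not (File.Exists path) then []
    else
        File.ReadAllLines path
        |> Array.map (fun line -> JsonSerializer.Deserialize<Experience> line)
        |> Array.toList
```

Because each record is a self-contained line, a crash mid-write corrupts at most the final line, and thread-safe access only needs to serialize the append itself.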

Model Compression:

  • Structured Pruning -- Attention head removal and layer removal with importance scoring
  • NF4 Quantization -- 4-bit NormalFloat quantization for weight compression
  • Quantization-Aware Training (STE) -- Straight-Through Estimator for NF4 weights: forward pass sees quantized values, backward pass flows gradients through identity (--ste)
  • Compression Pipeline -- Orchestrated prune → fine-tune → quantize workflow for production deployment
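
Block-wise 4-bit quantization, as used by NF4 above, scales each block by its absolute maximum and snaps every weight to the nearest of 16 code levels. The sketch below uses uniform levels for clarity -- real NF4 derives its 16 levels from the normal distribution -- and is not Fuuga's Nf4Quantizer API.

```fsharp
// Block-wise 4-bit quantization sketch (uniform levels for illustration;
// NF4 uses normal-distribution-derived levels instead).
let levels = Array.init 16 (fun i -> -1.0 + 2.0 * float i / 15.0)

let quantize4bit (levels: float[]) (block: float[]) =
    // Per-block absmax scale maps weights into [-1, 1].
    let scale = block |> Array.map abs |> Array.max
    let indexOf x =
        levels |> Array.mapi (fun i l -> i, abs (l - x)) |> Array.minBy snd |> fst
    let idx = block |> Array.map (fun w -> indexOf (if scale = 0.0 then 0.0 else w / scale))
    idx, scale  // 4-bit codes plus one scale per block

let dequantize4bit (levels: float[]) (idx: int[], scale: float) =
    idx |> Array.map (fun i -> levels.[i] * scale)
```

With 16 uniform levels the round-trip error is bounded by half a level step times the block scale; quantization-aware training with a straight-through estimator lets gradients flow through this snap as if it were the identity.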

Server:

  • OpenAI-Compatible API -- Separate fuuga-serve project with /v1/chat/completions, /v1/completions, /v1/models, and /v1/embeddings endpoints
  • SSE Streaming -- Server-Sent Events for real-time token streaming
  • Continuous Batching -- Iteration-level continuous batching for high-throughput serving
  • Dynamic Batching -- HTTP-integrated dynamic batch scheduling with SSE support
  • Bearer Token Auth -- Optional API key authentication middleware
  • Guard Rails -- Prompt injection detection, PII masking, and content filtering middleware
  • MCP Tool Routing -- Server-side MCP tool integration for function calling
  • A2A Server -- Agent-to-Agent protocol server endpoint for multi-agent workflows
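
Since the server exposes OpenAI-compatible endpoints, any OpenAI-style client can talk to it. A typical request body for /v1/chat/completions looks like the fragment below (the model name is illustrative; check the server's /v1/models listing for actual names):

```json
{
  "model": "fuuga",
  "messages": [
    { "role": "user", "content": "Explain RoPE in one sentence." }
  ],
  "stream": true
}
```

With "stream": true the response arrives as Server-Sent Events, one delta chunk per token, matching the SSE streaming feature above.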

Distributed Training:

  • PyTorch/DeepSpeed Integration -- Export model weights for distributed training, import trained weights back, and auto-generate launch scripts

Image Generation (Fuuga.Image):

  • Text-to-Image -- Stable Diffusion image generation from text prompts with configurable samplers, steps, and guidance
  • Image-to-Image -- Transform existing images guided by text prompts with denoising strength control
  • Image Captioning -- Phi-3.5-vision captioning with brief/standard/detailed output modes
  • MCP Server Mode -- Run as an MCP tool server over stdio for integration with Fuuga LLM

Observability:

  • OpenTelemetry -- OTLP trace and metrics export with Serilog integration
  • Spectre.Console -- Rich terminal output for training progress and diagnostics

Prerequisites

  • .NET 10 SDK (v10.0.103 or later)
  • GPU is optional -- CPU works for the dev configuration (small model); a CUDA-capable GPU is recommended for larger models.
  • ~500 MB disk space for dependencies, plus space for training data and checkpoints

Quick Start

See examples/complete-pipeline.fsx for the full runnable pipeline, or use the CLI:

# Build (CPU):
dotnet build
# Build (GPU, ~2 GB dependency):
dotnet build -p:TorchBackend=cuda

# Train a tokenizer, ingest a corpus, train, and generate text
dotnet run -- tokenize --input data/raw --vocab-size 8000 --output data/tokenizer
dotnet run -- ingest --input data/raw --output data/corpus.bin --tokenizer data/tokenizer
dotnet run -- train --corpus data/corpus.bin --tokenizer data/tokenizer --checkpoint-dir checkpoints/
dotnet run -- infer --checkpoint checkpoints/step-100 --tokenizer data/tokenizer --prompt "Once upon a time"

See the Getting Started Tutorial for a complete end-to-end walkthrough.

F# Script Examples

Prefer the F# API over the CLI? Run the complete pipeline as an F# script:

dotnet build
dotnet fsi examples/complete-pipeline.fsx

This trains a tokenizer, ingests data, trains a model, and generates text -- all using the Fuuga modules directly. See examples/complete-pipeline.fsx for the full source.

For advanced workflows -- weight transfer from Phi-3/LLaMA3/DeepSeek, LoRA and prompt tuning, benchmark evaluation, ONNX export, verifier-assisted inference, constrained decoding, chain-of-thought, streaming inference, and serving via the OpenAI-compatible API -- see examples/advanced-scenarios.fsx.

For image generation and captioning, see examples/image-demo.fsx.

Image Generation

Fuuga.Image is a standalone CLI for image generation and captioning. See the Fuuga.Image README for full command reference and MCP server mode, or run examples/image-demo.fsx.

Project Structure

Fuuga.fsproj                    # Project file with layered compilation order
Types.fs                        # All shared types (ModelConfig, TrainingConfig, GenerationConfig, etc.)
Logging.fs                      # ActivitySource/Meter definitions, ILoggerFactory
Observability.fs                # OpenTelemetry providers, Spectre.Console, --observe flag
DriftDetection.fs               # Statistical drift monitoring (KS, PSI) over confidence signals
DriftAlerting.fs                # Dual-threshold alerts, adaptive thresholds, OTel metrics, retraining triggers
Config.fs                       # JSON config loading, CLI arg parsing, MCP config, LoRA target parsing
Validation.fs                   # Input validation pipeline with composable validators
Scaling.fs                      # Scaling heuristics from corpus and hardware stats
ConfigWizard.fs                 # Corpus analysis + hardware-aware config generation (Chinchilla scaling)
Tokenizer.fs                    # BPE tokenizer training and loading
ParquetIO.fs                    # HuggingFace Parquet dataset read/write (Document, SFT, DPO)
Ingest.fs                       # Document discovery and binary corpus writing
CorpusCompression.fs            # Zstd compression/decompression for .fuge files
Tensor.fs                       # Device selection (CPU/CUDA), DisposeScope
MultiResolutionAttention.fs     # Chunk pooling, global tokens for long context
Model.fs                        # GPT-2 transformer with RoPE, GQA, RMSNorm, SwiGLU, MLA
AttentionConfig.fs              # FlashAttention verification, SDPA backend selection
AutoConfig.fs                   # Auto-resolution of DU Auto* config cases from hardware probing
Vision.fs                       # Vision encoder for multimodal inputs
VisionBridge.fs                 # Q-Former cross-attention bridge for vision-to-language compression
PagedAttention.fs               # Paged KV-cache attention
MemoryHierarchy.fs              # Compressed memory, external retrieval
PersistentRetrievalStore.fs     # Disk-backed IRetrievalStore for cross-session retrieval
ConfidenceHead.fs               # Calibrated confidence MLP, Platt scaling, bucket assignment
EarlyExit.fs                    # Early exit / adaptive depth inference
Optimizer.fs                    # AdamW, SWA, Lookahead, and INT8 moment-quantized optimizers
Checkpoint.fs                   # Checkpoint save/load/metadata, safetensors
MmapLoading.fs                  # Memory-mapped model loading
GradientCheckpointing.fs        # Gradient checkpointing for memory-efficient training
DistributedTraining.fs          # Distributed training (PyTorch/DeepSpeed export/import)
ModelParallelism.fs             # Tensor/pipeline parallelism config for 70B+ models
GpuOffloading.fs                # Layer-wise CPU/GPU offloading
OptimizerOffload.fs             # Optimizer state offloading
NvmePaging.fs                   # ZeRO-Infinity 3-tier GPU/CPU/NVMe memory management
OnnxExport.fs                   # ONNX export with quantization and validation
DataMixture.fs                  # N-ary weighted data source mixing
VramGuard.fs                    # In-process VRAM monitoring with nvidia-smi polling, signal-based training loop integration
Training.fs                     # Training loop with AdamW/cosine LR, gradient offloading, per-param flush
FineTuningData.fs               # SFT/DPO JSONL parsing, chat templates, tokenization, batching
FineTuning.fs                   # LoRA (LoraLinear), SFT training, DPO loss/training, adapter save/load
RewardFunctions.fs              # Composable reward functions for RL training
DataValidation.fs               # SFT/DPO JSONL validation, honesty pattern classification
Fp8Dequantization.fs            # FP8 format dequantization with GPU LUT acceleration
Nf4Quantizer.fs                 # NF4/FP4 4-bit weight quantization, STE for QAT
QLoraTraining.fs                # QLoRA training (NF4 base + LoRA adapters)
WeightTransfer.fs               # Weight transfer from donor models (Phi-3, LLaMA3, DeepSeek mappings)
ModelMerge.fs                   # N-ary model merging (TIES, DARE, Karcher) with EWC protection
AttentionDistillation.fs        # MLA-to-GQA attention distillation (KL + MSE loss)
Pruning.fs                      # Structured pruning (attention heads, layers)
CompressionPipeline.fs          # Prune → finetune → quantize orchestration
ConstrainedDecoding.fs          # Grammar-guided JSON constrained decoding
ChainOfThought.fs               # ThinkStart/ThinkEnd token handling, phase tracking
Verifier.fs                     # Rule-based and learned verifier strategies
ToolPolicy.fs                   # Confidence-aware tool invocation policy
McpClient.fs                    # MCP client: connection, tool discovery, tool invocation
WebSearch.fs                    # Web search integration for grounded generation
ImageRouting.fs                 # Image query routing to Fuuga.Image
A2AClient.fs                    # A2A protocol client for agent-to-agent communication
TreeSpeculation.fs              # Tree-structured speculative decoding
ContinuousBatching.fs           # Iteration-level continuous batching for serving
Inference.fs                    # Text generation with sampling, tool-augmented generation, structured output, CoT
DraftAndRefine.fs               # Draft-and-refine multi-pass reasoning pipeline
AdvancedReasoning.fs            # Consensus voting, verifier-scored, tree-of-thoughts
Orchestrator.fs                 # Multi-model orchestrator with capability-based routing
BackendClient.fs                # HTTP client for external OpenAI-compatible LLM endpoints
ExperienceStore.fs              # Append-only experience log, strategy lessons, managed store
CostAwareRouting.fs             # Cost-aware routing with budget tracking
AgentSession.fs                 # Agent session lifecycle, save/load, orchestration checkpoints
FanOutOrchestration.fs          # Fan-out/fan-in task decomposition and aggregation
ContextAwareness.fs             # Convention file discovery, language/framework detection, git context
SemanticKnowledge.fs            # WordNet WNDB parser, token-to-synset mapping, multi-lingual
DataAugmentation.fs             # Synonym replacement, paraphrasing, token noise, oversampling
VerifierRuntime.fs              # Verifier loading and runtime integration
Eval.fs                         # Benchmark datasets, runners, result serialization
Program.fs                      # CLI entry point with subcommand routing

Fuuga.Server/                   # OpenAI-compatible HTTP server (separate project)
  ApiTypes.fs                   # Request/response types (OpenAI-compatible)
  McpToolRouting.fs             # Server-side MCP tool routing for function calling
  FuugaChatClient.fs            # IChatClient adapter for Microsoft AI ecosystem
  FuugaEmbeddingGenerator.fs   # IEmbeddingGenerator adapter for /v1/embeddings
  A2AServer.fs                  # A2A protocol server endpoint
  GuardRails.fs                 # Prompt injection detection, PII masking, content filtering
  DynamicBatchingServer.fs      # Dynamic batching with HTTP/SSE integration
  Server.fs                     # Oxpecker HTTP server with SSE streaming, auth middleware
  Program.fs                    # Server entry point

Fuuga.Image/                    # Standalone image generation and captioning CLI
  Types.fs                      # Domain types, error handling (ImageError DU)
  Config.fs                     # CLI argument parsing
  ImageIO.fs                    # Image load/save, format conversion, validation
  Diffusion.fs                  # Stable Diffusion model wrapper (txt2img, img2img)
  Caption.fs                    # Phi-3.5-vision captioning (ONNX Runtime GenAI)
  McpServer.fs                  # MCP JSON-RPC 2.0 server over stdio
  Program.fs                    # Entry point, subcommand routing

Fuuga.Tests/                    # Unit and integration tests (xUnit + FsUnit)
Fuuga.Image/Fuuga.Image.Tests/  # Image module tests
docs/                           # User-facing documentation
examples/                       # Runnable F# script examples
scripts/                        # Training data generation and validation scripts

Documentation

Architecture

Fuuga uses a layered module architecture with strict dependency ordering enforced by F#'s compilation model:

Layer 0:  Types, Logging, Observability           (foundation, no dependencies)
          DriftDetection, DriftAlerting            (statistical drift monitoring, OTel alerts)
Layer 1:  Config, Validation, Scaling             (configuration, validation, heuristics)
          ConfigWizard                             (hardware-aware config generation)
Layer 2:  Tokenizer, ParquetIO                    (BPE training/loading, Parquet dataset I/O)
Layer 3:  Ingest, CorpusCompression               (document discovery, corpus writing, Zstd compression)
Layer 4:  Tensor, MultiResolutionAttention        (device selection, chunk pooling + global tokens)
          Model, AttentionConfig, AutoConfig       (transformer with MLA, FlashAttention, auto-resolution)
          Vision, VisionBridge                     (vision encoder, Q-Former bridge)
          PagedAttention, MemoryHierarchy          (paged KV-cache, compressed memory)
          PersistentRetrievalStore                 (disk-backed retrieval for cross-session use)
          ConfidenceHead, EarlyExit                (calibration, adaptive depth)
Layer 5:  Optimizer, Checkpoint, MmapLoading       (optimizer variants incl. INT8 moments, save/load/metadata, mmap loading)
          GradientCheckpointing                    (activation recomputation)
          DistributedTraining, ModelParallelism    (PyTorch/DeepSpeed, tensor/pipeline parallel)
          GpuOffloading, OptimizerOffload          (CPU/GPU memory management)
          NvmePaging, OnnxExport                   (NVMe paging, ONNX export with quantization)
          DataMixture, VramGuard                     (data mixing, in-process VRAM monitoring)
Layer 6:  Training, FineTuningData, FineTuning    (training loop, SFT/DPO data, LoRA training)
          RewardFunctions, DataValidation          (composable RL rewards, JSONL validation)
          Fp8Dequantization, Nf4Quantizer          (FP8/NF4 quantization support, GPU LUT, STE for QAT)
          QLoraTraining                            (QLoRA: NF4 base + LoRA adapters)
          WeightTransfer, ModelMerge               (donor model transfer, N-ary merging)
          AttentionDistillation                    (MLA-to-GQA distillation)
          Pruning, CompressionPipeline             (structured pruning, prune→finetune→quantize)
Layer 7:  ConstrainedDecoding, ChainOfThought     (generation extensions)
          Verifier, ToolPolicy, McpClient          (verifier strategies, tool policy, tool calling)
          WebSearch, ImageRouting, A2AClient        (search, image routing, A2A protocol)
          TreeSpeculation, ContinuousBatching      (speculation, batch scheduling)
          Inference                                (text generation with sampling, tools, structured output)
          DraftAndRefine, AdvancedReasoning        (multi-pass reasoning, consensus/tree-of-thoughts)
          Orchestrator, BackendClient              (multi-model routing, external LLM client)
          ExperienceStore, CostAwareRouting        (cross-session persistence, cost optimization)
          AgentSession, FanOutOrchestration        (agent lifecycle, parallel task decomposition)
Layer 8:  ContextAwareness, SemanticKnowledge     (project context, WordNet)
          DataAugmentation                         (synonym replacement, paraphrasing, token noise)
          VerifierRuntime                          (verifier loading and runtime integration)
          Eval                                     (benchmark datasets, runners, result serialization)
Layer 9:  Program                                  (CLI entry point, subcommand routing)

See the architecture document for more on design decisions.

Running Tests

dotnet test Fuuga.Tests

Tests cover tokenization, ingestion, Parquet I/O, corpus compression, model architecture, MLA attention, vision encoder, vision bridge, paged attention, multi-resolution attention, attention configuration, early exit, training, optimizer variants, INT8 optimizer moments, gradient checkpointing, GPU offloading, optimizer offloading, ONNX export, inference, verifier-guided generation, speculative decoding, checkpoints, memory-mapped loading, fine-tuning, prompt tuning, QLoRA, reinforcement learning, reward functions, data validation, data augmentation, benchmark evaluation, FP8 dequantization, FP8 GPU LUT dequantization, NF4 quantization, STE quantization-aware training, structured pruning, compression pipeline, constrained decoding, chain-of-thought, MCP client, A2A client/server, web search, image routing, draft-and-refine, advanced reasoning, orchestrator, backend client, cost-aware routing, fan-out orchestration, weight transfer, attention distillation, model merging, distributed training, model parallelism, scaling, validation, continuous batching, dynamic batching server, experience persistence, agent session management, persistent retrieval, drift detection, drift alerting, context awareness, semantic knowledge, observability, guard rails, memory strategy (gradient offloading, per-param flush, VRAM guard lifecycle, ConfigWizard recommendations, multi-GPU parallelism), and CLI integration. Tests use xUnit with FsUnit assertions and include both unit tests and end-to-end integration tests.

Technology Stack

  • Tensors & GPU -- TorchSharp 0.106.0 (tensor operations, CUDA support)
  • LibTorch -- libtorch-cpu 2.10.0 (LibTorch CPU backend)
  • Tokenization -- Microsoft.ML.Tokenizers 2.0.0 (BPE tokenizer training)
  • ONNX Runtime -- Microsoft.ML.OnnxRuntime 1.24.3 (ONNX model inference)
  • ONNX Export -- OnnxSharp 0.3.2 (ONNX model construction and manipulation)
  • Protobuf -- Google.Protobuf 3.34.0 (protobuf serialization for ONNX)
  • Epub parsing -- VersOne.Epub 3.3.4 (extract text from epub files)
  • Markdown -- Markdig 1.1.1 (parse markdown to plain text)
  • Parquet -- Parquet.Net 5.5.0 (HuggingFace-compatible dataset I/O)
  • Compression -- ZstdSharp.Port 0.8.7 (Zstandard corpus compression)
  • Statistics -- MathNet.Numerics.FSharp 5.0.0 (drift detection: KS test, PSI)
  • Logging -- Serilog 4.3.0 (structured logging)
  • Logging sinks -- Serilog.Sinks.Console 6.0.0, .File 6.0.0, .OpenTelemetry 4.2.0 (console, file, and OTel log sinks)
  • Logging bridge -- Serilog.Extensions.Logging 9.0.0 (Serilog/Microsoft.Extensions.Logging bridge)
  • JSON -- FSharp.SystemTextJson 1.4.36 (F# DU-aware serialization)
  • Telemetry -- OpenTelemetry 1.11.2 (distributed tracing and metrics)
  • Telemetry export -- OpenTelemetry.Exporter.OpenTelemetryProtocol 1.11.2 (OTLP protocol export)
  • Telemetry hosting -- OpenTelemetry.Extensions.Hosting 1.11.2 (OpenTelemetry hosting integration)
  • Terminal UI -- Spectre.Console 0.49.1 (rich terminal output)
  • HuggingFace -- TorchSharp.PyBridge 1.4.3 (HuggingFace weight loading)
  • SIMD -- System.Numerics.Tensors 10.0.5 (preprocessing acceleration)
  • AI abstractions -- Microsoft.Extensions.AI 10.4.0 (IChatClient adapter)
  • AI evaluation -- Microsoft.Extensions.AI.Evaluation 10.4.0 (benchmark evaluation framework)
  • MCP -- ModelContextProtocol 1.1.0 (MCP client SDK for tool calling)
  • A2A -- A2A 0.3.3-preview (Agent-to-Agent protocol client)
  • Testing -- xUnit 2.9.3 + FsUnit.xUnit 6.0.1 (unit and integration tests)
  • Server: HTTP -- Oxpecker 2.0.0 (F# HTTP server framework)
  • Server: A2A -- A2A.AspNetCore 0.3.3-preview (A2A protocol server)
  • Image: Diffusion -- StableDiffusion.NET 5.0.0 (Stable Diffusion model wrapper)
  • Image: Captioning -- Microsoft.ML.OnnxRuntimeGenAI 0.12.1 (Phi-3.5-vision captioning)
  • Image: Processing -- HPPH.SkiaSharp 1.0.0 (image load/save, format conversion)

License

See LICENSE for details.

Compatible target frameworks

.NET: net10.0 is compatible. Computed compatible: net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows.


Version Downloads Last Updated
1.0.1 32 3/25/2026
1.0.0 32 3/24/2026