Fuuga.cpu 1.0.1

Install via the .NET CLI:

dotnet add package Fuuga.cpu --version 1.0.1

Or reference it from an F# script:

#r "nuget: Fuuga.cpu, 1.0.1"
Fuuga
Tired of paying for tokens? Think you could train a better model? Now you can try.
An LLM built from scratch in F# and .NET. Fuuga implements a complete language model pipeline: tokenization, data ingestion, model training, fine-tuning, and text generation -- with no Python dependencies.
Built on TorchSharp for tensor operations and Microsoft.ML.Tokenizers for BPE, Fuuga uses idiomatic F# (discriminated unions, pipelines, immutability) throughout. Works on GPU or CPU.
Features
Core Pipeline:
- BPE Tokenizer -- Train a byte-pair encoding tokenizer on your own corpus with configurable vocabulary size
- Data Ingestion -- Discover and tokenize epub, markdown, Parquet, and plain text files into a binary corpus
- Parquet I/O -- Read and write HuggingFace-compatible Parquet datasets for SFT, DPO, and document data
- Corpus Compression -- Zstd compression/decompression for .fugecorpus files
- GPT-2 Transformer -- Decoder-only causal transformer with rotary position embeddings (RoPE), grouped-query attention (GQA), RMSNorm, and SwiGLU activation
- Multi-Head Latent Attention (MLA) -- DeepSeek-V2 style compressed KV cache with query/KV compression, decoupled RoPE keys, and optional weight absorption for reduced memory during inference
- Vision Encoder -- Vision model support for multimodal inputs
- Vision Bridge -- Q-Former cross-attention bridge that compresses vision patch tokens into learned query vectors for multimodal (image+text) inputs
- Paged Attention -- Paged KV-cache attention for efficient memory usage during long-context generation
- Memory Hierarchy -- Compressed memory with external retrieval for extended context
- Multi-Resolution Attention -- Chunk pooling with global tokens for efficient long-context processing
- FlashAttention Config -- SDPA backend selection and benchmarking for attention kernels
- Auto Config -- Hardware-aware auto-resolution of DU configuration cases (norm, activation, precision, offloading, communication) at startup
- Early Exit -- Adaptive depth inference for faster generation when confidence is high
- Training -- AdamW optimizer with cosine learning rate scheduling, warmup, gradient clipping, mixed precision support, and gradient accumulation
- INT8 Optimizer Moments -- Optional INT8 quantization of AdamW M/V moment tensors with per-row symmetric quantization, reducing optimizer memory ~4× (--moment-quant int8)
- Optimizer Variants -- Stochastic Weight Averaging (SWA) and Lookahead optimizer support with checkpointable optimizer state
- Gradient Checkpointing -- Memory-efficient training via activation recomputation
- GPU Offloading -- Layer-wise CPU/GPU offloading for reduced VRAM usage
- Optimizer Offloading -- Offload optimizer states to CPU memory
- NVMe Paging -- ZeRO-Infinity 3-tier GPU/CPU/NVMe memory management for training models larger than available VRAM
- Per-Tensor Gradient Offloading -- Bulk-copy gradients to CPU after the backward pass and restore before the optimizer step, freeing GPU VRAM during the optimizer phase (--grad-offload)
- Per-Parameter CUDA Flush -- Aggressive CUDA cache cleanup after the optimizer step to reclaim transient VRAM spikes from M/V update temporaries (--flush-each-param)
- VRAM Guard -- In-process background thread that polls GPU memory via nvidia-smi and signals the training loop to warn, skip batches, or abort when usage exceeds a configurable threshold (--vram-guard-gb <float>)
- Memory Strategy Presets -- MemoryStrategyConfigs.none, .constrained (grad-offload + flush), and .full (all three with VRAM guard at 95%), with automatic ConfigWizard recommendations based on the model-to-VRAM ratio
- Model Parallelism -- Tensor and pipeline parallelism configuration for 70B+ parameter models, with automatic DataParallel recommendation for multi-GPU setups
- Inference -- Greedy, top-k, top-p (nucleus), and temperature sampling with repetition penalty
- Fill-in-the-Middle -- FIM support with prefix/suffix tokens for code completion
- Checkpoints -- Save, load, resume training from checkpoints with full metadata; safetensors format support
- Memory-Mapped Loading -- mmap-based model loading for fast startup
- Streaming Inference -- Token-by-token generation with configurable stop conditions
- Confidence Signals -- Entropy, repetition rate, hedging detection, calibrated confidence with Platt scaling, and stop reason reporting
- Drift Detection -- Statistical drift monitoring (Kolmogorov-Smirnov, Population Stability Index) over confidence signals with ring-buffered accumulation
- Drift Alerting -- Dual-threshold alerts with adaptive sigma-based thresholds, OpenTelemetry metrics, and retraining triggers
- ONNX Export -- Export to ONNX format with fp16/int8 quantization, validation, and benchmarking
- ONNX Inference -- ONNX Runtime backend for optimized inference (
--backend onnx) - Benchmark Evaluation -- Built-in benchmark runner for MMLU, HellaSwag, ARC-Challenge, cached dataset downloads, and checkpoint-attached benchmark results
- FP8 Dequantization -- FP8 format support for quantized weight loading with GPU-accelerated LUT path (256-entry cached lookup table using
torch.index_select) auto-selected when CUDA is available - Validation Pipeline -- Input validation framework with composable validators
- Scaling Heuristics -- Auto-scaling configuration from corpus and hardware stats
- Config Wizard -- Corpus analysis and hardware-aware config generation using Chinchilla scaling laws, activation memory estimates, NTK-aware RoPE, multi-GPU detection (nvidia-smi), and memory strategy recommendations
- CLI -- Subcommands for the full pipeline (tokenize, ingest, train, infer, info, sft, dpo, rl, merge, transfer, distill, merge-models, fisher, distributed, export onnx, compress, decompress, prune, eval, config, wordnet, serve, orchestrate, agent)
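The sampling options listed above (greedy is the temperature → 0 limit) compose in a standard order: repetition penalty, temperature, top-k, then top-p. A minimal NumPy sketch of that composition -- illustrative only, not Fuuga's F# implementation, and all parameter defaults here are hypothetical:

```python
import numpy as np

def sample_next(logits, generated, temperature=0.8, top_k=50, top_p=0.9,
                repetition_penalty=1.2, rng=np.random.default_rng(0)):
    logits = logits.astype(np.float64).copy()
    # Repetition penalty: push down logits of tokens already generated.
    for t in set(generated):
        logits[t] = logits[t] / repetition_penalty if logits[t] > 0 \
            else logits[t] * repetition_penalty
    logits /= temperature
    # Top-k: keep only the k highest-scoring tokens.
    if top_k:
        kth = np.sort(logits)[-top_k]
        logits[logits < kth] = -np.inf
    # Softmax (shifted for numerical stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Top-p (nucleus): smallest set of tokens whose cumulative mass >= top_p.
    order = np.argsort(probs)[::-1]
    cutoff = np.searchsorted(np.cumsum(probs[order]), top_p) + 1
    keep = order[:cutoff]
    mask = np.zeros_like(probs)
    mask[keep] = probs[keep]
    mask /= mask.sum()
    return int(rng.choice(len(probs), p=mask))

logits = np.array([0.1, 5.0, 0.2, 4.9])
token = sample_next(logits, generated=[1], top_k=2, top_p=1.0)
```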
Fine-Tuning:
- Supervised Fine-Tuning (SFT) -- LoRA-based fine-tuning on instruction/chat JSONL data with configurable rank, alpha, and target modules
- Prompt Tuning -- Soft-prompt / virtual-token fine-tuning with frozen base weights for lightweight PEFT workflows
- Direct Preference Optimization (DPO) -- Preference learning from chosen/rejected pairs with LoRA
- Reinforcement Learning (RL) -- REINFORCE++ / GRPO fine-tuning with pluggable reward functions
- Reward Functions -- Composable reward functions for RL training (correctness, formatting, safety)
- LoRA Adapter Merging -- Merge trained LoRA adapters back into the base model weights
- QLoRA -- NF4-quantized base weights with LoRA adapters for memory-efficient fine-tuning on consumer GPUs
- Data Validation -- JSONL format validation for SFT and DPO datasets with honesty pattern classification
- Data Augmentation -- Synonym replacement, rule-based paraphrasing, token-level noise injection, and SFT/DPO oversampling for training data diversity
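LoRA, which underpins the SFT and DPO features above, trains a low-rank update B·A beside a frozen weight W; the merge step folds the adapter back into W. A NumPy sketch of the arithmetic (dimensions and the alpha/rank scaling are the conventional choices, not values taken from Fuuga):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 16, 16, 4, 8.0

W = rng.normal(size=(d_out, d_in))              # frozen base weight
A = rng.normal(scale=0.01, size=(rank, d_in))   # trainable down-projection
B = np.zeros((d_out, rank))                     # trainable up-projection, zero-init

def lora_forward(x):
    # Base path plus low-rank update, scaled by alpha / rank.
    return x @ W.T + (alpha / rank) * (x @ A.T @ B.T)

def merge():
    # Fold the adapter back into the base weights (the "merge" step).
    return W + (alpha / rank) * (B @ A)

x = rng.normal(size=(2, d_in))
# With B zero-initialized, the adapter starts as a no-op.
assert np.allclose(lora_forward(x), x @ W.T)
# Merged weights reproduce the adapted forward pass exactly.
assert np.allclose(x @ merge().T, lora_forward(x))
```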
Weight Transfer and Model Merging:
- Weight Transfer -- Transfer weights from donor models with architecture-aware mapping (Phi-3, LLaMA3, DeepSeek-V3 dense FFN) and dimension adaptation for mismatched tensors
- Knowledge Distillation -- Token-level, sequence-level, and reverse-KLD distillation from a teacher model
- Attention Distillation -- MLA-to-GQA distillation that trains grouped-query attention layers to reproduce frozen Multi-head Latent Attention teacher outputs (DeepSeek-V3/Kimi K2) with KL-divergence + MSE loss
- N-ary Model Merging -- Merge multiple models with configurable strategies (TIES, DARE, Karcher mean, ModelSoups, ModelStock) and EWC protection
- Fisher Information -- Compute diagonal Fisher information matrices for Elastic Weight Consolidation
- N-ary Data Mixing -- Weighted data source mixing for diverse training corpora
Inference Capabilities:
- Chain-of-Thought -- Thinking mode with ThinkStart/ThinkEnd token handling and dimmed thinking display
- Constrained Decoding -- Grammar-guided JSON structured output generation
- Self-Verification -- Draft/refine verification passes with learned verifier scoring for higher-confidence answers
- Tool Calling -- MCP (Model Context Protocol) client for tool discovery and invocation during generation
- Tool Policy -- Confidence-aware tool routing policy for deciding when external tools should be invoked
- Web Search -- Web search integration for grounded generation with citations
- Image Routing -- Route image-related queries to Fuuga.Image for generation or captioning
- A2A Protocol -- Agent-to-Agent protocol client for multi-agent communication
- Tree-Structured Speculative Decoding -- Speculative decoding with tree-structured candidates for faster generation
- Draft-and-Refine -- Multi-pass reasoning pipeline for improved output quality
- Autonomous Agent -- Web search agent loop for autonomous information gathering
- Advanced Reasoning -- Consensus voting, verifier-scored selection, and tree-of-thoughts for improved answer quality
- Backend Client -- HTTP client for calling external OpenAI-compatible LLM endpoints with structured response types
- Context Awareness -- Convention file discovery (AGENTS.md, CLAUDE.md, .cursorrules), language/framework detection, git context
- Semantic Knowledge -- WordNet WNDB parser with token-to-synset mapping and multi-lingual support
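A confidence-aware tool policy like the one above can be reduced to its core decision: invoke an external tool when the model's next-token distribution is high-entropy (low confidence). A toy sketch of that routing rule -- the threshold value is hypothetical, not Fuuga's default:

```python
import math

def entropy(probs):
    # Shannon entropy in nats of a next-token distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_invoke_tool(next_token_probs, threshold=1.5):
    # Route to an external tool (e.g. web search) when the model is
    # uncertain; answer directly when the distribution is peaked.
    return entropy(next_token_probs) > threshold

assert not should_invoke_tool([0.97, 0.01, 0.01, 0.01])            # confident
assert should_invoke_tool([0.25, 0.25, 0.25, 0.25], threshold=1.3)  # uncertain
```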
Orchestration:
- Multi-Model Orchestrator -- Route tasks to appropriate models based on capability
- Cost-Aware Routing -- Budget-tracked model routing with cost optimization
- Fan-Out Orchestration -- Decompose tasks into subtasks, run in parallel, and aggregate results
- Resumable Orchestration -- Checkpoint and resume fan-out plans across sessions
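The fan-out pattern above is decompose → run in parallel → aggregate. A generic sketch of that control flow (Fuuga's actual orchestrator types are F# and not shown here):

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(task, decompose, run_subtask, aggregate, max_workers=4):
    # Fan-out/fan-in: split a task into subtasks, run them in parallel,
    # then merge the ordered results.
    subtasks = decompose(task)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_subtask, subtasks))  # map preserves order
    return aggregate(results)

# Toy usage: process document chunks in parallel, then join.
doc = ["alpha", "beta", "gamma"]
out = fan_out(doc,
              decompose=lambda d: d,
              run_subtask=lambda chunk: chunk.upper(),
              aggregate=lambda rs: " ".join(rs))
assert out == "ALPHA BETA GAMMA"
```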
Agentic Persistence:
- Experience Store -- Append-only JSON Lines log of attempt outcomes with thread-safe managed access
- Strategy Lessons -- Persist and load distilled lessons from past experience for self-improvement
- Persistent Retrieval Store -- Disk-backed IRetrievalStore for cross-session document retrieval
- Agent Session Management -- Session lifecycle (init → active → completed/failed), save/load state, step-level experience recording
- Orchestration Checkpoints -- Save and resume fan-out orchestration plans with per-subtask completion tracking
- Cost Outcome Tracking -- Persist cost-aware routing outcomes for budget optimization across sessions
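An append-only JSON Lines experience store with lock-guarded access, as described above, is small enough to sketch end to end. This Python version is illustrative only; field names are hypothetical:

```python
import json, os, tempfile, threading

class ExperienceStore:
    """Append-only JSON Lines log of attempt outcomes, guarded by a lock."""
    def __init__(self, path):
        self.path = path
        self._lock = threading.Lock()

    def record(self, outcome: dict):
        # Serialize outside the lock; append one line per outcome.
        line = json.dumps(outcome)
        with self._lock, open(self.path, "a", encoding="utf-8") as f:
            f.write(line + "\n")

    def load(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

path = os.path.join(tempfile.mkdtemp(), "experience.jsonl")
store = ExperienceStore(path)
store.record({"task": "search", "success": True})
store.record({"task": "summarize", "success": False})
assert [e["task"] for e in store.load()] == ["search", "summarize"]
```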
Model Compression:
- Structured Pruning -- Attention head removal and layer removal with importance scoring
- NF4 Quantization -- 4-bit NormalFloat quantization for weight compression
- Quantization-Aware Training (STE) -- Straight-Through Estimator for NF4 weights: the forward pass sees quantized values, the backward pass flows gradients through identity (--ste)
- Compression Pipeline -- Orchestrated prune → fine-tune → quantize workflow for production deployment
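The Straight-Through Estimator above is the trick that makes quantization-aware training possible: the forward pass sees quantized weights, but the gradient is applied to the full-precision weights as if quantization were the identity. A NumPy sketch with a toy 16-level codebook standing in for the NF4 lookup table (learning rate and codebook are illustrative):

```python
import numpy as np

def fake_quant(w, levels):
    # Round each weight to the nearest codebook entry
    # (stand-in for the NF4 lookup table).
    idx = np.abs(w[..., None] - levels).argmin(axis=-1)
    return levels[idx]

levels = np.linspace(-1.0, 1.0, 16)   # 4-bit: 16 representable values
w = np.array([0.33, -0.71])
x = np.array([1.0, 1.0])
target = 0.0

for _ in range(50):
    wq = fake_quant(w, levels)        # forward pass sees quantized weights
    y = wq @ x
    grad_y = 2.0 * (y - target)       # d/dy of squared error
    grad_w = grad_y * x               # STE: treat quantization as identity
    w -= 0.05 * grad_w                # update the full-precision weights

# Training drove the quantized output toward the target despite
# gradients never flowing "through" the rounding step.
assert abs(fake_quant(w, levels) @ x - target) < 0.2
```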
Server:
- OpenAI-Compatible API -- Separate fuuga-serve project with /v1/chat/completions, /v1/completions, /v1/models, and /v1/embeddings endpoints
- SSE Streaming -- Server-Sent Events for real-time token streaming
- Continuous Batching -- Iteration-level continuous batching for high-throughput serving
- Dynamic Batching -- HTTP-integrated dynamic batch scheduling with SSE support
- Bearer Token Auth -- Optional API key authentication middleware
- Guard Rails -- Prompt injection detection, PII masking, and content filtering middleware
- MCP Tool Routing -- Server-side MCP tool integration for function calling
- A2A Server -- Agent-to-Agent protocol server endpoint for multi-agent workflows
Distributed Training:
- PyTorch/DeepSpeed Integration -- Export model weights for distributed training, import trained weights back, and auto-generate launch scripts
Image Generation (Fuuga.Image):
- Text-to-Image -- Stable Diffusion image generation from text prompts with configurable samplers, steps, and guidance
- Image-to-Image -- Transform existing images guided by text prompts with denoising strength control
- Image Captioning -- Phi-3.5-vision captioning with brief/standard/detailed output modes
- MCP Server Mode -- Run as an MCP tool server over stdio for integration with Fuuga LLM
Observability:
- OpenTelemetry -- OTLP trace and metrics export with Serilog integration
- Spectre.Console -- Rich terminal output for training progress and diagnostics
Prerequisites
- .NET 10 SDK (v10.0.103 or later)
- GPU is optional -- CPU works for the dev configuration (small model); a CUDA-capable GPU is recommended for larger models
- ~500 MB disk space for dependencies, plus space for training data and checkpoints
Quick Start
See examples/complete-pipeline.fsx for the full runnable pipeline, or use the CLI:
# Build (CPU):
dotnet build
# Build (GPU, ~2 GB dependency):
dotnet build -p:TorchBackend=cuda
# Train a tokenizer, ingest a corpus, train, and generate text
dotnet run -- tokenize --input data/raw --vocab-size 8000 --output data/tokenizer
dotnet run -- ingest --input data/raw --output data/corpus.bin --tokenizer data/tokenizer
dotnet run -- train --corpus data/corpus.bin --tokenizer data/tokenizer --checkpoint-dir checkpoints/
dotnet run -- infer --checkpoint checkpoints/step-100 --tokenizer data/tokenizer --prompt "Once upon a time"
See the Getting Started Tutorial for a complete end-to-end walkthrough.
F# Script Examples
Prefer the F# API over the CLI? Run the complete pipeline as an F# script:
dotnet build
dotnet fsi examples/complete-pipeline.fsx
This trains a tokenizer, ingests data, trains a model, and generates text -- all using the Fuuga modules directly. See examples/complete-pipeline.fsx for the full source.
For advanced workflows -- weight transfer from Phi-3/LLaMA3/DeepSeek, LoRA and prompt tuning, benchmark evaluation, ONNX export, verifier-assisted inference, constrained decoding, chain-of-thought, streaming inference, and serving via the OpenAI-compatible API -- see examples/advanced-scenarios.fsx.
For image generation and captioning, see examples/image-demo.fsx.
Image Generation
Fuuga.Image is a standalone CLI for image generation and captioning. See the Fuuga.Image README for full command reference and MCP server mode, or run examples/image-demo.fsx.
Project Structure
Fuuga.fsproj # Project file with layered compilation order
Types.fs # All shared types (ModelConfig, TrainingConfig, GenerationConfig, etc.)
Logging.fs # ActivitySource/Meter definitions, ILoggerFactory
Observability.fs # OpenTelemetry providers, Spectre.Console, --observe flag
DriftDetection.fs # Statistical drift monitoring (KS, PSI) over confidence signals
DriftAlerting.fs # Dual-threshold alerts, adaptive thresholds, OTel metrics, retraining triggers
Config.fs # JSON config loading, CLI arg parsing, MCP config, LoRA target parsing
Validation.fs # Input validation pipeline with composable validators
Scaling.fs # Scaling heuristics from corpus and hardware stats
ConfigWizard.fs # Corpus analysis + hardware-aware config generation (Chinchilla scaling)
Tokenizer.fs # BPE tokenizer training and loading
ParquetIO.fs # HuggingFace Parquet dataset read/write (Document, SFT, DPO)
TextCleanup.fs # Ingestion/preparation text cleanup
RagCleanup.fs # RAG (Retrieval-Augmented Generation) cleanup algorithms
Ingest.fs # Document discovery and binary corpus writing
CorpusCompression.fs # Zstd compression/decompression for .fuge files
Tensor.fs # Device selection (CPU/CUDA), DisposeScope
MultiResolutionAttention.fs # Chunk pooling, global tokens for long context
Model.fs # GPT-2 transformer with RoPE, GQA, RMSNorm, SwiGLU, MLA
AttentionConfig.fs # FlashAttention verification, SDPA backend selection
AutoConfig.fs # Auto-resolution of DU Auto* config cases from hardware probing
Vision.fs # Vision encoder for multimodal inputs
VisionBridge.fs # Q-Former cross-attention bridge for vision-to-language compression
PagedAttention.fs # Paged KV-cache attention
MemoryHierarchy.fs # Compressed memory, external retrieval
PersistentRetrievalStore.fs # Disk-backed IRetrievalStore for cross-session retrieval
ConfidenceHead.fs # Calibrated confidence MLP, Platt scaling, bucket assignment
EarlyExit.fs # Early exit / adaptive depth inference
Optimizer.fs # AdamW, SWA, Lookahead, and INT8 moment-quantized optimizers
Checkpoint.fs # Checkpoint save/load/metadata, safetensors
MmapLoading.fs # Memory-mapped model loading
GradientCheckpointing.fs # Gradient checkpointing for memory-efficient training
DistributedTraining.fs # Distributed training (PyTorch/DeepSpeed export/import)
ModelParallelism.fs # Tensor/pipeline parallelism config for 70B+ models
GpuOffloading.fs # Layer-wise CPU/GPU offloading
OptimizerOffload.fs # Optimizer state offloading
NvmePaging.fs # ZeRO-Infinity 3-tier GPU/CPU/NVMe memory management
OnnxExport.fs # ONNX export with quantization and validation
DataMixture.fs # N-ary weighted data source mixing
VramGuard.fs # In-process VRAM monitoring with nvidia-smi polling, signal-based training loop integration
Training.fs # Training loop with AdamW/cosine LR, gradient offloading, per-param flush
FineTuningData.fs # SFT/DPO JSONL parsing, chat templates, tokenization, batching
FineTuning.fs # LoRA (LoraLinear), SFT training, DPO loss/training, adapter save/load
RewardFunctions.fs # Composable reward functions for RL training
DataValidation.fs # SFT/DPO JSONL validation, honesty pattern classification
Fp8Dequantization.fs # FP8 format dequantization with GPU LUT acceleration
Nf4Quantizer.fs # NF4/FP4 4-bit weight quantization, STE for QAT
QLoraTraining.fs # QLoRA training (NF4 base + LoRA adapters)
WeightTransfer.fs # Weight transfer from donor models (Phi-3, LLaMA3, DeepSeek mappings)
ModelMerge.fs # N-ary model merging (TIES, DARE, Karcher) with EWC protection
AttentionDistillation.fs # MLA-to-GQA attention distillation (KL + MSE loss)
Pruning.fs # Structured pruning (attention heads, layers)
CompressionPipeline.fs # Prune → finetune → quantize orchestration
ConstrainedDecoding.fs # Grammar-guided JSON constrained decoding
ChainOfThought.fs # ThinkStart/ThinkEnd token handling, phase tracking
Verifier.fs # Rule-based and learned verifier strategies
ToolPolicy.fs # Confidence-aware tool invocation policy
McpClient.fs # MCP client: connection, tool discovery, tool invocation
WebSearch.fs # Web search integration for grounded generation
ImageRouting.fs # Image query routing to Fuuga.Image
A2AClient.fs # A2A protocol client for agent-to-agent communication
TreeSpeculation.fs # Tree-structured speculative decoding
ContinuousBatching.fs # Iteration-level continuous batching for serving
Inference.fs # Text generation with sampling, tool-augmented generation, structured output, CoT
DraftAndRefine.fs # Draft-and-refine multi-pass reasoning pipeline
AdvancedReasoning.fs # Consensus voting, verifier-scored, tree-of-thoughts
Orchestrator.fs # Multi-model orchestrator with capability-based routing
BackendClient.fs # HTTP client for external OpenAI-compatible LLM endpoints
ExperienceStore.fs # Append-only experience log, strategy lessons, managed store
CostAwareRouting.fs # Cost-aware routing with budget tracking
AgentSession.fs # Agent session lifecycle, save/load, orchestration checkpoints
FanOutOrchestration.fs # Fan-out/fan-in task decomposition and aggregation
ContextAwareness.fs # Convention file discovery, language/framework detection, git context
SemanticKnowledge.fs # WordNet WNDB parser, token-to-synset mapping, multi-lingual
DataAugmentation.fs # Synonym replacement, paraphrasing, token noise, oversampling
VerifierRuntime.fs # Verifier loading and runtime integration
Eval.fs # Benchmark datasets, runners, result serialization
Program.fs # CLI entry point with subcommand routing
Fuuga.Server/ # OpenAI-compatible HTTP server (separate project)
ApiTypes.fs # Request/response types (OpenAI-compatible)
McpToolRouting.fs # Server-side MCP tool routing for function calling
FuugaChatClient.fs # IChatClient adapter for Microsoft AI ecosystem
FuugaEmbeddingGenerator.fs # IEmbeddingGenerator adapter for /v1/embeddings
A2AServer.fs # A2A protocol server endpoint
GuardRails.fs # Prompt injection detection, PII masking, content filtering
DynamicBatchingServer.fs # Dynamic batching with HTTP/SSE integration
Server.fs # Oxpecker HTTP server with SSE streaming, auth middleware
Program.fs # Server entry point
Fuuga.Image/ # Standalone image generation and captioning CLI
Types.fs # Domain types, error handling (ImageError DU)
Config.fs # CLI argument parsing
ImageIO.fs # Image load/save, format conversion, validation
Diffusion.fs # Stable Diffusion model wrapper (txt2img, img2img)
Caption.fs # Phi-3.5-vision captioning (ONNX Runtime GenAI)
McpServer.fs # MCP JSON-RPC 2.0 server over stdio
Program.fs # Entry point, subcommand routing
Fuuga.Tests/ # Unit and integration tests (xUnit + FsUnit)
Fuuga.Image/Fuuga.Image.Tests/ # Image module tests
docs/ # User-facing documentation
examples/ # Runnable F# script examples
scripts/ # Training data generation and validation scripts
Documentation
- Getting Started Tutorial -- End-to-end walkthrough from build to text generation
- CLI Reference -- All subcommands, flags, defaults, and exit codes
- Configuration Reference -- Model, training, generation, and fine-tuning parameters explained
- Ecosystem Comparison -- How Fuuga compares to other tools in the .NET ecosystem
- Language Centralization -- Adding new natural languages via the KnownNaturalLanguages registry
- Fuuga.Image README -- Image generation CLI reference and MCP server mode
- Complete Pipeline Example -- F# script demonstrating the full API
- Advanced Scenarios Example -- Weight transfer, prompt/LoRA tuning, RL, evaluation, ONNX export, serving
- Image Demo Example -- F# script demonstrating image generation and captioning
Architecture
Fuuga uses a layered module architecture with strict dependency ordering enforced by F#'s compilation model:
Layer 0: Types, Logging, Observability (foundation, no dependencies)
DriftDetection, DriftAlerting (statistical drift monitoring, OTel alerts)
Layer 1: Config, Validation, Scaling (configuration, validation, heuristics)
ConfigWizard (hardware-aware config generation)
Layer 2: Tokenizer, ParquetIO (BPE training/loading, Parquet dataset I/O)
Layer 3: Ingest, CorpusCompression (document discovery, corpus writing, Zstd compression)
Layer 4: Tensor, MultiResolutionAttention (device selection, chunk pooling + global tokens)
Model, AttentionConfig, AutoConfig (transformer with MLA, FlashAttention, auto-resolution)
Vision, VisionBridge (vision encoder, Q-Former bridge)
PagedAttention, MemoryHierarchy (paged KV-cache, compressed memory)
PersistentRetrievalStore (disk-backed retrieval for cross-session use)
ConfidenceHead, EarlyExit (calibration, adaptive depth)
Layer 5: Optimizer, Checkpoint, MmapLoading (optimizer variants incl. INT8 moments, save/load/metadata, mmap loading)
GradientCheckpointing (activation recomputation)
DistributedTraining, ModelParallelism (PyTorch/DeepSpeed, tensor/pipeline parallel)
GpuOffloading, OptimizerOffload (CPU/GPU memory management)
NvmePaging, OnnxExport (NVMe paging, ONNX export with quantization)
DataMixture, VramGuard (data mixing, in-process VRAM monitoring)
Layer 6: Training, FineTuningData, FineTuning (training loop, SFT/DPO data, LoRA training)
RewardFunctions, DataValidation (composable RL rewards, JSONL validation)
Fp8Dequantization, Nf4Quantizer (FP8/NF4 quantization support, GPU LUT, STE for QAT)
QLoraTraining (QLoRA: NF4 base + LoRA adapters)
WeightTransfer, ModelMerge (donor model transfer, N-ary merging)
AttentionDistillation (MLA-to-GQA distillation)
Pruning, CompressionPipeline (structured pruning, prune→finetune→quantize)
Layer 7: ConstrainedDecoding, ChainOfThought (generation extensions)
Verifier, ToolPolicy, McpClient (verifier strategies, tool policy, tool calling)
WebSearch, ImageRouting, A2AClient (search, image routing, A2A protocol)
TreeSpeculation, ContinuousBatching (speculation, batch scheduling)
Inference (text generation with sampling, tools, structured output)
DraftAndRefine, AdvancedReasoning (multi-pass reasoning, consensus/tree-of-thoughts)
Orchestrator, BackendClient (multi-model routing, external LLM client)
ExperienceStore, CostAwareRouting (cross-session persistence, cost optimization)
AgentSession, FanOutOrchestration (agent lifecycle, parallel task decomposition)
Layer 8: ContextAwareness, SemanticKnowledge (project context, WordNet)
DataAugmentation (synonym replacement, paraphrasing, token noise)
VerifierRuntime (verifier loading and runtime integration)
Eval (benchmark datasets, runners, result serialization)
Layer 9: Program (CLI entry point, subcommand routing)
See the architecture document for more on design decisions.
Running Tests
dotnet test Fuuga.Tests
Tests cover tokenization, ingestion, Parquet I/O, corpus compression, model architecture, MLA attention, vision encoder, vision bridge, paged attention, multi-resolution attention, attention configuration, early exit, training, optimizer variants, INT8 optimizer moments, gradient checkpointing, GPU offloading, optimizer offloading, ONNX export, inference, verifier-guided generation, speculative decoding, checkpoints, memory-mapped loading, fine-tuning, prompt tuning, QLoRA, reinforcement learning, reward functions, data validation, data augmentation, benchmark evaluation, FP8 dequantization, FP8 GPU LUT dequantization, NF4 quantization, STE quantization-aware training, structured pruning, compression pipeline, constrained decoding, chain-of-thought, MCP client, A2A client/server, web search, image routing, draft-and-refine, advanced reasoning, orchestrator, backend client, cost-aware routing, fan-out orchestration, weight transfer, attention distillation, model merging, distributed training, model parallelism, scaling, validation, continuous batching, dynamic batching server, experience persistence, agent session management, persistent retrieval, drift detection, drift alerting, context awareness, semantic knowledge, observability, guard rails, memory strategy (gradient offloading, per-param flush, VRAM guard lifecycle, ConfigWizard recommendations, multi-GPU parallelism), and CLI integration. Tests use xUnit with FsUnit assertions and include both unit tests and end-to-end integration tests.
Technology Stack
| Component | Library | Purpose |
|---|---|---|
| Tensors & GPU | TorchSharp 0.106.0 | Tensor operations, CUDA support |
| LibTorch | libtorch-cpu 2.10.0 | LibTorch CPU backend |
| Tokenization | Microsoft.ML.Tokenizers 2.0.0 | BPE tokenizer training |
| ONNX Runtime | Microsoft.ML.OnnxRuntime 1.24.3 | ONNX model inference |
| ONNX Export | OnnxSharp 0.3.2 | ONNX model construction and manipulation |
| Protobuf | Google.Protobuf 3.34.0 | Protobuf serialization for ONNX |
| Epub parsing | VersOne.Epub 3.3.4 | Extract text from epub files |
| Markdown | Markdig 1.1.1 | Parse markdown to plain text |
| Parquet | Parquet.Net 5.5.0 | HuggingFace-compatible dataset I/O |
| Compression | ZstdSharp.Port 0.8.7 | Zstandard corpus compression |
| Statistics | MathNet.Numerics.FSharp 5.0.0 | Drift detection (KS test, PSI) |
| Logging | Serilog 4.3.0 | Structured logging |
| Logging sinks | Serilog.Sinks.Console 6.0.0, .File 6.0.0, .OpenTelemetry 4.2.0 | Console, file, and OTel log sinks |
| Logging bridge | Serilog.Extensions.Logging 9.0.0 | Serilog/Microsoft.Extensions.Logging bridge |
| JSON | FSharp.SystemTextJson 1.4.36 | F# DU-aware serialization |
| Telemetry | OpenTelemetry 1.11.2 | Distributed tracing and metrics |
| Telemetry export | OpenTelemetry.Exporter.OpenTelemetryProtocol 1.11.2 | OTLP protocol export |
| Telemetry hosting | OpenTelemetry.Extensions.Hosting 1.11.2 | OpenTelemetry hosting integration |
| Terminal UI | Spectre.Console 0.49.1 | Rich terminal output |
| HuggingFace | TorchSharp.PyBridge 1.4.3 | HuggingFace weight loading |
| SIMD | System.Numerics.Tensors 10.0.5 | Preprocessing acceleration |
| AI abstractions | Microsoft.Extensions.AI 10.4.0 | IChatClient adapter |
| AI evaluation | Microsoft.Extensions.AI.Evaluation 10.4.0 | Benchmark evaluation framework |
| MCP | ModelContextProtocol 1.1.0 | MCP client SDK for tool calling |
| A2A | A2A 0.3.3-preview | Agent-to-Agent protocol client |
| Testing | xUnit 2.9.3 + FsUnit.xUnit 6.0.1 | Unit and integration tests |
| Server: HTTP | Oxpecker 2.0.0 | F# HTTP server framework |
| Server: A2A | A2A.AspNetCore 0.3.3-preview | A2A protocol server |
| Image: Diffusion | StableDiffusion.NET 5.0.0 | Stable Diffusion model wrapper |
| Image: Captioning | Microsoft.ML.OnnxRuntimeGenAI 0.12.1 | Phi-3.5-vision captioning |
| Image: Processing | HPPH.SkiaSharp 1.0.0 | Image load/save, format conversion |
License
See LICENSE for details.
Compatible target frameworks

| Product | Versions |
|---|---|
| .NET | net10.0 is compatible; net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, and net10.0-windows are computed. |
Dependencies (net10.0)
- A2A (>= 0.3.3-preview)
- FSharp.Core (>= 10.1.201)
- FSharp.SystemTextJson (>= 1.4.36)
- Google.Protobuf (>= 3.34.1)
- libtorch-cpu (>= 2.10.0)
- Markdig (>= 1.1.1)
- MathNet.Numerics.FSharp (>= 5.0.0)
- Microsoft.Extensions.AI (>= 10.4.1)
- Microsoft.Extensions.AI.Evaluation (>= 10.4.0)
- Microsoft.Extensions.AI.Evaluation.Quality (>= 10.4.0)
- Microsoft.Extensions.AI.Evaluation.Reporting (>= 10.4.0)
- Microsoft.ML.OnnxRuntime (>= 1.24.4)
- Microsoft.ML.Tokenizers (>= 2.0.0)
- ModelContextProtocol (>= 1.1.0)
- OnnxSharp (>= 0.3.2)
- OpenTelemetry (>= 1.15.0)
- OpenTelemetry.Exporter.OpenTelemetryProtocol (>= 1.15.0)
- OpenTelemetry.Extensions.Hosting (>= 1.15.0)
- Parquet.Net (>= 5.5.0)
- Serilog (>= 4.3.1)
- Serilog.Extensions.Logging (>= 10.0.0)
- Serilog.Sinks.Console (>= 6.1.1)
- Serilog.Sinks.File (>= 7.0.0)
- Serilog.Sinks.OpenTelemetry (>= 4.2.0)
- Spectre.Console (>= 0.54.0)
- System.Numerics.Tensors (>= 10.0.5)
- TorchSharp (>= 0.106.0)
- TorchSharp.PyBridge (>= 1.4.3)
- VersOne.Epub (>= 3.3.6)
- ZstdSharp.Port (>= 0.8.7)