RAGamuffin 1.0.8

.NET 8.0

dotnet add package RAGamuffin --version 1.0.8

NuGet\Install-Package RAGamuffin -Version 1.0.8

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="RAGamuffin" Version="1.0.8" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

Directory.Packages.props:
<PackageVersion Include="RAGamuffin" Version="1.0.8" />

Project file:
<PackageReference Include="RAGamuffin" />

For projects that support Central Package Management (CPM), copy the PackageVersion node into the solution Directory.Packages.props file to version the package, and the unversioned PackageReference node into the project file.

paket add RAGamuffin --version 1.0.8

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: RAGamuffin, 1.0.8"

The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy it into the interactive tool or the script's source code to reference the package.

Install as a Cake Addin:
#addin nuget:?package=RAGamuffin&version=1.0.8

Install as a Cake Tool:
#tool nuget:?package=RAGamuffin&version=1.0.8

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

A lightweight, cross-platform .NET library for building RAG (Retrieval-Augmented Generation) pipelines with local embedding models and SQLite vector storage.

🚀 Features

Local Embedding Models: Use ONNX models for offline, privacy-focused embeddings
SQLite Vector Storage: Lightweight, file-based vector database with no external dependencies
Multi-Format Support: Process PDFs and text files with intelligent chunking
Flexible Training Strategies: Retrain from scratch, incremental updates, or add-only modes
Real-time Ingestion: Stream text content directly into your vector store
Metadata Preservation: Maintain document context and metadata throughout the pipeline
Cross-Platform: Works on Windows, macOS, and Linux with .NET 8.0+

🎯 Quick Start

Installation

dotnet add package RAGamuffin

Basic Usage

using RAGamuffin.Builders;
using RAGamuffin.Core;
using RAGamuffin.Embedding;
using RAGamuffin.Enums;

// 1. Set up your embedding model (download from HuggingFace)
var embedder = new OnnxEmbedder("path/to/model.onnx", "path/to/tokenizer.json");

// 2. Configure your vector database
var vectorDb = new SqliteDatabaseModel("documents.db", "my_collection");

// 3. Build and train your pipeline
var pipeline = new IngestionTrainingBuilder()
    .WithEmbeddingModel(embedder)
    .WithVectorDatabase(vectorDb)
    .WithTrainingStrategy(TrainingStrategy.RetrainFromScratch)
    .WithTrainingFiles(new[] { "document.pdf" })
    .Build();

var ingestedItems = await pipeline.Train();

// 4. Search your documents
string[] results = await pipeline.SearchAndReturnTexts("What is the company policy?", 5);

Real-time Text Ingestion

// Stream text content directly into your vector store
var textItems = new[]
{
    new TextItem("Meeting notes from Q1", "Q1 was successful with 15% growth..."),
    new TextItem("Product roadmap", "Next quarter we'll launch feature X...")
};

var (ingestedItems, model) = await pipeline.TrainWithText(textItems);

Search Existing Vector Store

// Search without retraining
var vectorStore = new SqliteVectorStoreProvider("documents.db", "my_collection");
var searchResults = await vectorStore.SearchAsync("your query", embedder, 5);

// Get metadata
var metadata = await vectorStore.GetAllDocumentsMetadataAsync();
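
To make the idea behind a top-k vector search concrete, here is a minimal, self-contained sketch that ranks stored chunks by cosine similarity to a query vector. It is not RAGamuffin's implementation (the library delegates storage and search to SQLite), and the Cosine and TopK helpers, the tuple layout, and the toy 2-dimensional vectors are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: cosine similarity between two vectors of equal length.
double Cosine(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Hypothetical helper: texts of the k stored chunks most similar to the query vector.
string[] TopK(float[] queryVector, IEnumerable<(string Text, float[] Vector)> stored, int k) =>
    stored.OrderByDescending(item => Cosine(queryVector, item.Vector))
          .Take(k)
          .Select(item => item.Text)
          .ToArray();

// Toy 2-dimensional "embeddings"; a real embedding model produces far more dimensions.
var index = new List<(string Text, float[] Vector)>
{
    ("vacation policy", new[] { 0.9f, 0.1f }),
    ("expense policy",  new[] { 0.7f, 0.3f }),
    ("lunch menu",      new[] { 0.1f, 0.9f }),
};

string[] hits = TopK(new[] { 1.0f, 0.0f }, index, 2);
Console.WriteLine(string.Join(", ", hits));   // prints "vacation policy, expense policy"
```

In a real pipeline the query vector would come from the same embedding model that produced the stored vectors, otherwise the similarity scores are not comparable.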

📚 Examples

Check out the comprehensive examples in the Examples/ directory:

TrainAndSearch: Complete RAG pipeline with training and search
SearchExistingVectorStore: Query existing vector stores with metadata
IncrementalTraining: Add new documents to existing collections
RealTimeIngestion: Stream text content in real-time
MetadataRetrieval: Work with document metadata and statistics

🔧 Configuration

Embedding Models

RAGamuffin supports ONNX models for cross-platform compatibility. Recommended starter model:

Model: all-mpnet-base-v2 from HuggingFace
Download: Model | Tokenizer

Training Strategies

RetrainFromScratch: Drop all existing data and retrain
IncrementalAdd: Add new documents only (skip documents that already exist)
IncrementalUpdate: Add new documents and update existing ones
ProcessOnly: Only process documents, no vector operations
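
The difference between the four strategies can be sketched against a plain in-memory dictionary. This is only an illustration of the semantics described above, not the library's code: the Apply helper is hypothetical, the store is a document-id-to-content map rather than a vector database, and strategy names are passed as strings so the sketch does not depend on the library's enum.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: applies one strategy to an in-memory stand-in for the vector store.
void Apply(string strategy, Dictionary<string, string> target, Dictionary<string, string> batch)
{
    switch (strategy)
    {
        case "RetrainFromScratch":
            target.Clear();                                  // drop all existing data
            foreach (var (id, content) in batch) target[id] = content;
            break;
        case "IncrementalAdd":
            foreach (var (id, content) in batch)
                target.TryAdd(id, content);                  // ids already present are skipped
            break;
        case "IncrementalUpdate":
            foreach (var (id, content) in batch)
                target[id] = content;                        // add new ids, overwrite existing ones
            break;
        case "ProcessOnly":
            break;                                           // documents are processed, store untouched
        default:
            throw new ArgumentException($"Unknown strategy: {strategy}");
    }
}

var store = new Dictionary<string, string> { ["a.pdf"] = "old A" };
var incoming = new Dictionary<string, string> { ["a.pdf"] = "new A", ["b.pdf"] = "new B" };

Apply("IncrementalAdd", store, incoming);
var contentOfA = store["a.pdf"];
Console.WriteLine($"{store.Count} documents, a.pdf = {contentOfA}");   // prints "2 documents, a.pdf = old A"
```

Running the same batch with "IncrementalUpdate" instead would leave two documents as well, but a.pdf would hold the new content.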

Chunking Options

// PDF processing options
.WithPdfOptions(new PdfHybridParagraphIngestionOptions
{
    MinSize = 0,        // Minimum chunk size
    MaxSize = 800,      // Maximum chunk size
    Overlap = 400,      // Overlap between chunks
    UseMetadata = true  // Include document metadata
})

// Text processing options
.WithTextOptions(new TextHybridParagraphIngestionOptions
{
    MinSize = 500,      // Minimum chunk size
    MaxSize = 800,      // Maximum chunk size
    Overlap = 400,      // Overlap between chunks
    UseMetadata = true  // Include document metadata
})
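
As a rough illustration of how MaxSize and Overlap interact, the sketch below splits text with a fixed-size sliding window. It is a simplification under stated assumptions: sizes are treated as character counts, paragraph boundaries and MinSize are ignored, and ChunkWithOverlap is a hypothetical helper rather than the library's hybrid paragraph chunker.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: fixed-size sliding window over characters.
// Assumes maxSize and overlap are character counts and overlap < maxSize.
List<string> ChunkWithOverlap(string text, int maxSize, int overlap)
{
    if (maxSize <= 0) throw new ArgumentOutOfRangeException(nameof(maxSize));
    if (overlap < 0 || overlap >= maxSize) throw new ArgumentOutOfRangeException(nameof(overlap));

    var result = new List<string>();
    int step = maxSize - overlap;                       // how far the window advances each time
    for (int start = 0; start < text.Length; start += step)
    {
        int length = Math.Min(maxSize, text.Length - start);
        result.Add(text.Substring(start, length));
        if (start + length >= text.Length) break;       // this window reached the end of the text
    }
    return result;
}

// With maxSize 800 and overlap 400, each chunk starts 400 characters after the previous one.
var sample = new string('a', 2000);
var chunks = ChunkWithOverlap(sample, maxSize: 800, overlap: 400);
Console.WriteLine($"{chunks.Count} chunks");            // prints "4 chunks"
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of storing and embedding more text.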

🏗️ Architecture

RAGamuffin is built with a modular architecture:

Abstractions: Clean interfaces for embedding, ingestion, and vector storage
Core: Main pipeline logic and data models
Embedding: ONNX-based embedding providers
Ingestion: PDF and text processing engines
VectorStores: SQLite vector database implementation
Builders: Fluent API for pipeline configuration

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE.txt file for details.

InstructSharp: LLM client library for .NET
PdfPig: PDF processing library
Microsoft.ML.OnnxRuntime: ONNX model inference

RAGamuffin - Making RAG pipelines simple and accessible for .NET developers.

Product	Compatible and additional computed target framework versions
.NET	Compatible: net8.0. Computed: net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows.


Dependencies (net8.0)
- Microsoft.Data.Sqlite (>= 9.0.6)
- Microsoft.Extensions.Logging.Abstractions (>= 9.0.6)
- Microsoft.Extensions.VectorData.Abstractions (>= 9.6.0)
- Microsoft.ML.OnnxRuntime (>= 1.22.0)
- Microsoft.ML.OnnxRuntime.Extensions (>= 0.14.0)
- Microsoft.SemanticKernel.Connectors.SqliteVec (>= 1.58.0-preview)
- PdfPig (>= 0.1.10)
- SQLitePCLRaw.bundle_e_sqlite3 (>= 2.1.11)
- Tokenizers.DotNet (>= 1.2.0)
- Tokenizers.DotNet.runtime.win-x64 (>= 1.2.0)

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last Updated
1.0.8	37	7/8/2025
1.0.7	76	6/28/2025
1.0.6	86	6/28/2025
1.0.5	81	6/28/2025
1.0.3	86	6/28/2025
1.0.2	86	6/28/2025
1.0.1	85	6/28/2025
1.0.0	88	6/28/2025