RAGamuffin 1.0.8
A lightweight, cross-platform .NET library for building RAG (Retrieval-Augmented Generation) pipelines with local embedding models and SQLite vector storage.
🚀 Features
- Local Embedding Models: Use ONNX models for offline, privacy-focused embeddings
- SQLite Vector Storage: Lightweight, file-based vector database with no external dependencies
- Multi-Format Support: Process PDFs and text files with intelligent chunking
- Flexible Training Strategies: Retrain from scratch, incremental updates, or add-only modes
- Real-time Ingestion: Stream text content directly into your vector store
- Metadata Preservation: Maintain document context and metadata throughout the pipeline
- Cross-Platform: Works on Windows, macOS, and Linux with .NET 8.0+
🎯 Quick Start
Installation
dotnet add package RAGamuffin
Basic Usage
using RAGamuffin.Builders;
using RAGamuffin.Core;
using RAGamuffin.Embedding;
using RAGamuffin.Enums;
// 1. Set up your embedding model (download the ONNX model and tokenizer from Hugging Face)
var embedder = new OnnxEmbedder("path/to/model.onnx", "path/to/tokenizer.json");
// 2. Configure your vector database
var vectorDb = new SqliteDatabaseModel("documents.db", "my_collection");
// 3. Build and train your pipeline
var pipeline = new IngestionTrainingBuilder()
    .WithEmbeddingModel(embedder)
    .WithVectorDatabase(vectorDb)
    .WithTrainingStrategy(TrainingStrategy.RetrainFromScratch)
    .WithTrainingFiles(new[] { "document.pdf" })
    .Build();
var ingestedItems = await pipeline.Train();
// 4. Search your documents
string[] results = await pipeline.SearchAndReturnTexts("What is the company policy?", 5);
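The retrieved texts are typically placed into a prompt before being sent to a language model. The following is a minimal sketch of that step, assuming the search returns plain strings as in the snippet above. `BuildPrompt` is a hypothetical helper, not part of RAGamuffin, and the sample passages are stand-ins for real search results.

```csharp
using System;
using System.Text;

// Hypothetical helper (not a RAGamuffin API): numbers each retrieved passage
// and appends the user's question at the end of the prompt.
string BuildPrompt(string question, string[] passages)
{
    var sb = new StringBuilder();
    sb.AppendLine("Answer the question using only the context below.");
    sb.AppendLine();
    for (int i = 0; i < passages.Length; i++)
    {
        sb.AppendLine($"[{i + 1}] {passages[i]}");
    }
    sb.AppendLine();
    sb.Append("Question: ").Append(question);
    return sb.ToString();
}

// Stand-in for the string[] returned by a search call.
string[] retrieved =
{
    "Employees may work remotely up to three days per week.",
    "Remote work requests are approved by the team lead."
};

string prompt = BuildPrompt("What is the company policy?", retrieved);
Console.WriteLine(prompt);
```

The resulting string can then be passed to whichever LLM client you use. Numbering the passages makes it easier to ask the model to cite which passage supports its answer.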
Real-time Text Ingestion
// Stream text content directly into your vector store
var textItems = new[]
{
    new TextItem("Meeting notes from Q1", "Q1 was successful with 15% growth..."),
    new TextItem("Product roadmap", "Next quarter we'll launch feature X...")
};
var (ingestedItems, model) = await pipeline.TrainWithText(textItems);
Search Existing Vector Store
// Search without retraining
var vectorStore = new SqliteVectorStoreProvider("documents.db", "my_collection");
var searchResults = await vectorStore.SearchAsync("your query", embedder, 5);
// Get metadata
var metadata = await vectorStore.GetAllDocumentsMetadataAsync();
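For intuition about what a vector search does, here is a small in-memory sketch that ranks stored vectors by cosine similarity to a query vector and returns the top `k` texts. This is an illustration only: `CosineSimilarity` and `TopK` are hypothetical helpers, the two-dimensional vectors are made up, and the SQLite-backed store may use a different distance metric and indexing approach.

```csharp
using System;
using System.Linq;

// Cosine similarity between two vectors of equal length.
double CosineSimilarity(float[] a, float[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same length.");
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    if (normA == 0 || normB == 0) return 0; // a zero vector has no direction
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Returns the texts of the k entries most similar to the query vector.
string[] TopK(float[] queryVector, (string Text, float[] Vector)[] entries, int k) =>
    entries.OrderByDescending(entry => CosineSimilarity(queryVector, entry.Vector))
           .Take(k)
           .Select(entry => entry.Text)
           .ToArray();

// Made-up embeddings; a real embedder produces much longer vectors.
var documents = new[]
{
    (Text: "refund policy", Vector: new float[] { 1f, 0f }),
    (Text: "holiday schedule", Vector: new float[] { 0f, 1f }),
    (Text: "returns and refunds", Vector: new float[] { 0.9f, 0.1f }),
};

string[] top = TopK(new float[] { 1f, 0f }, documents, 2);
Console.WriteLine(string.Join(", ", top)); // refund policy, returns and refunds
```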
📚 Examples
Check out the comprehensive examples in the Examples/ directory:
- TrainAndSearch: Complete RAG pipeline with training and search
- SearchExistingVectorStore: Query existing vector stores with metadata
- IncrementalTraining: Add new documents to existing collections
- RealTimeIngestion: Stream text content in real-time
- MetadataRetrieval: Work with document metadata and statistics
🔧 Configuration
Embedding Models
RAGamuffin supports ONNX models for cross-platform compatibility. Recommended starter model:
Training Strategies
- RetrainFromScratch: Drop all existing data and retrain
- IncrementalAdd: Add new documents (skip if exists)
- IncrementalUpdate: Add new documents and update existing ones
- ProcessOnly: Only process documents, no vector operations
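The difference between these strategies can be sketched with a toy in-memory index. This is a conceptual model, not RAGamuffin's implementation: `ApplyStrategy` is a hypothetical helper, the strategy names are passed as strings only to keep the sketch self-contained, and it assumes that "exists" means a document with the same identifier is already stored.

```csharp
using System;
using System.Collections.Generic;

// Toy model: the index maps a document id to its content.
void ApplyStrategy(Dictionary<string, string> index, string strategy,
                   IEnumerable<(string Id, string Content)> incoming)
{
    if (strategy == "RetrainFromScratch") index.Clear(); // drop all existing data first

    foreach (var (id, content) in incoming)
    {
        switch (strategy)
        {
            case "RetrainFromScratch":
            case "IncrementalUpdate":
                index[id] = content;        // insert, or overwrite an existing entry
                break;
            case "IncrementalAdd":
                index.TryAdd(id, content);  // keep the existing entry if one is present
                break;
            case "ProcessOnly":
                break;                      // documents are processed, the index is untouched
            default:
                throw new ArgumentException($"Unknown strategy: {strategy}");
        }
    }
}

var demo = new Dictionary<string, string> { ["a.pdf"] = "v1" };
var batch = new[] { (Id: "a.pdf", Content: "v2"), (Id: "b.pdf", Content: "v1") };

ApplyStrategy(demo, "IncrementalAdd", batch);
Console.WriteLine(demo["a.pdf"]); // v1 (existing entry kept, b.pdf added)

ApplyStrategy(demo, "IncrementalUpdate", batch);
Console.WriteLine(demo["a.pdf"]); // v2 (existing entry replaced)
```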
Chunking Options
// PDF processing options
.WithPdfOptions(new PdfHybridParagraphIngestionOptions
{
    MinSize = 0,        // Minimum chunk size
    MaxSize = 800,      // Maximum chunk size
    Overlap = 400,      // Overlap between chunks
    UseMetadata = true  // Include document metadata
})
// Text processing options
.WithTextOptions(new TextHybridParagraphIngestionOptions
{
    MinSize = 500,      // Minimum chunk size
    MaxSize = 800,      // Maximum chunk size
    Overlap = 400,      // Overlap between chunks
    UseMetadata = true  // Include document metadata
})
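To see how a maximum size and an overlap interact, here is a simplified sliding-window chunker. It is a sketch under stated assumptions, not the library's hybrid paragraph algorithm: `ChunkWithOverlap` is a hypothetical helper, sizes are assumed to be measured in characters (the options above do not state the unit), and paragraph boundaries and `MinSize` are ignored.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: fixed-size windows where each window starts
// (maxSize - overlap) characters after the previous one.
List<string> ChunkWithOverlap(string text, int maxSize, int overlap)
{
    if (maxSize <= 0) throw new ArgumentOutOfRangeException(nameof(maxSize));
    if (overlap < 0 || overlap >= maxSize)
        throw new ArgumentOutOfRangeException(nameof(overlap), "Overlap must be in [0, maxSize).");

    var result = new List<string>();
    int step = maxSize - overlap;
    for (int start = 0; start < text.Length; start += step)
    {
        int length = Math.Min(maxSize, text.Length - start);
        result.Add(text.Substring(start, length));
        if (start + length >= text.Length) break; // this window reached the end of the text
    }
    return result;
}

var sample = new string('a', 2000);
var chunks = ChunkWithOverlap(sample, maxSize: 800, overlap: 400);
Console.WriteLine($"{chunks.Count} chunks"); // 4 chunks
```

With `MaxSize = 800` and `Overlap = 400`, each window in this sketch shares half its content with the next one, so a 2000-character input yields four chunks instead of three. A larger overlap preserves more context across chunk boundaries at the cost of more stored vectors.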
🏗️ Architecture
RAGamuffin is built with a modular architecture:
- Abstractions: Clean interfaces for embedding, ingestion, and vector storage
- Core: Main pipeline logic and data models
- Embedding: ONNX-based embedding providers
- Ingestion: PDF and text processing engines
- VectorStores: SQLite vector database implementation
- Builders: Fluent API for pipeline configuration
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
📄 License
This project is licensed under the MIT License - see the LICENSE.txt file for details.
🔗 Related Projects
- InstructSharp: LLM client library for .NET
- PdfPig: PDF processing library
- Microsoft.ML.OnnxRuntime: ONNX model inference
RAGamuffin - Making RAG pipelines simple and accessible for .NET developers.
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net8.0
  - Microsoft.Data.Sqlite (>= 9.0.6)
  - Microsoft.Extensions.Logging.Abstractions (>= 9.0.6)
  - Microsoft.Extensions.VectorData.Abstractions (>= 9.6.0)
  - Microsoft.ML.OnnxRuntime (>= 1.22.0)
  - Microsoft.ML.OnnxRuntime.Extensions (>= 0.14.0)
  - Microsoft.SemanticKernel.Connectors.SqliteVec (>= 1.58.0-preview)
  - PdfPig (>= 0.1.10)
  - SQLitePCLRaw.bundle_e_sqlite3 (>= 2.1.11)
  - Tokenizers.DotNet (>= 1.2.0)
  - Tokenizers.DotNet.runtime.win-x64 (>= 1.2.0)