RAGSharp 1.0.0

RAGSharp: Lightweight RAG Pipeline for .NET

RAGSharp is a lightweight, extensible Retrieval-Augmented Generation (RAG) library built entirely in C#. It provides clean, minimal, and predictable components for building RAG pipelines: document loading, token-aware text splitting, vector storage, and semantic search.

🚀 Why RAGSharp?

Choose RAGSharp if you want:

  • Just the RAG essentials - load, chunk, embed, search.
  • Local-first - works with any OpenAI-compatible API (OpenAI, LM Studio, Ollama, vLLM, etc.).
  • No database required - file-based storage out of the box, extensible to Postgres, Qdrant, etc.
  • Simple and readable - straightforward interfaces you can understand quickly.

RAGSharp doesn't aim to replace frameworks like Semantic Kernel or Kernel Memory. Instead, it is a minimal, local-friendly pipeline you can understand in one sitting and drop into an existing app.

📦 Installation

dotnet add package RAGSharp

Dependencies:

  • OpenAI - Official SDK for embeddings
  • SharpToken - GPT tokenization for accurate chunking
  • HtmlAgilityPack - HTML parsing for web loaders
  • Microsoft.Extensions.Logging - Logging abstractions

✨ Quick Start

Load documents, index them, and search - in under 15 lines:

using RAGSharp.Embeddings.Providers;
using RAGSharp.IO;
using RAGSharp.RAG;
using RAGSharp.Stores;

// Load the document
var docs = await new FileLoader().LoadAsync("sample.txt");

var retriever = new RagRetriever(
    embeddings: new OpenAIEmbeddingClient(
        baseUrl: "http://127.0.0.1:1234/v1",
        apiKey: "lmstudio",
        defaultModel: "text-embedding-3-small"
    ),
    store: new InMemoryVectorStore()
);

// Ingest: chunk, embed, and store
await retriever.AddDocumentsAsync(docs);

// Search
var results = await retriever.Search("quantum mechanics", topK: 3);
foreach (var r in results)
    Console.WriteLine($"{r.Score:F2}: {r.Content}");

This example uses a local embedding model hosted by LM Studio and an in-memory vector store.

⚙️ Architecture

Every RAG pipeline in RAGSharp follows this flow. Each part is pluggable:

   [ DocumentLoader ] → [ TextSplitter ] → [ Embeddings ] → [ VectorStore ] → [ Retriever ]

That's it: the essential building blocks for RAG, without the noise.

🔌 Extensibility

RAGSharp is built on simple interfaces.

For Embeddings:

public interface IEmbeddingClient
{
    Task<float[]> GetEmbeddingAsync(string text);
    Task<IReadOnlyList<float[]>> GetEmbeddingsAsync(IEnumerable<string> texts);
}
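
To give a feel for what a custom provider looks like, here is a minimal sketch of a class that satisfies the two methods above. It is not part of RAGSharp: `HashingEmbeddingClient` is a hypothetical name, and the interface is re-declared only so the snippet compiles on its own (in a real project you would reference the library's interface instead). The vectors are word-hash counts, not semantic embeddings, so this is only useful for wiring tests without a model server.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Usage: the same text always maps to the same unit-length vector.
IEmbeddingClient client = new HashingEmbeddingClient(dimensions: 64);
float[] first = await client.GetEmbeddingAsync("quantum mechanics");
float[] second = await client.GetEmbeddingAsync("quantum mechanics");
Console.WriteLine($"dimensions: {first.Length}, deterministic: {first.SequenceEqual(second)}");

// Re-declared from the README so this sketch is self-contained.
public interface IEmbeddingClient
{
    Task<float[]> GetEmbeddingAsync(string text);
    Task<IReadOnlyList<float[]>> GetEmbeddingsAsync(IEnumerable<string> texts);
}

// Hypothetical provider: hashes each word into a bucket, then L2-normalizes.
public sealed class HashingEmbeddingClient : IEmbeddingClient
{
    private readonly int _dimensions;

    public HashingEmbeddingClient(int dimensions = 64)
    {
        if (dimensions <= 0) throw new ArgumentOutOfRangeException(nameof(dimensions));
        _dimensions = dimensions;
    }

    public Task<float[]> GetEmbeddingAsync(string text) => Task.FromResult(Embed(text));

    public Task<IReadOnlyList<float[]>> GetEmbeddingsAsync(IEnumerable<string> texts)
    {
        IReadOnlyList<float[]> vectors = texts.Select(Embed).ToList();
        return Task.FromResult(vectors);
    }

    private float[] Embed(string text)
    {
        var vector = new float[_dimensions];
        var words = text.ToLowerInvariant()
            .Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

        // Count words per hash bucket.
        foreach (var word in words)
            vector[StableHash(word) % (uint)_dimensions] += 1f;

        // Scale to unit length so dot product and cosine similarity agree.
        double norm = Math.Sqrt(vector.Sum(x => (double)x * x));
        if (norm > 0)
            for (int i = 0; i < vector.Length; i++)
                vector[i] = (float)(vector[i] / norm);
        return vector;
    }

    // FNV-1a: stable across runs, unlike string.GetHashCode().
    private static uint StableHash(string s)
    {
        unchecked
        {
            uint hash = 2166136261u;
            foreach (char c in s) { hash ^= c; hash *= 16777619u; }
            return hash;
        }
    }
}
```

A real provider would replace `Embed` with an HTTP call to the embedding service and would typically batch the requests in `GetEmbeddingsAsync`.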

For Vector Store:

public interface IVectorStore
{
    Task AddAsync(VectorRecord item);
    Task AddBatchAsync(IEnumerable<VectorRecord> items);
    Task<IReadOnlyList<SearchResult>> SearchAsync(float[] query, int topK);
    bool Contains(string id);
}

No vendor lock-in. Want Claude or Gemini embeddings? Cohere? A Postgres or Qdrant backend? Just implement the interface and drop it in.

📚 Core Components

1. Document Loading (RAGSharp.IO)

Loaders fetch raw data and convert it into a list of Document objects.

| Loader | Description |
| --- | --- |
| FileLoader | Load a single text file. |
| DirectoryLoader | Load all files matching a pattern from a directory. |
| UrlLoader | Scrapes a web page (cleans HTML, removes noise) and extracts plain text. |
| WebSearchLoader | Searches and loads content directly from Wikipedia articles. |

Custom loaders: Implement IDocumentLoader for PDFs, Word docs, databases, etc.

2. Embeddings & Tokenization (RAGSharp.Embeddings)

| Component | Description |
| --- | --- |
| OpenAIEmbeddingClient | Included. Works with any OpenAI-compatible API. |
| IEmbeddingClient | 2-method interface for custom providers (Claude, Gemini, Cohere, etc.). |
| SharpTokenTokenizer | Included. GPT-family token counting. |
| ITokenizer | Interface for custom tokenizers (the included implementation uses SharpToken). |

3. Text Splitting (RAGSharp.Text)

RecursiveTextSplitter - Token-aware recursive splitter that respects semantic boundaries:

var tokenizer = new SharpTokenTokenizer("gpt-4");
var splitter = new RecursiveTextSplitter(
    tokenizer, 
    chunkSize: 512,    // tokens
    chunkOverlap: 50
);

Here's how it works:

Input Text
    ↓
Split by paragraphs (\n\n)
    ↓
For each paragraph:
    ├─ Fits in chunk size? → Yield whole paragraph
    └─ Too large? → Split by sentences
                      ↓
                For each sentence:
                    ├─ Buffer + sentence fits? → Add to buffer
                    └─ Too large?
                         ├─ Yield buffer
                         └─ Single sentence too large? → Token window split
                                                       └─ Sliding window with overlap

  • Splits by paragraphs first
  • Falls back to sentences if a paragraph is too large for one chunk
  • Final fallback: sliding token window with overlap
  • Uses SharpTokenTokenizer for token counting

Implement ITextSplitter for custom splitting strategies.
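
As a rough illustration of the paragraph, sentence, and token-window fallback described above, here is a standalone sketch. It is not RAGSharp's implementation: `Tokenize`, `SlidingWindow`, and `SplitText` are hypothetical helpers, whitespace-separated words stand in for SharpToken tokens, and the sentence boundary rule (whitespace after `.`, `!`, or `?`) is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Stand-in tokenizer (assumption): one token per whitespace-separated word.
string[] Tokenize(string text) =>
    text.Split(new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

// Final fallback: fixed-size windows that share `overlap` tokens with their neighbor.
IEnumerable<string> SlidingWindow(string[] tokens, int chunkSize, int overlap)
{
    int step = chunkSize - overlap;
    for (int start = 0; start < tokens.Length; start += step)
    {
        int length = Math.Min(chunkSize, tokens.Length - start);
        yield return string.Join(" ", tokens, start, length);
        if (start + length >= tokens.Length) yield break; // last window reached the end
    }
}

List<string> SplitText(string text, int chunkSize, int overlap)
{
    if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize)
        throw new ArgumentException("Require chunkSize > 0 and 0 <= overlap < chunkSize.");

    var chunks = new List<string>();
    foreach (var raw in Regex.Split(text, @"\r?\n\s*\r?\n")) // blank line = paragraph break
    {
        var paragraph = raw.Trim();
        if (paragraph.Length == 0) continue;

        // Paragraph fits: keep it whole.
        if (Tokenize(paragraph).Length <= chunkSize) { chunks.Add(paragraph); continue; }

        // Paragraph too large: pack sentences into a buffer.
        var buffer = new List<string>();
        int bufferTokens = 0;
        foreach (var sentence in Regex.Split(paragraph, @"(?<=[.!?])\s+"))
        {
            var tokens = Tokenize(sentence);
            if (tokens.Length == 0) continue;

            if (bufferTokens + tokens.Length <= chunkSize)
            {
                buffer.Add(sentence);
                bufferTokens += tokens.Length;
                continue;
            }

            // Sentence does not fit: emit what has been buffered so far.
            if (buffer.Count > 0)
            {
                chunks.Add(string.Join(" ", buffer));
                buffer.Clear();
                bufferTokens = 0;
            }

            if (tokens.Length > chunkSize)
                chunks.AddRange(SlidingWindow(tokens, chunkSize, overlap)); // oversized sentence
            else
            {
                buffer.Add(sentence); // start a new buffer with this sentence
                bufferTokens = tokens.Length;
            }
        }
        if (buffer.Count > 0) chunks.Add(string.Join(" ", buffer));
    }
    return chunks;
}

// Usage: tiny sizes make each fallback visible.
var sample = "First short paragraph.\n\n" +
             "One two three four five six. Seven eight nine ten. " +
             "a b c d e f g h i j k l m n o p q r s t u v w x y z.";
foreach (var chunk in SplitText(sample, chunkSize: 8, overlap: 2))
    Console.WriteLine($"[{Tokenize(chunk).Length}] {chunk}");
```

In this sketch only the sliding-window chunks overlap; chunks produced at the paragraph and sentence levels do not repeat text.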

4. Vector Stores (RAGSharp.Stores)

| Store | Use Case |
| --- | --- |
| IVectorStore | 4-method interface for any database (Postgres, Qdrant, Redis, etc.) |
| InMemoryVectorStore | Fast, ephemeral storage for demos, testing, and temporary searches. |
| FileVectorStore | Persistent, file-backed storage (uses JSON), perfect for retaining your indexed knowledge base between runs. |

Why no built-in database support?
  • Many use cases don't need a database (small knowledge bases, temporary searches)
  • Database choices vary widely (Postgres, Qdrant, Redis, SQLite, etc.)
  • Implement IVectorStore in ~50 lines for your preferred database
  • Keeps core library lightweight
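
To show the overall shape of such an implementation, here is a dictionary-backed sketch of the four `IVectorStore` methods. Several things here are assumptions: `DictionaryVectorStore` is a hypothetical name, the interface is re-declared so the snippet compiles on its own, and the `VectorRecord` and `SearchResult` shapes are guesses (only `Score` and `Content` on search results appear in the Quick Start). A database-backed store would replace the dictionary with inserts and a similarity query.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Usage: add two records, then ask for the closest one.
var store = new DictionaryVectorStore();
await store.AddBatchAsync(new[]
{
    new VectorRecord("doc-1", "about cats", new[] { 1f, 0f }),
    new VectorRecord("doc-2", "about cars", new[] { 0f, 1f }),
});
var hits = await store.SearchAsync(new[] { 0.9f, 0.1f }, topK: 1);
Console.WriteLine($"{hits[0].Score:F2}: {hits[0].Content}");

// Assumed shapes; the library's own types may differ.
public sealed record VectorRecord(string Id, string Content, float[] Embedding);
public sealed record SearchResult(string Content, double Score);

// Re-declared from the README so this sketch is self-contained.
public interface IVectorStore
{
    Task AddAsync(VectorRecord item);
    Task AddBatchAsync(IEnumerable<VectorRecord> items);
    Task<IReadOnlyList<SearchResult>> SearchAsync(float[] query, int topK);
    bool Contains(string id);
}

public sealed class DictionaryVectorStore : IVectorStore
{
    private readonly ConcurrentDictionary<string, VectorRecord> _records = new();

    public Task AddAsync(VectorRecord item)
    {
        _records[item.Id] = item; // last write wins for a repeated id
        return Task.CompletedTask;
    }

    public Task AddBatchAsync(IEnumerable<VectorRecord> items)
    {
        foreach (var item in items) _records[item.Id] = item;
        return Task.CompletedTask;
    }

    public Task<IReadOnlyList<SearchResult>> SearchAsync(float[] query, int topK)
    {
        // Brute-force scan: score every record, keep the best topK.
        IReadOnlyList<SearchResult> results = _records.Values
            .Select(r => new SearchResult(r.Content, Cosine(query, r.Embedding)))
            .OrderByDescending(r => r.Score)
            .Take(topK)
            .ToList();
        return Task.FromResult(results);
    }

    public bool Contains(string id) => _records.ContainsKey(id);

    private static double Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same length.");
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += (double)a[i] * b[i];
            normA += (double)a[i] * a[i];
            normB += (double)b[i] * b[i];
        }
        double denom = Math.Sqrt(normA) * Math.Sqrt(normB);
        return denom == 0 ? 0 : dot / denom;
    }
}
```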

5. Retriever (RAGSharp.RAG)

The RagRetriever orchestrates the entire pipeline:

  • Accepts new documents.
  • Chunks them using ITextSplitter.
  • Sends chunks to the IEmbeddingClient.
  • Stores the resulting vectors in the IVectorStore.
  • Performs semantic search (dot product/cosine similarity) against the store based on a query.
  • Optional: pass an ILogger to track ingestion progress; by default it uses ConsoleLogger.
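
The search step in the list above comes down to scoring every stored vector against the query vector and keeping the best few. The sketch below shows that idea in isolation. `Dot`, `CosineSimilarity`, and `TopK` are hypothetical helpers written for this example, not the library's API. For vectors already scaled to unit length, the dot product and cosine similarity give the same ranking.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

double Dot(float[] a, float[] b)
{
    if (a.Length != b.Length) throw new ArgumentException("Vectors must have the same length.");
    double sum = 0;
    for (int i = 0; i < a.Length; i++) sum += (double)a[i] * b[i];
    return sum;
}

// Cosine similarity = dot product divided by the product of the vector lengths.
double CosineSimilarity(float[] a, float[] b)
{
    double denom = Math.Sqrt(Dot(a, a)) * Math.Sqrt(Dot(b, b));
    return denom == 0 ? 0 : Dot(a, b) / denom;
}

// Score every item against the query and keep the topK highest scores.
List<(string Content, double Score)> TopK(
    float[] query, IEnumerable<(string Content, float[] Vector)> items, int topK) =>
    items.Select(x => (x.Content, Score: CosineSimilarity(query, x.Vector)))
         .OrderByDescending(x => x.Score)
         .Take(topK)
         .ToList();

// Usage with toy 2-dimensional vectors.
var items = new List<(string Content, float[] Vector)>
{
    ("cats", new[] { 1f, 0f }),
    ("dogs", new[] { 0.8f, 0.6f }),
    ("cars", new[] { 0f, 1f }),
};
foreach (var hit in TopK(new[] { 1f, 0.1f }, items, topK: 2))
    Console.WriteLine($"{hit.Score:F2}: {hit.Content}");
```

A brute-force scan like this is linear in the number of stored chunks, which is usually acceptable for small knowledge bases.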

📚 Examples & Documentation

Clone the repo and run the sample app.

git clone https://github.com/mrrazor22/ragsharp
cd ragsharp/SampleApp
dotnet run

| Example | Description |
| --- | --- |
| Example1_QuickStart | The basic 4-step process: Load File → Init Retriever → Add Docs → Search. |
| Example2_FilesSearch | Loading multiple documents from a directory with persistent storage. |
| Example3_WebDocSearch | Loading content from the web (UrlLoader and WebSearchLoader). |
| Example4_Barebones | Using the low-level API: manual embedding and cosine similarity. |
| Example5_Advanced | Full pipeline customization, injecting custom splitter, logger, and persistent store. |

📂 Project Structure

RAGSharp/
├── Embeddings/        (IEmbeddingClient, ITokenizer, providers)
├── IO/                (FileLoader, DirectoryLoader, UrlLoader, WebSearchLoader)
├── RAG/               (RagRetriever, IRagRetriever)
├── Stores/            (InMemoryVectorStore, FileVectorStore)
├── Text/              (RecursiveTextSplitter, ITextSplitter)
├── Logging/           (ConsoleLogger)
└── Utils/             (HashingHelper, VectorExtensions)

SampleApp/
├── data/              (sample.txt, documents/)
└── Examples/          (Example1..Example5)

📜 License

MIT License.

💡 Status

Early stage, suitable for prototypes and production apps where simplicity matters.

Target frameworks

Included in the package: netstandard2.0.
Computed as compatible: net5.0 through net10.0 (including their platform-specific variants), netcoreapp2.0 through netcoreapp3.1, netstandard2.1, net461 through net481, monoandroid, monomac, monotouch, tizen40, tizen60, xamarinios, xamarinmac, xamarintvos, and xamarinwatchos.

| Version | Downloads | Last Updated |
| --- | --- | --- |
| 1.0.0 | 177 | 10/5/2025 |