Caveman 1.0.2
dotnet add package Caveman --version 1.0.2
# 🦴 Caveman: Prompt Compressor for LLMs
<img width="1197" height="766" alt="caveman_splash" src="https://github.com/user-attachments/assets/4b534140-c519-423f-b918-e705565a039f" />

This library is inspired by the token-saving algorithm of the Caveman plugin for Claude, but it is not a port of the original: the code was written from scratch.
Caveman is a C# library built on Catalyst that drastically reduces the number of tokens in the prompts you send to LLMs (such as Gemma 3, Llama, or GPT-4). It uses Natural Language Processing (NLP) techniques to remove grammatical "noise" (articles, prepositions, conjunctions) while keeping the semantic value intact.
"Why use many tokens when few tokens do trick?" - A caveman (and your wallet).
Features
- Up to 70% Token Reduction: Slash API costs and speed up local inference.
- Multilingual: Support for over 50 languages (English, Italian, French, etc.) via Catalyst models.
- Compression Levels: Choose between `Light`, `Semantic`, or `Aggressive` (Lemmatization).
- LLM Integration with Semantic Kernel: Optimized for recent models that handle contracted language well.
🛠️ Installation
Base Package
Install the core library and the model manager:
```bash
dotnet add package Catalyst
dotnet add package Mosaik.Core
```
Language Models
Install the packages for the languages you intend to support:
```bash
dotnet add package Catalyst.Models.English
dotnet add package Catalyst.Models.Italian
```
Alternatively, run the PowerShell script Install-CatalystModels.ps1 (it automatically updates all libraries in the project).
Quick Start
```csharp
var compressor = new CavemanCompressionService();
string input = "I would like to know if it is possible to receive information about cheap restaurants in Rome.";

// Compress the text and estimate the energy saved
var result = await compressor.CompressAsync(input, CavemanCompressionLevel.Semantic);

Console.WriteLine($"Compressed: {result.CompressedText}");
Console.WriteLine($"Efficiency: {result.EfficiencyPercentage:F1}%");
Console.WriteLine($"🌿 Energy Saved: {result.EstimatedEnergySavedMWh:F3} mWh");
```
🌿 Sustainability: Why it matters
Every token generated or processed by an LLM has an environmental cost. Caveman v1.1 introduces a built-in estimator based on industry averages:
Energy Consumption: Estimated at 5 mWh per token.
Carbon Footprint: Estimated at 0.4 mg of CO2 per mWh.
By compressing a prompt from 1000 to 400 tokens, you save 600 tokens, which at 5 mWh per token is approximately 3,000 mWh (3 Wh) of energy. On a scale of millions of requests, Caveman helps build a more sustainable AI ecosystem.
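The arithmetic behind that estimate can be sketched in a few lines. This is only an illustration that uses the two averages quoted above (5 mWh per token, 0.4 mg of CO2 per mWh); `EnergySketch` and `Estimate` are hypothetical names, not part of the Caveman API, and the library's own estimator may differ.

```csharp
using System;

// Worked example: a 1000-token prompt compressed to 400 tokens.
var estimate = EnergySketch.Estimate(originalTokens: 1000, compressedTokens: 400);
Console.WriteLine($"Energy saved: {estimate.EnergyMWh:F0} mWh"); // Energy saved: 3000 mWh
Console.WriteLine($"CO2 avoided: {estimate.Co2Mg:F0} mg");       // CO2 avoided: 1200 mg

// Hypothetical helper, for illustration only (not the library's estimator).
static class EnergySketch
{
    // Averages quoted in the README text.
    public const double MilliwattHoursPerToken = 5.0;
    public const double MilligramsCo2PerMilliwattHour = 0.4;

    public static (double EnergyMWh, double Co2Mg) Estimate(int originalTokens, int compressedTokens)
    {
        // No saving is reported if "compression" made the prompt longer.
        int savedTokens = Math.Max(0, originalTokens - compressedTokens);
        double energyMWh = savedTokens * MilliwattHoursPerToken;
        return (energyMWh, energyMWh * MilligramsCo2PerMilliwattHour);
    }
}
```

With these constants, 600 saved tokens correspond to 3,000 mWh (3 Wh) and about 1.2 g of CO2.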
NLP Compression Levels
| Level | Applied Logic | Removed POS Tags (Filters) | Savings |
|---|---|---|---|
| Light | Stopword Removal | DET, ADP, CCONJ, SCONJ, PRON, PUNCT | ~25-30% |
| Semantic | Key Content Selection | Keeps only NOUN, VERB, ADJ, PROPN, ADV | ~50% |
| Aggressive | Lemmatization | Keeps only NOUN, VERB, PROPN (base form) | ~70% |
Technical Tag Detail (Catalyst Mapping)
| POS Tag | Category | Examples (ENG/ITA) | Treatment |
|---|---|---|---|
| DET | Determiners | the, a / il, lo | Removed (from Light) |
| ADP | Prepositions | of, at / di, a | Removed (from Light) |
| CCONJ | Coord. Conjunctions | and, or / e, o | Removed (from Light) |
| SCONJ | Subord. Conjunctions | that, if / che, se | Removed (from Light) |
| PRON | Pronouns | I, you / io, tu | Removed (from Light) |
| NOUN | Nouns | house, pizza / casa, pizza | Always Kept |
| VERB | Verbs | eat, runs / mangiare, corre | Always Kept |
| ADV | Adverbs | not, quickly / non, molto | Kept in Semantic |
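The filtering rules in the two tables above can be sketched as a small function over tokens that already carry a POS tag and a lemma. This is a minimal illustration, not Caveman's implementation: `TaggedToken`, `SketchLevel`, and `PosFilterSketch` are hypothetical names, and the tokens are tagged by hand here, whereas the library obtains tags from Catalyst.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hand-tagged tokens for "The cats are eating pizza in Rome quickly."
var tokens = new[]
{
    new TaggedToken("The", "DET", "the"),
    new TaggedToken("cats", "NOUN", "cat"),
    new TaggedToken("are", "AUX", "be"),
    new TaggedToken("eating", "VERB", "eat"),
    new TaggedToken("pizza", "NOUN", "pizza"),
    new TaggedToken("in", "ADP", "in"),
    new TaggedToken("Rome", "PROPN", "Rome"),
    new TaggedToken("quickly", "ADV", "quickly"),
    new TaggedToken(".", "PUNCT", "."),
};

foreach (var level in Enum.GetValues<SketchLevel>())
    Console.WriteLine($"{level}: {PosFilterSketch.Compress(tokens, level)}");
// Light: cats are eating pizza Rome quickly
// Semantic: cats eating pizza Rome quickly
// Aggressive: cat eat pizza Rome

// Hypothetical types, for illustration only.
enum SketchLevel { Light, Semantic, Aggressive }

record TaggedToken(string Text, string Pos, string Lemma);

static class PosFilterSketch
{
    // Tag sets copied from the compression-level table.
    static readonly HashSet<string> LightRemoved = new() { "DET", "ADP", "CCONJ", "SCONJ", "PRON", "PUNCT" };
    static readonly HashSet<string> SemanticKept = new() { "NOUN", "VERB", "ADJ", "PROPN", "ADV" };
    static readonly HashSet<string> AggressiveKept = new() { "NOUN", "VERB", "PROPN" };

    public static string Compress(IEnumerable<TaggedToken> tokens, SketchLevel level)
    {
        IEnumerable<string> kept = level switch
        {
            // Light: drop the listed function-word tags, keep everything else.
            SketchLevel.Light => tokens.Where(t => !LightRemoved.Contains(t.Pos)).Select(t => t.Text),
            // Semantic: keep only content-bearing tags.
            SketchLevel.Semantic => tokens.Where(t => SemanticKept.Contains(t.Pos)).Select(t => t.Text),
            // Aggressive: keep fewer tags and emit the base form (lemma).
            SketchLevel.Aggressive => tokens.Where(t => AggressiveKept.Contains(t.Pos)).Select(t => t.Lemma),
            _ => throw new ArgumentOutOfRangeException(nameof(level)),
        };
        return string.Join(" ", kept);
    }
}
```

Note that the auxiliary "are" survives the Light level here only because AUX is not among the tags the table lists as removed; the library itself may treat auxiliaries differently.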
💡 Transformation Example
| State | Prompt Text | Tokens / Characters |
|---|---|---|
| Original | "I would like to know if it is possible to have a margherita pizza immediately." | 100% (78 ch) |
| Light | "like know possible have margherita pizza immediately" | ~70% (52 ch) |
| Semantic | "know possible have margherita pizza immediately" | ~55% (47 ch) |
| Aggressive | "know possible have margherita pizza" | ~40% (35 ch) |
💡 New in version 1.0.2: Caveman.Wiki
Purpose
Automatically generate AI-friendly markdown documentation for any software project, semantically compressing content to optimize context for LLM prompts.
How It Works
Project Analysis: Automatically detects project type (C#, Python, Node.js, etc.) by scanning configuration files (.csproj, requirements.txt, package.json, etc.)
File Scanning: Recursively traverses the folder, applying intelligent filters to exclude binary files, build folders, and external dependencies.
Dependency Extraction: Parses project files to extract packages and versions, organizing them by source (NuGet, PyPI, npm, etc.)
Content Compression: For files larger than 2 KB, uses `CavemanCompressionService` with the `Semantic` level to reduce token count while preserving meaning.
Markdown Output: Generates a structured document with:
- Project metadata in YAML format
- Organized dependency list
- Tree view of file structure
- File contents with syntax highlighting
- Statistical summary
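The first two steps and the 2 KB threshold can be sketched with plain `System.IO` calls. This is a rough illustration of the idea under stated assumptions, not the actual Caveman.Wiki code: `WikiScanSketch`, its marker table, and the skip lists (`bin`, `obj`, `node_modules`, `.git`, a few binary extensions) are hypothetical choices made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Build a tiny throwaway project so the sketch runs anywhere.
string root = Path.Combine(Path.GetTempPath(), "caveman-wiki-sketch-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(Path.Combine(root, "bin"));
File.WriteAllText(Path.Combine(root, "Demo.csproj"), "<Project />");
File.WriteAllText(Path.Combine(root, "Program.cs"), new string('x', 3000));
File.WriteAllText(Path.Combine(root, "bin", "Demo.dll"), "binary");

Console.WriteLine($"Project type: {WikiScanSketch.DetectProjectType(root)}");
foreach (var (relativePath, needsCompression) in WikiScanSketch.ListSourceFiles(root))
    Console.WriteLine($"{relativePath} (compress: {needsCompression})");

Directory.Delete(root, recursive: true);

// Hypothetical helper, for illustration only.
static class WikiScanSketch
{
    // Configuration files named in the text, mapped to a project type.
    static readonly (string Pattern, string Type)[] Markers =
    {
        ("*.csproj", "C#"),
        ("requirements.txt", "Python"),
        ("package.json", "Node.js"),
    };

    // Assumed exclusion lists; the real filters are not documented here.
    static readonly HashSet<string> SkippedFolders =
        new(StringComparer.OrdinalIgnoreCase) { "bin", "obj", "node_modules", ".git" };
    static readonly HashSet<string> BinaryExtensions =
        new(StringComparer.OrdinalIgnoreCase) { ".dll", ".exe", ".png", ".zip" };

    public const long CompressionThresholdBytes = 2 * 1024; // "files >2KB"

    public static string DetectProjectType(string root) =>
        Markers.FirstOrDefault(m => Directory.EnumerateFiles(root, m.Pattern).Any()).Type ?? "Unknown";

    public static IEnumerable<(string RelativePath, bool NeedsCompression)> ListSourceFiles(string root)
    {
        foreach (string path in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
        {
            string relative = Path.GetRelativePath(root, path);
            // Every path segment except the file name is a folder.
            var folders = relative
                .Split(Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar)
                .SkipLast(1);
            if (folders.Any(f => SkippedFolders.Contains(f))) continue;
            if (BinaryExtensions.Contains(Path.GetExtension(path))) continue;
            yield return (relative, new FileInfo(path).Length > CompressionThresholdBytes);
        }
    }
}
```

In this sketch the 3000-byte `Program.cs` is flagged for compression, the small `Demo.csproj` is not, and everything under `bin` is skipped.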
Benefits for AI
✅ Complete context in readable format
✅ Token-optimized via semantic compression
✅ Predictable structure for automatic parsing
✅ Machine-readable metadata for RAG systems
Example Usage
```csharp
// Basic usage
var wiki = new CavemanWiki();
string context = await wiki.GenerateAsync(@"C:\Users\Dev\MyAwesomeProject");
await File.WriteAllTextAsync("AI_CONTEXT.md", context);
```

```csharp
// Advanced usage with custom parameters
string context = await wiki.GenerateAsync(
    projectFolderPath: @"..\MyProject",
    maxFileSizeBytes: 50 * 1024,                          // 50KB max per file
    compressionLevel: CavemanCompressionLevel.Aggressive  // More aggressive compression
);
```

```csharp
// Integration with AI prompt system
var prompt = $@"
You are an expert assistant for the project described below.

<project_context>
{context}
</project_context>

Answer questions based SOLELY on this context.
";
```

🤝 Contributing
Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 was computed. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 was computed. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
net8.0
- Catalyst (>= 26.3.60)
- Catalyst.Models.Afrikaans (>= 1.0.30952)
- Catalyst.Models.Arabic (>= 1.0.30952)
- Catalyst.Models.Armenian (>= 1.0.30952)
- Catalyst.Models.Basque (>= 1.0.30952)
- Catalyst.Models.Belarusian (>= 1.0.30952)
- Catalyst.Models.Bulgarian (>= 1.0.30952)
- Catalyst.Models.Catalan (>= 1.0.30952)
- Catalyst.Models.Chinese (>= 1.0.30952)
- Catalyst.Models.Croatian (>= 1.0.30952)
- Catalyst.Models.Czech (>= 1.0.30952)
- Catalyst.Models.Danish (>= 1.0.30952)
- Catalyst.Models.Dutch (>= 1.0.30952)
- Catalyst.Models.English (>= 1.0.30952)
- Catalyst.Models.Estonian (>= 1.0.30952)
- Catalyst.Models.Finnish (>= 1.0.30952)
- Catalyst.Models.French (>= 1.0.30952)
- Catalyst.Models.Galician (>= 1.0.30952)
- Catalyst.Models.German (>= 1.0.30952)
- Catalyst.Models.Hebrew (>= 1.0.30952)
- Catalyst.Models.Hindi (>= 1.0.30952)
- Catalyst.Models.Hungarian (>= 1.0.30952)
- Catalyst.Models.Icelandic (>= 1.0.30952)
- Catalyst.Models.Indonesian (>= 1.0.30952)
- Catalyst.Models.Irish (>= 1.0.30952)
- Catalyst.Models.Italian (>= 1.0.30952)
- Catalyst.Models.Japanese (>= 1.0.30952)
- Catalyst.Models.Kazakh (>= 1.0.30952)
- Catalyst.Models.Korean (>= 1.0.30952)
- Catalyst.Models.Latin (>= 1.0.30952)
- Catalyst.Models.Latvian (>= 1.0.30952)
- Catalyst.Models.Lithuanian (>= 1.0.30952)
- Catalyst.Models.Macedonian (>= 1.0.30952)
- Catalyst.Models.Marathi (>= 1.0.30952)
- Catalyst.Models.Norwegian (>= 1.0.30952)
- Catalyst.Models.Persian (>= 1.0.30952)
- Catalyst.Models.Polish (>= 1.0.30952)
- Catalyst.Models.Portuguese (>= 1.0.30952)
- Catalyst.Models.Romanian (>= 1.0.30952)
- Catalyst.Models.Russian (>= 1.0.30952)
- Catalyst.Models.Serbian (>= 1.0.30952)
- Catalyst.Models.Slovak (>= 1.0.30952)
- Catalyst.Models.Slovenian (>= 1.0.30952)
- Catalyst.Models.Spanish (>= 1.0.30952)
- Catalyst.Models.Swedish (>= 1.0.30952)
- Catalyst.Models.Tamil (>= 1.0.30952)
- Catalyst.Models.Telugu (>= 1.0.30952)
- Catalyst.Models.Turkish (>= 1.0.30952)
- Catalyst.Models.Ukrainian (>= 1.0.30952)
- Catalyst.Models.Urdu (>= 1.0.30952)
- Catalyst.Models.Vietnamese (>= 1.0.30952)
- LanguageDetection.NETStandard (>= 1.3.1)
- Microsoft.SemanticKernel (>= 1.74.0)
- Mosaik.Core (>= 25.10.62118)