AgentEval 0.6.0-beta
AgentEval
The .NET Evaluation Toolkit for AI Agents
Built first for Microsoft Agent Framework (MAF) and Microsoft.Extensions.AI. What RAGAS and DeepEval do for Python, AgentEval does for .NET.
Features
- Tool Tracking: Monitor tool/function calls with timing, arguments, and ordering
- Fluent Assertions: Expressive assertions with rich failure messages, "because" reasons, and assertion scopes
- Performance Metrics: TTFT, latency, tokens, cost estimation for 8+ models
- RAG Metrics: Faithfulness, relevance, context precision/recall, answer correctness
- Red Team Security: 9 attack types, 192 probes, OWASP LLM Top 10 coverage
- Responsible AI: Toxicity, bias, and misinformation detection metrics
- Stochastic Evaluation: Statistical model comparison with multi-run analysis
- Trace Record & Replay: Deterministic CI testing without LLM calls
- Calibrated Evaluator: Multi-model consensus-driven scoring
- Extensible: Adapter pattern for any agent framework
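To make the cost-estimation bullet concrete, here is a minimal sketch of how a per-run cost can be derived from token counts. It is not AgentEval's implementation: `ModelPricing`, `CostEstimator`, and the prices are hypothetical names and values chosen for illustration only.

```csharp
using System;

// Hypothetical per-million-token prices in USD; real prices differ by model and change over time.
var pricing = new ModelPricing(InputPerMillion: 2.50m, OutputPerMillion: 10.00m);

decimal cost = CostEstimator.Estimate(pricing, promptTokens: 1_200, completionTokens: 300);
Console.WriteLine($"Estimated cost (USD): {cost}");

// Hypothetical types for illustration; not part of AgentEval.
public record ModelPricing(decimal InputPerMillion, decimal OutputPerMillion);

public static class CostEstimator
{
    // cost = promptTokens / 1e6 * inputPrice + completionTokens / 1e6 * outputPrice
    public static decimal Estimate(ModelPricing pricing, int promptTokens, int completionTokens)
    {
        ArgumentOutOfRangeException.ThrowIfNegative(promptTokens);
        ArgumentOutOfRangeException.ThrowIfNegative(completionTokens);

        return promptTokens / 1_000_000m * pricing.InputPerMillion
             + completionTokens / 1_000_000m * pricing.OutputPerMillion;
    }
}
```

Using `decimal` rather than `double` keeps small currency amounts exact, which matters when a threshold such as `0.10m` is compared against the estimate.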
Quick Start
using AgentEval;
using AgentEval.MAF;
using AgentEval.Assertions;
// Create evaluation harness
var harness = new MAFEvaluationHarness(evaluatorClient);
// Run evaluation with tool tracking
var result = await harness.RunEvaluationAsync(agent, new TestCase
{
    Name = "Feature Planning Test",
    Input = "Plan a user authentication feature",
    EvaluationCriteria = ["Should include security considerations"]
});
// Assert tool usage with "because" reasons
result.ToolUsage!
    .Should()
    .HaveCalledTool("SecurityTool", because: "auth features require security review")
        .BeforeTool("FeatureTool")
        .WithoutError()
    .And()
    .HaveNoErrors();
// Assert performance
result.Performance!
    .Should()
    .HaveTotalDurationUnder(TimeSpan.FromSeconds(10))
    .HaveEstimatedCostUnder(0.10m);
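The ordering assertion above can be pictured as a comparison of positions in a recorded call list. The following is a self-contained sketch of that idea, not AgentEval's code: `ToolCall` and `ToolOrder` are hypothetical names, and the sketch assumes "before" means the first call to one tool precedes the first call to the other.

```csharp
using System;
using System.Collections.Generic;

// A recorded sequence of tool calls, in the order the agent made them (illustrative data).
var calls = new List<ToolCall> { new("SecurityTool"), new("FeatureTool") };

ToolOrder.AssertCalledBefore(calls, "SecurityTool", "FeatureTool",
    because: "auth features require security review");
Console.WriteLine("Tool ordering assertion passed.");

// Hypothetical types for illustration; not part of AgentEval.
public record ToolCall(string Name);

public static class ToolOrder
{
    // Throws unless both tools were called and the first call to `first`
    // happened before the first call to `second`.
    public static void AssertCalledBefore(
        IReadOnlyList<ToolCall> calls, string first, string second, string because)
    {
        int firstIndex = IndexOf(calls, first);
        int secondIndex = IndexOf(calls, second);

        if (firstIndex < 0)
            throw new InvalidOperationException(
                $"Expected a call to '{first}' because {because}, but none was recorded.");
        if (secondIndex < 0)
            throw new InvalidOperationException(
                $"Expected a call to '{second}' after '{first}', but none was recorded.");
        if (firstIndex > secondIndex)
            throw new InvalidOperationException(
                $"Expected '{first}' before '{second}' because {because}, but the order was reversed.");
    }

    // Returns the index of the first call with the given name, or -1 if absent.
    private static int IndexOf(IReadOnlyList<ToolCall> calls, string name)
    {
        for (int i = 0; i < calls.Count; i++)
            if (calls[i].Name == name) return i;
        return -1;
    }
}
```

Carrying the `because` text into the exception message is what makes a failing run self-explanatory in CI logs.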
Red Team Security Scanning
var result = await AttackPipeline.Create()
    .WithAllAttacks()
    .ScanAsync(agent);
result.Should().HaveOverallScoreAbove(85);
// Await the export so the report is fully written before the evaluation ends
await result.ExportAsync("security-report.sarif", ExportFormat.Sarif);
Trace Record & Replay
Capture agent executions for deterministic replay, so no LLM calls are needed in CI:
// Record
await using var recorder = new TraceRecordingAgent(realAgent, "weather_test");
var response = await recorder.InvokeAsync("What's the weather?");
await TraceSerializer.SaveToFileAsync(recorder.Trace, "trace.json");
// Replay (deterministic, free)
var trace = await TraceSerializer.LoadFromFileAsync("trace.json");
var replayer = new TraceReplayingAgent(trace);
var replayed = await replayer.InvokeAsync("What's the weather?");
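The record-and-replay idea can be sketched independently of AgentEval. The example below is a simplified illustration under stated assumptions: `RecordingAgent`, `ReplayingAgent`, and `TraceEntry` are hypothetical types, the "agent" is just a delegate, and the trace is serialized with `System.Text.Json` rather than the library's own `TraceSerializer`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;

// Record: wrap a stand-in agent (a plain delegate here) and capture each exchange.
var recorder = new RecordingAgent(prompt => Task.FromResult($"Echo: {prompt}"));
await recorder.InvokeAsync("What's the weather?");
string json = JsonSerializer.Serialize(recorder.Trace);

// Replay: answer from the saved trace without calling the underlying agent.
var trace = JsonSerializer.Deserialize<List<TraceEntry>>(json)
            ?? throw new InvalidOperationException("Trace was empty.");
var replayer = new ReplayingAgent(trace);
Console.WriteLine(await replayer.InvokeAsync("What's the weather?"));
// prints "Echo: What's the weather?"

// Hypothetical types for illustration; not part of AgentEval.
public record TraceEntry(string Input, string Output);

public sealed class RecordingAgent
{
    private readonly Func<string, Task<string>> _inner;
    public List<TraceEntry> Trace { get; } = new();

    public RecordingAgent(Func<string, Task<string>> inner) => _inner = inner;

    public async Task<string> InvokeAsync(string input)
    {
        string output = await _inner(input);
        Trace.Add(new TraceEntry(input, output));
        return output;
    }
}

public sealed class ReplayingAgent
{
    private readonly IReadOnlyList<TraceEntry> _trace;
    private int _next;

    public ReplayingAgent(IReadOnlyList<TraceEntry> trace) => _trace = trace;

    // Returns recorded outputs in order and fails fast if the input diverges from the recording.
    public Task<string> InvokeAsync(string input)
    {
        if (_next >= _trace.Count)
            throw new InvalidOperationException("Trace exhausted: no recorded response left.");
        TraceEntry entry = _trace[_next++];
        if (entry.Input != input)
            throw new InvalidOperationException(
                $"Trace mismatch: expected '{entry.Input}', got '{input}'.");
        return Task.FromResult(entry.Output);
    }
}
```

Failing on an input mismatch is a deliberate choice in this sketch: it surfaces drift between the test and the recording instead of silently returning a stale answer.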
Model Comparison
var result = await comparer.CompareModelsAsync(
    factories: [gpt4oFactory, gpt4oMiniFactory],
    testCases: testSuite,
    options: new ComparisonOptions(RunsPerModel: 5));
Console.WriteLine(result.ToMarkdown());
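Running each model several times matters because LLM output varies between runs, so a single score says little about reliability. The sketch below shows one way multi-run scores could be summarized; it is not AgentEval's algorithm. `RunStats` is a hypothetical helper and the scores are made-up illustrative values, not measured results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-run scores (0..1) for two models over 5 runs each; not real results.
var runs = new Dictionary<string, double[]>
{
    ["model-a"] = new[] { 0.90, 0.85, 0.95, 0.90, 0.90 },
    ["model-b"] = new[] { 0.80, 0.95, 0.70, 0.90, 0.85 },
};

foreach (var (model, scores) in runs)
{
    var (mean, stdDev) = RunStats.Summarize(scores);
    Console.WriteLine($"{model}: mean={mean:F3}, stddev={stdDev:F3}, runs={scores.Length}");
}

// Hypothetical helper for illustration; not part of AgentEval.
public static class RunStats
{
    // Returns the mean and the sample standard deviation (n - 1 denominator).
    public static (double Mean, double StdDev) Summarize(IReadOnlyCollection<double> scores)
    {
        if (scores.Count < 2)
            throw new ArgumentException("At least two runs are needed to estimate variance.", nameof(scores));

        double mean = scores.Average();
        double variance = scores.Sum(s => (s - mean) * (s - mean)) / (scores.Count - 1);
        return (mean, Math.Sqrt(variance));
    }
}
```

A model with a slightly lower mean but a much smaller standard deviation may be the safer choice for production, which is the kind of trade-off a multi-run comparison exposes.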
Quality Assurance
- Comprehensive evaluation suite targeting net8.0, net9.0, and net10.0
- All evaluations passing
Installation
dotnet add package AgentEval --prerelease
Single package, modular internals: AgentEval ships as one NuGet package containing 6 focused assemblies, including:
- AgentEval.Abstractions: Public contracts and interfaces
- AgentEval.Core: Metrics, assertions, comparison, tracing
- AgentEval.DataLoaders: Data loading and export (JSON, YAML, CSV, JSONL)
- AgentEval.MAF: Microsoft Agent Framework integration
- AgentEval.RedTeam: Security testing (multiple attack types and probes)
Service Registration
// Register all services at once (recommended):
services.AddAgentEvalAll();
// Or register selectively:
services.AddAgentEval(); // Core services only
services.AddAgentEvalDataLoaders(); // DataLoaders + Exporters
services.AddAgentEvalRedTeam(); // Red Team security testing
Documentation
- Getting Started
- Fluent Assertions
- Metrics Reference
- Red Team Security
- Trace Record & Replay
- Stochastic Evaluation
- Architecture
License
MIT License. See LICENSE for details.
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | net8.0 is compatible. net8.0-android was computed. net8.0-browser was computed. net8.0-ios was computed. net8.0-maccatalyst was computed. net8.0-macos was computed. net8.0-tvos was computed. net8.0-windows was computed. net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed. |
- net10.0
  - Microsoft.Agents.AI (>= 1.0.0-rc3)
  - Microsoft.Agents.AI.Workflows (>= 1.0.0-rc3)
  - Microsoft.Extensions.AI (>= 10.3.0)
  - Microsoft.Extensions.AI.Evaluation.Quality (>= 10.3.0)
  - Microsoft.Extensions.DependencyInjection (>= 9.0.0)
  - PdfSharp-MigraDoc (>= 6.2.4)
  - System.Numerics.Tensors (>= 10.0.3)
  - YamlDotNet (>= 16.3.0)
- net8.0
  - Microsoft.Agents.AI (>= 1.0.0-rc3)
  - Microsoft.Agents.AI.Workflows (>= 1.0.0-rc3)
  - Microsoft.Extensions.AI (>= 10.3.0)
  - Microsoft.Extensions.AI.Evaluation.Quality (>= 10.3.0)
  - Microsoft.Extensions.DependencyInjection (>= 9.0.0)
  - PdfSharp-MigraDoc (>= 6.2.4)
  - System.Numerics.Tensors (>= 10.0.3)
  - YamlDotNet (>= 16.3.0)
- net9.0
  - Microsoft.Agents.AI (>= 1.0.0-rc3)
  - Microsoft.Agents.AI.Workflows (>= 1.0.0-rc3)
  - Microsoft.Extensions.AI (>= 10.3.0)
  - Microsoft.Extensions.AI.Evaluation.Quality (>= 10.3.0)
  - Microsoft.Extensions.DependencyInjection (>= 9.0.0)
  - PdfSharp-MigraDoc (>= 6.2.4)
  - System.Numerics.Tensors (>= 10.0.3)
  - YamlDotNet (>= 16.3.0)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 0.6.0-beta | 74 | 3/5/2026 |
| 0.5.4-beta | 76 | 3/3/2026 |
| 0.5.3-beta | 97 | 3/1/2026 |
| 0.5.2-beta | 72 | 2/28/2026 |
| 0.5.1-beta | 61 | 2/28/2026 |
| 0.4.0-beta | 81 | 2/22/2026 |
| 0.3.0-beta | 118 | 1/25/2026 |
| 0.2.1-beta | 62 | 1/24/2026 |
| 0.2.0-beta | 56 | 1/18/2026 |
| 0.1.1-alpha | 69 | 1/3/2026 |
| 0.1.0-alpha | 58 | 1/3/2026 |
v0.6.0-beta: MAF 1.0.0-rc3 compatibility. Zero breaking changes to AgentEval APIs. See CHANGELOG.md for details.