AgentEval.Cli 0.2.1-alpha

This is a prerelease version of AgentEval.Cli.

dotnet tool install --global AgentEval.Cli --version 0.2.1-alpha

This package contains a .NET tool you can call from the shell/command line.

dotnet new tool-manifest   # if you are setting up this repo

dotnet tool install --local AgentEval.Cli --version 0.2.1-alpha

#tool dotnet:?package=AgentEval.Cli&version=0.2.1-alpha&prerelease

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

nuke :add-package AgentEval.Cli --version 0.2.1-alpha

AgentEval CLI

Command-line interface for AgentEval — evaluate any OpenAI-compatible AI agent from the terminal.

Installation

dotnet tool install -g AgentEval.Cli --prerelease

Compatibility

AgentEval CLI	AgentEval	MAF	.NET
0.2.0-alpha	0.6.0-beta	1.0.0-rc3	9.0, 10.0
0.1.0-alpha	0.5.3-beta	1.0.0-rc2	9.0, 10.0

Quick Start

Initialize a test dataset

agenteval init
agenteval init -o my-tests.yaml
agenteval init --format json

Run evaluations

# Against Azure OpenAI
agenteval eval --azure --endpoint https://myresource.openai.azure.com/ --deployment-name gpt-4o --dataset agenteval.yaml

# Against OpenAI directly
agenteval eval --endpoint https://api.openai.com/v1 --model gpt-4o --dataset agenteval.yaml

# Against a local Ollama model
agenteval eval --endpoint http://localhost:11434/v1 --model llama3 --dataset agenteval.yaml

Stochastic evaluation (multi-run)

agenteval eval --azure --endpoint https://myresource.openai.azure.com/ --deployment-name gpt-4o --dataset agenteval.yaml --runs 5 --success-threshold 0.9
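
The README does not define how --success-threshold is applied. As a rough sketch, assuming it is the minimum fraction of runs that must pass, the arithmetic for the command above looks like the following. The helper meets_threshold is hypothetical and is not part of AgentEval.

```shell
#!/bin/sh
# Assumption: a multi-run evaluation succeeds when
# passed_runs / total_runs >= threshold. This is an illustration of the
# arithmetic, not AgentEval's confirmed implementation.

# meets_threshold PASSED TOTAL THRESHOLD
# Exit status 0 if the pass rate reaches the threshold, non-zero otherwise.
meets_threshold() {
  awk -v p="$1" -v n="$2" -v t="$3" 'BEGIN { exit !((p / n) >= t) }'
}

# With --runs 5 and --success-threshold 0.9, compare 5 and 4 passing runs.
for passed in 5 4; do
  if meets_threshold "$passed" 5 0.9; then
    echo "$passed/5 passed: meets 0.9"
  else
    echo "$passed/5 passed: below 0.9"
  fi
done
```

Under that assumption, five runs at a 0.9 threshold leave no room for a single failure, because 4/5 is 0.8. Raising --runs to 10 would tolerate one failing run.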

Export results

# Single file export
agenteval eval --azure --endpoint https://myresource.openai.azure.com/ --deployment-name gpt-4o --dataset agenteval.yaml --format json -o results.json

# Structured directory export (ADR-002 format)
agenteval eval --azure --endpoint https://myresource.openai.azure.com/ --deployment-name gpt-4o --dataset agenteval.yaml --format directory --output-dir results/
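
When comparing several models on the same dataset, it can help to write one results file per model. The sketch below combines the flags shown above (--endpoint, --model, --dataset, --format json, -o) in a loop. The helpers results_path and eval_models, the results/ directory, and any model name other than llama3 are illustrative assumptions, and the sketch assumes -o accepts a path inside an existing directory.

```shell
#!/bin/sh
# results_path MODEL [DIR]
# Hypothetical helper: build a per-model output path such as results/llama3.json.
# Characters outside A-Z, a-z, 0-9, '.', '_' and '-' become '_' so that names
# like "llama3:8b" stay safe to use as file names.
results_path() {
  safe=$(printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_')
  printf '%s/%s.json\n' "${2:-results}" "$safe"
}

# eval_models MODEL...
# Run the same dataset against each model on a local OpenAI-compatible endpoint
# and export one JSON file per model.
eval_models() {
  mkdir -p results
  for model in "$@"; do
    agenteval eval --endpoint http://localhost:11434/v1 --model "$model" \
      --dataset agenteval.yaml --format json -o "$(results_path "$model")"
  done
}

# Usage (requires agenteval on PATH; "mistral" is a placeholder model name):
#   eval_models llama3 mistral
```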

Red team security scanning

# Run all 9 attack types
agenteval redteam --azure --endpoint https://myresource.openai.azure.com/ --deployment-name gpt-4o --intensity moderate

# Run specific attacks
agenteval redteam --azure --endpoint https://myresource.openai.azure.com/ --deployment-name gpt-4o --attacks PromptInjection,Jailbreak --format sarif

List available metrics and attacks

agenteval list
agenteval list --type metrics
agenteval list --type attacks

Authentication

AgentEval supports two endpoint modes: Azure OpenAI (--azure together with --endpoint) and OpenAI-compatible (--endpoint alone).

Azure OpenAI (`--azure`)

The --azure flag uses AzureOpenAIClient. Both --endpoint and --deployment-name are required:

Setting	Flag	Env var fallback
Endpoint	`--endpoint` (required)	—
Deployment	`--deployment-name` (required)	—
API Key	`--api-key`	`AZURE_OPENAI_API_KEY`

# Explicit key
agenteval eval --azure --endpoint https://myresource.openai.azure.com/ --deployment-name gpt-4o --dataset agenteval.yaml --api-key sk-...

# Key from env var
export AZURE_OPENAI_API_KEY=sk-...
agenteval eval --azure --endpoint https://myresource.openai.azure.com/ --deployment-name gpt-4o --dataset agenteval.yaml

Note: --deployment-name is the name you gave your model deployment in Azure AI Foundry, not the underlying model name.
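
In scripts and CI jobs, relying on the AZURE_OPENAI_API_KEY fallback keeps the key out of the command line, where it could otherwise end up in shell history or process listings. A minimal sketch of a wrapper that checks the variable before calling the tool is shown below. The functions require_env and run_azure_eval are hypothetical helpers, not part of AgentEval.

```shell
#!/bin/sh
# require_env NAME
# Succeeds only if NAME is exported and non-empty, so the value is actually
# visible to child processes such as agenteval.
require_env() {
  value=$(printenv "$1")
  if [ -z "$value" ]; then
    echo "error: $1 is not set" >&2
    return 1
  fi
}

# run_azure_eval ENDPOINT DEPLOYMENT DATASET
# Checks for the key first, then runs the Azure OpenAI evaluation without
# passing --api-key on the command line.
run_azure_eval() {
  require_env AZURE_OPENAI_API_KEY || return 1
  agenteval eval --azure --endpoint "$1" --deployment-name "$2" --dataset "$3"
}

# Usage (requires agenteval on PATH):
#   run_azure_eval https://myresource.openai.azure.com/ gpt-4o agenteval.yaml
```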

OpenAI-compatible (`--endpoint`)

For OpenAI, Ollama, Groq, vLLM, LM Studio, Together.ai, or any OpenAI-compatible API:

# OpenAI (set OPENAI_API_KEY or use --api-key)
agenteval eval --endpoint https://api.openai.com/v1 --model gpt-4o --dataset agenteval.yaml --api-key sk-...

# Local Ollama (no key needed)
agenteval eval --endpoint http://localhost:11434/v1 --model llama3 --dataset agenteval.yaml

Commands

Command	Description
`eval`	Run evaluations against an AI agent endpoint
`init`	Scaffold a sample test dataset file
`list`	List available metrics and attack types
`redteam`	Run red team security scans

Requirements

.NET 9.0 or 10.0
An AI agent endpoint (Azure OpenAI, OpenAI, Ollama, or any OpenAI-compatible API)
Built on Microsoft.Extensions.AI (MAF 1.0.0-rc3)

Documentation

AgentEval Documentation

Contributing

Contributions are welcome! Please open an issue or pull request.

For discussions and questions, visit the AgentEval Discussions on the main repository.

License

MIT License. See LICENSE for details.

Product	Compatible and additional computed target framework versions.
.NET	net9.0 is compatible. net9.0-android was computed. net9.0-browser was computed. net9.0-ios was computed. net9.0-maccatalyst was computed. net9.0-macos was computed. net9.0-tvos was computed. net9.0-windows was computed. net10.0 is compatible. net10.0-android was computed. net10.0-browser was computed. net10.0-ios was computed. net10.0-maccatalyst was computed. net10.0-macos was computed. net10.0-tvos was computed. net10.0-windows was computed.

This package has no dependencies.

Version	Downloads	Last Updated
0.2.1-alpha	61	3/5/2026
0.2.0-alpha	45	3/5/2026
0.1.1-alpha	48	3/3/2026
0.1.0-alpha	43	3/1/2026