Ivilson.AI.VllmChatClient 1.5.0

.NET CLI
dotnet add package Ivilson.AI.VllmChatClient --version 1.5.0

Package Manager
NuGet\Install-Package Ivilson.AI.VllmChatClient -Version 1.5.0
This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

PackageReference
<PackageReference Include="Ivilson.AI.VllmChatClient" Version="1.5.0" />
For projects that support PackageReference, copy this XML node into the project file to reference the package.

Central Package Management (CPM)
For projects that support Central Package Management, copy the PackageVersion node into the solution's Directory.Packages.props file to version the package, and reference it from the project file without a version.

Directory.Packages.props
<PackageVersion Include="Ivilson.AI.VllmChatClient" Version="1.5.0" />

Project file
<PackageReference Include="Ivilson.AI.VllmChatClient" />

Paket CLI
paket add Ivilson.AI.VllmChatClient --version 1.5.0

Script & Interactive
#r "nuget: Ivilson.AI.VllmChatClient, 1.5.0"
The #r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or the source code of a script to reference the package.

File-based apps
#:package Ivilson.AI.VllmChatClient@1.5.0
The #:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

Cake Addin
#addin nuget:?package=Ivilson.AI.VllmChatClient&version=1.5.0

Cake Tool
#tool nuget:?package=Ivilson.AI.VllmChatClient&version=1.5.0

vllmchatclient


C# vLLM Chat Client

A comprehensive .NET 8 chat client library that supports various LLM models including GPT-OSS-120B, Qwen3, QwQ-32B, Gemma3, and DeepSeek-R1 with advanced reasoning capabilities.

πŸš€ Features

  • βœ… Multi-model Support: Qwen3, QwQ, Gemma3, DeepSeek-R1, GLM-4, GPT-OSS-120B/20B
  • βœ… Reasoning Chain Support: Built-in thinking/reasoning capabilities for supported models
  • βœ… Stream Function Calls: Real-time function calling with streaming responses
  • βœ… Multiple Deployment Options: Local vLLM deployment and cloud API support
  • βœ… Performance Optimized: Efficient streaming and memory management
  • βœ… .NET 8 Ready: Full compatibility with the latest .NET platform

πŸ“¦ Project Repository

GitHub: https://github.com/iwaitu/vllmchatclient


πŸ”₯ Latest Updates

πŸ†• New GPT-OSS-120B Support

  • VllmGptOssChatClient - Support for OpenAI's GPT-OSS-120B model with full reasoning capabilities
  • Advanced reasoning chain processing with ReasoningChatResponseUpdate
  • Compatible with OpenRouter and other GPT-OSS providers
  • Enhanced debugging and performance optimizations

πŸ†• GLM-4 Support

  • VllmGlmZ1ChatClient - Support for GLM-4 models with reasoning capabilities
  • VllmGlm4ChatClient - Standard GLM-4 chat functionality

πŸ†• Enhanced Qwen 2507 Models

  • VllmQwen2507ChatClient - For qwen3-235b-a22b-instruct-2507 (standard)
  • VllmQwen2507ReasoningChatClient - For qwen3-235b-a22b-thinking-2507 (with reasoning)

πŸ—οΈ Supported Clients

Client                          | Deployment       | Model Support    | Reasoning | Function Calls
VllmGptOssChatClient            | OpenRouter/Cloud | GPT-OSS-120B/20B | βœ… Full   | βœ… Stream
VllmQwen3ChatClient             | Local vLLM       | Qwen3-32B/235B   | βœ… Toggle | βœ… Stream
VllmQwqChatClient               | Local vLLM       | QwQ-32B          | βœ… Full   | βœ… Stream
VllmGemmaChatClient             | Local vLLM       | Gemma3-27B       | ❌        | βœ… Stream
VllmDeepseekR1ChatClient        | Cloud API        | DeepSeek-R1      | βœ… Full   | βœ… Stream
VllmGlmZ1ChatClient             | Local vLLM       | GLM-4            | βœ… Full   | βœ… Stream
VllmGlm4ChatClient              | Local vLLM       | GLM-4            | ❌        | βœ… Stream
VllmQwen2507ChatClient          | Cloud API        | Qwen3-235B-2507  | ❌        | βœ… Stream
VllmQwen2507ReasoningChatClient | Cloud API        | Qwen3-235B-2507  | βœ… Full   | βœ… Stream

🐳 Docker Deployment Examples

Qwen3/QwQ vLLM Deployment:

docker run -it --gpus all -p 8000:8000 \
  -v /models/Qwen3-32B-FP8:/models/Qwen3-32B-FP8 \
  --restart always \
  -e VLLM_USE_V1=1 \
  vllm/vllm-openai:v0.8.5 \
  --model /models/Qwen3-32B-FP8 \
  --enable-auto-tool-choice \
  --tool-call-parser llama3_json \
  --trust-remote-code \
  --max-model-len 131072 \
  --tensor-parallel-size 2 \
  --gpu_memory_utilization 0.8 \
  --served-model-name "qwen3"

Gemma3 vLLM Deployment:

docker run -it --gpus all -p 8000:8000 \
  -v /models/gemma-3-27b-it-FP8-Dynamic:/models/gemma-3-27b-it-FP8-Dynamic \
  -v /home/lc/work/gemma3.jinja:/home/lc/work/gemma3.jinja \
  -e TZ=Asia/Shanghai \
  -e VLLM_USE_V1=1 \
  --restart always \
  vllm/vllm-openai:v0.8.2 \
  --model /models/gemma-3-27b-it-FP8-Dynamic \
  --enable-auto-tool-choice \
  --tool-call-parser pythonic \
  --chat-template /home/lc/work/gemma3.jinja \
  --trust-remote-code \
  --max-model-len 128000 \
  --tensor-parallel-size 2 \
  --gpu_memory_utilization 0.8 \
  --served-model-name "gemma3"

πŸ’» Usage Examples

πŸ†• GPT-OSS-120B with Reasoning (OpenRouter)

using System.ComponentModel;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.VllmChatClient.GptOss;

[Description("Gets weather information")]
static string GetWeather(string city) => $"Weather in {city}: Sunny, 25Β°C";

// Initialize GPT-OSS client
IChatClient gptOssClient = new VllmGptOssChatClient(
    "https://openrouter.ai/api/v1", 
    "your-api-token", 
    "openai/gpt-oss-120b");

var messages = new List<ChatMessage>
{
    new ChatMessage(ChatRole.System, "You are a helpful assistant with reasoning capabilities."),
    new ChatMessage(ChatRole.User, "What's the weather like in Tokyo? Please think through this step by step.")
};

var chatOptions = new ChatOptions
{
    Temperature = 0.7f,
    ReasoningLevel = GptOssReasoningLevel.Medium,    // Reasoning level controls the depth of reasoning
    Tools = [AIFunctionFactory.Create(GetWeather)]
};

// Stream response with reasoning
string reasoning = string.Empty;
string answer = string.Empty;

await foreach (var update in gptOssClient.GetStreamingResponseAsync(messages, chatOptions))
{
    if (update is ReasoningChatResponseUpdate reasoningUpdate)
    {
        if (reasoningUpdate.Thinking)
        {
            // Capture the model's reasoning process
            reasoning += reasoningUpdate.Reasoning;
            Console.WriteLine($"🧠 Thinking: {reasoningUpdate.Reasoning}");
        }
        else
        {
            // Capture the final answer
            answer += reasoningUpdate.Text;
            Console.WriteLine($"πŸ’¬ Response: {reasoningUpdate.Text}");
        }
    }
}

Console.WriteLine($"\nπŸ“ Full Reasoning: {reasoning}");
Console.WriteLine($"βœ… Final Answer: {answer}");

Qwen3 with Reasoning Toggle

using System.ComponentModel;
using Microsoft.Extensions.AI;

[Description("Gets the weather")]
static string GetWeather() => Random.Shared.NextDouble() > 0.1 ? "It's sunny" : "It's raining";

IChatClient vllmclient = new VllmQwen3ChatClient("http://localhost:8000/{0}/{1}", null, "qwen3");
IChatClient client = new ChatClientBuilder(vllmclient)
    .UseFunctionInvocation()
    .Build();

var messages = new List<ChatMessage>
{
    new ChatMessage(ChatRole.System, "You are an AI assistant named Feifei."),
    new ChatMessage(ChatRole.User, "What's the weather like today?")
};

Qwen3ChatOptions chatOptions = new()
{
    Tools = [AIFunctionFactory.Create(GetWeather)],
    NoThinking = true  // Disable the reasoning chain; set to false to enable thinking
};

string res = string.Empty;
await foreach (var update in client.GetStreamingResponseAsync(messages, chatOptions))
{
    res += update.Text;
}
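
Because the wrapped client implements Microsoft.Extensions.AI's IChatClient, a non-streaming call works as well. A minimal sketch reusing the client and options from the example above (GetResponseAsync is the standard Microsoft.Extensions.AI method, not an API specific to this package):

// Non-streaming variant using the same client and options as above.
var response = await client.GetResponseAsync(messages, chatOptions);
Console.WriteLine(response.Text);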

QwQ with Full Reasoning Support

using System.ComponentModel;
using Microsoft.Extensions.AI;

[Description("Gets the weather")]
static string GetWeather() => Random.Shared.NextDouble() > 0.5 ? "It's sunny" : "It's raining";

IChatClient vllmclient = new VllmQwqChatClient("http://localhost:8000/{0}/{1}", null, "qwq");

var messages = new List<ChatMessage>
{
    new ChatMessage(ChatRole.System, "You are an AI assistant named Feifei."),
    new ChatMessage(ChatRole.User, "What's the weather like today?")
};

ChatOptions chatOptions = new()
{
    Tools = [AIFunctionFactory.Create(GetWeather)]
};

// Local helper that streams the response and separates reasoning from the final answer
async Task<(string answer, string reasoning)> StreamChatResponseAsync(
    List<ChatMessage> messages, ChatOptions chatOptions)
{
    string answer = string.Empty;
    string reasoning = string.Empty;
    
    await foreach (var update in vllmclient.GetStreamingResponseAsync(messages, chatOptions))
    {
        if (update is ReasoningChatResponseUpdate reasoningUpdate)
        {
            if (!reasoningUpdate.Thinking)
            {
                answer += reasoningUpdate.Text;
            }
            else
            {
                reasoning += reasoningUpdate.Text;
            }
        }
        else
        {
            answer += update.Text;
        }
    }
    return (answer, reasoning);
}

var (answer, reasoning) = await StreamChatResponseAsync(messages, chatOptions);

DeepSeek-R1 with Reasoning

using Microsoft.Extensions.AI;

IChatClient client = new VllmDeepseekR1ChatClient(
    "https://dashscope.aliyuncs.com/compatible-mode/v1/{1}", 
    "your-api-key", 
    "deepseek-r1");

var messages = new List<ChatMessage>
{
    new ChatMessage(ChatRole.System, "You are an AI assistant named Feifei."),
    new ChatMessage(ChatRole.User, "Who are you?")
};

string res = string.Empty;
string think = string.Empty;

await foreach (ReasoningChatResponseUpdate update in client.GetStreamingResponseAsync(messages))
{
    if (update.Thinking)
    {
        think += update.Text;
    }
    else
    {
        res += update.Text;
    }
}
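
Gemma3 Basic Chat

Gemma3 does not expose a reasoning chain, so VllmGemmaChatClient is used as a plain streaming chat client. A minimal sketch, assuming its constructor mirrors the local vLLM pattern of the Qwen3/QwQ examples above (endpoint template and served model name are illustrative):

using Microsoft.Extensions.AI;

// Hypothetical instantiation; arguments mirror the other local vLLM clients above.
IChatClient gemmaClient = new VllmGemmaChatClient("http://localhost:8000/{0}/{1}", null, "gemma3");

var messages = new List<ChatMessage>
{
    new ChatMessage(ChatRole.System, "You are a helpful assistant."),
    new ChatMessage(ChatRole.User, "Summarize what vLLM is in one sentence.")
};

string text = string.Empty;
await foreach (var update in gemmaClient.GetStreamingResponseAsync(messages))
{
    text += update.Text;
}
Console.WriteLine(text);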

πŸ”§ Advanced Features

Reasoning Chain Processing

All reasoning-capable clients emit ReasoningChatResponseUpdate items in their streaming responses:

await foreach (var update in client.GetStreamingResponseAsync(messages, options))
{
    if (update is ReasoningChatResponseUpdate reasoningUpdate)
    {
        if (reasoningUpdate.Thinking)
        {
            // Process thinking/reasoning content
            Console.WriteLine($"πŸ€” Reasoning: {reasoningUpdate.Reasoning}");
        }
        else
        {
            // Process final response
            Console.WriteLine($"πŸ’¬ Answer: {reasoningUpdate.Text}");
        }
    }
}

Function Calling with Streaming

All clients support real-time function calling:

[Description("Search for location information")]
static string Search([Description("Search query")] string query)
{
    return "Location found: Beijing, China";
}

ChatOptions options = new()
{
    Tools = [AIFunctionFactory.Create(Search)],
    Temperature = 0.7f
};

await foreach (var update in client.GetStreamingResponseAsync(messages, options))
{
    // Handle function calls and responses in real-time
    foreach (var content in update.Contents)
    {
        if (content is FunctionCallContent functionCall)
        {
            Console.WriteLine($"πŸ”§ Calling: {functionCall.Name}");
        }
    }
}

πŸ† Performance & Optimizations

  • Stream Processing: Efficient real-time response handling
  • Memory Management: Optimized for long conversations
  • Error Handling: Robust error recovery and debugging support
  • JSON Parsing: High-performance serialization with System.Text.Json
  • Connection Pooling: Shared HttpClient for optimal resource usage

πŸ“‹ Requirements

  • .NET 8.0 or higher
  • Microsoft.Extensions.AI framework
  • Newtonsoft.Json for JSON processing
  • System.Text.Json for high-performance scenarios

🀝 Contributing

Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.


πŸ“„ License

This project is licensed under the MIT License. See the LICENSE file for details.

Product compatible and additional computed target framework versions
.NET: net8.0 is compatible. Platform-specific net8.0 targets (android, browser, ios, maccatalyst, macos, tvos, windows), net9.0 with its platform-specific targets, and net10.0 with its platform-specific targets were computed.
Learn more about Target Frameworks and .NET Standard.

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version Downloads Last Updated
1.5.0 211 8/7/2025
1.4.8 199 8/7/2025
1.4.6 209 8/7/2025
1.4.5 120 7/31/2025
1.4.2 194 6/23/2025
1.4.0 145 6/23/2025
1.3.8 145 6/21/2025
1.3.6 79 6/21/2025
1.3.5 96 6/20/2025
1.3.2 72 6/20/2025
1.3.1 74 6/20/2025
1.3.0 350 6/3/2025
1.2.8 133 6/3/2025
1.2.7 89 5/31/2025
1.2.6 64 5/31/2025
1.2.5 58 5/31/2025
1.2.3 67 5/30/2025
1.2.2 123 5/24/2025
1.2.0 300 5/13/2025
1.1.8 233 5/13/2025
1.1.6 180 5/12/2025
1.1.5 109 5/2/2025
1.1.4 125 4/29/2025
1.1.3 113 4/29/2025
1.1.2 118 4/29/2025
1.1.1 119 4/29/2025
1.1.0 152 4/23/2025
1.0.8 150 4/23/2025
1.0.6 136 4/18/2025