# Ivilson.AI.VllmChatClient 1.5.0

Install with the .NET CLI:

```shell
dotnet add package Ivilson.AI.VllmChatClient --version 1.5.0
```

or from the Package Manager console:

```powershell
NuGet\Install-Package Ivilson.AI.VllmChatClient -Version 1.5.0
```

or as a `PackageReference`:

```xml
<PackageReference Include="Ivilson.AI.VllmChatClient" Version="1.5.0" />
```
## vllmchatclient: C# vLLM Chat Client
A comprehensive .NET 8 chat client library that supports various LLM models including GPT-OSS-120B, Qwen3, QwQ-32B, Gemma3, and DeepSeek-R1 with advanced reasoning capabilities.
### Features

- ✅ **Multi-model Support**: Qwen3, QwQ, Gemma3, DeepSeek-R1, GLM-4, GPT-OSS-120B/20B
- ✅ **Reasoning Chain Support**: built-in thinking/reasoning capabilities for supported models
- ✅ **Streaming Function Calls**: real-time function calling with streaming responses
- ✅ **Multiple Deployment Options**: local vLLM deployment and cloud API support
- ✅ **Performance Optimized**: efficient streaming and memory management
- ✅ **.NET 8 Ready**: full compatibility with the latest .NET platform
### Project Repository
GitHub: https://github.com/iwaitu/vllmchatclient
### Latest Updates

#### New GPT-OSS-120B Support
- `VllmGptOssChatClient` - Support for OpenAI's GPT-OSS-120B model with full reasoning capabilities
- Advanced reasoning chain processing with `ReasoningChatResponseUpdate`
- Compatible with OpenRouter and other GPT-OSS providers
- Enhanced debugging and performance optimizations
#### GLM-4 Support
- VllmGlmZ1ChatClient - Support for GLM-4 models with reasoning capabilities
- VllmGlm4ChatClient - Standard GLM-4 chat functionality
#### Enhanced Qwen 2507 Models
- VllmQwen2507ChatClient - For qwen3-235b-a22b-instruct-2507 (standard)
- VllmQwen2507ReasoningChatClient - For qwen3-235b-a22b-thinking-2507 (with reasoning)
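The README does not show the 2507 clients in use. As a hedged sketch, assuming their constructors follow the same `(endpoint, apiKey, model)` shape as the other cloud clients in this package (the endpoint, key, and exact model name below are placeholders):

```csharp
using Microsoft.Extensions.AI;

// Assumed constructor shape, mirroring the DeepSeek-R1 example later in this README.
// Endpoint, API key, and model name are placeholders.
IChatClient client = new VllmQwen2507ReasoningChatClient(
    "https://dashscope.aliyuncs.com/compatible-mode/v1/{1}",
    "your-api-key",
    "qwen3-235b-a22b-thinking-2507");
```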
### Supported Clients
| Client | Deployment | Model Support | Reasoning | Function Calls |
|---|---|---|---|---|
| VllmGptOssChatClient | OpenRouter/Cloud | GPT-OSS-120B/20B | ✅ Full | ✅ Stream |
| VllmQwen3ChatClient | Local vLLM | Qwen3-32B/235B | ✅ Toggle | ✅ Stream |
| VllmQwqChatClient | Local vLLM | QwQ-32B | ✅ Full | ✅ Stream |
| VllmGemmaChatClient | Local vLLM | Gemma3-27B | ❌ | ✅ Stream |
| VllmDeepseekR1ChatClient | Cloud API | DeepSeek-R1 | ✅ Full | ✅ Stream |
| VllmGlmZ1ChatClient | Local vLLM | GLM-4 | ✅ Full | ✅ Stream |
| VllmGlm4ChatClient | Local vLLM | GLM-4 | ❌ | ✅ Stream |
| VllmQwen2507ChatClient | Cloud API | Qwen3-235B-2507 | ❌ | ✅ Stream |
| VllmQwen2507ReasoningChatClient | Cloud API | Qwen3-235B-2507 | ✅ Full | ✅ Stream |
### Docker Deployment Examples
Qwen3/QwQ vLLM Deployment:
```shell
docker run -it --gpus all -p 8000:8000 \
  -v /models/Qwen3-32B-FP8:/models/Qwen3-32B-FP8 \
  --restart always \
  -e VLLM_USE_V1=1 \
  vllm/vllm-openai:v0.8.5 \
  --model /models/Qwen3-32B-FP8 \
  --enable-auto-tool-choice \
  --tool-call-parser llama3_json \
  --trust-remote-code \
  --max-model-len 131072 \
  --tensor-parallel-size 2 \
  --gpu_memory_utilization 0.8 \
  --served-model-name "qwen3"
```
Gemma3 vLLM Deployment:
```shell
docker run -it --gpus all -p 8000:8000 \
  -v /models/gemma-3-27b-it-FP8-Dynamic:/models/gemma-3-27b-it-FP8-Dynamic \
  -v /home/lc/work/gemma3.jinja:/home/lc/work/gemma3.jinja \
  -e TZ=Asia/Shanghai \
  -e VLLM_USE_V1=1 \
  --restart always \
  vllm/vllm-openai:v0.8.2 \
  --model /models/gemma-3-27b-it-FP8-Dynamic \
  --enable-auto-tool-choice \
  --tool-call-parser pythonic \
  --chat-template /home/lc/work/gemma3.jinja \
  --trust-remote-code \
  --max-model-len 128000 \
  --tensor-parallel-size 2 \
  --gpu_memory_utilization 0.8 \
  --served-model-name "gemma3"
```
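With either container running, the matching client can be pointed at the local endpoint for a quick smoke test. A minimal sketch, assuming the qwen3 deployment above (`GetResponseAsync` is the standard `Microsoft.Extensions.AI` convenience call; the `{0}/{1}` placeholders in the base URL are filled in by the client, as in the examples below):

```csharp
using Microsoft.Extensions.AI;

// Local deployment: no API token is needed, and "qwen3" must match --served-model-name.
IChatClient client = new VllmQwen3ChatClient("http://localhost:8000/{0}/{1}", null, "qwen3");

var response = await client.GetResponseAsync("Say hello in one word.");
Console.WriteLine(response.Text);
```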
### Usage Examples

#### GPT-OSS-120B with Reasoning (OpenRouter)
```csharp
using System.ComponentModel;
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.VllmChatClient.GptOss;

[Description("Gets weather information")]
static string GetWeather(string city) => $"Weather in {city}: Sunny, 25°C";

// Initialize the GPT-OSS client
IChatClient gptOssClient = new VllmGptOssChatClient(
    "https://openrouter.ai/api/v1",
    "your-api-token",
    "openai/gpt-oss-120b");

var messages = new List<ChatMessage>
{
    new ChatMessage(ChatRole.System, "You are a helpful assistant with reasoning capabilities."),
    new ChatMessage(ChatRole.User, "What's the weather like in Tokyo? Please think through this step by step.")
};

var chatOptions = new ChatOptions
{
    Temperature = 0.7f,
    ReasoningLevel = GptOssReasoningLevel.Medium, // Controls the depth of reasoning
    Tools = [AIFunctionFactory.Create(GetWeather)]
};

// Stream the response, separating reasoning from the final answer
string reasoning = string.Empty;
string answer = string.Empty;
await foreach (var update in gptOssClient.GetStreamingResponseAsync(messages, chatOptions))
{
    if (update is ReasoningChatResponseUpdate reasoningUpdate)
    {
        if (reasoningUpdate.Thinking)
        {
            // Capture the model's reasoning process
            reasoning += reasoningUpdate.Reasoning;
            Console.WriteLine($"Thinking: {reasoningUpdate.Reasoning}");
        }
        else
        {
            // Capture the final answer
            answer += reasoningUpdate.Text;
            Console.WriteLine($"Response: {reasoningUpdate.Text}");
        }
    }
}

Console.WriteLine($"\nFull Reasoning: {reasoning}");
Console.WriteLine($"Final Answer: {answer}");
```
#### Qwen3 with Reasoning Toggle
```csharp
using System.ComponentModel;
using Microsoft.Extensions.AI;

[Description("Gets the weather")]
static string GetWeather() => Random.Shared.NextDouble() > 0.1 ? "It's sunny" : "It's raining";

IChatClient vllmclient = new VllmQwen3ChatClient("http://localhost:8000/{0}/{1}", null, "qwen3");
IChatClient client = new ChatClientBuilder(vllmclient)
    .UseFunctionInvocation()
    .Build();

var messages = new List<ChatMessage>
{
    new ChatMessage(ChatRole.System, "You are a helpful assistant named Feifei."),
    new ChatMessage(ChatRole.User, "What's the weather like today?")
};

Qwen3ChatOptions chatOptions = new()
{
    Tools = [AIFunctionFactory.Create(GetWeather)],
    NoThinking = true // Toggle reasoning on/off
};

string res = string.Empty;
await foreach (var update in client.GetStreamingResponseAsync(messages, chatOptions))
{
    res += update.Text;
}
```
#### QwQ with Full Reasoning Support
```csharp
using System.ComponentModel;
using Microsoft.Extensions.AI;

[Description("Gets the weather")]
static string GetWeather() => Random.Shared.NextDouble() > 0.5 ? "It's sunny" : "It's raining";

IChatClient vllmclient = new VllmQwqChatClient("http://localhost:8000/{0}/{1}", null, "qwq");

var messages = new List<ChatMessage>
{
    new ChatMessage(ChatRole.System, "You are a helpful assistant named Feifei."),
    new ChatMessage(ChatRole.User, "What's the weather like today?")
};

ChatOptions chatOptions = new()
{
    Tools = [AIFunctionFactory.Create(GetWeather)]
};

// Stream the response, separating reasoning from the answer
async Task<(string answer, string reasoning)> StreamChatResponseAsync(
    List<ChatMessage> chatMessages, ChatOptions options)
{
    string answer = string.Empty;
    string reasoning = string.Empty;
    await foreach (var update in vllmclient.GetStreamingResponseAsync(chatMessages, options))
    {
        if (update is ReasoningChatResponseUpdate reasoningUpdate)
        {
            if (!reasoningUpdate.Thinking)
            {
                answer += reasoningUpdate.Text;
            }
            else
            {
                reasoning += reasoningUpdate.Text;
            }
        }
        else
        {
            answer += update.Text;
        }
    }
    return (answer, reasoning);
}

var (answer, reasoning) = await StreamChatResponseAsync(messages, chatOptions);
```
#### DeepSeek-R1 with Reasoning
```csharp
using Microsoft.Extensions.AI;

IChatClient client = new VllmDeepseekR1ChatClient(
    "https://dashscope.aliyuncs.com/compatible-mode/v1/{1}",
    "your-api-key",
    "deepseek-r1");

var messages = new List<ChatMessage>
{
    new ChatMessage(ChatRole.System, "You are a helpful assistant named Feifei."),
    new ChatMessage(ChatRole.User, "Who are you?")
};

string res = string.Empty;
string think = string.Empty;
await foreach (ReasoningChatResponseUpdate update in client.GetStreamingResponseAsync(messages))
{
    if (update.Thinking)
    {
        think += update.Text;
    }
    else
    {
        res += update.Text;
    }
}
```
### Advanced Features

#### Reasoning Chain Processing

All reasoning-capable clients emit `ReasoningChatResponseUpdate` items in their streaming responses:
```csharp
await foreach (var update in client.GetStreamingResponseAsync(messages, options))
{
    if (update is ReasoningChatResponseUpdate reasoningUpdate)
    {
        if (reasoningUpdate.Thinking)
        {
            // Process thinking/reasoning content
            Console.WriteLine($"Reasoning: {reasoningUpdate.Reasoning}");
        }
        else
        {
            // Process the final response
            Console.WriteLine($"Answer: {reasoningUpdate.Text}");
        }
    }
}
```
#### Function Calling with Streaming

All clients support real-time function calling:
```csharp
[Description("Search for location information")]
static string Search([Description("Search query")] string query)
{
    return "Location found: Beijing, China";
}

ChatOptions options = new()
{
    Tools = [AIFunctionFactory.Create(Search)],
    Temperature = 0.7f
};

await foreach (var update in client.GetStreamingResponseAsync(messages, options))
{
    // Handle function calls and responses in real time
    foreach (var content in update.Contents)
    {
        if (content is FunctionCallContent functionCall)
        {
            Console.WriteLine($"Calling: {functionCall.Name}");
        }
    }
}
```
### Performance & Optimizations
- Stream Processing: Efficient real-time response handling
- Memory Management: Optimized for long conversations
- Error Handling: Robust error recovery and debugging support
- JSON Parsing: High-performance serialization with System.Text.Json
- Connection Pooling: Shared HttpClient for optimal resource usage
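The connection-pooling point has a practical consequence on the consuming side: construct a client once and reuse it across conversations rather than newing one up per request. A minimal sketch using only the types shown in the examples above (endpoint and model name are placeholders):

```csharp
using Microsoft.Extensions.AI;

// Build the pipeline once; the client reuses its underlying HttpClient,
// so constructing a new client per request would defeat connection pooling.
IChatClient client = new ChatClientBuilder(
        new VllmQwen3ChatClient("http://localhost:8000/{0}/{1}", null, "qwen3"))
    .UseFunctionInvocation()
    .Build();

// Reuse this single instance for every conversation in the application.
```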
### Requirements
- .NET 8.0 or higher
- Microsoft.Extensions.AI framework
- Newtonsoft.Json for JSON processing
- System.Text.Json for high-performance scenarios
### Contributing
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
### License
This project is licensed under the MIT License. See the LICENSE file for details.
| Product | Compatible and additional computed target framework versions |
|---|---|
| .NET | net8.0 is compatible. net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, and net10.0-windows were computed. |
Dependencies (net8.0):

- Microsoft.Extensions.AI (>= 9.6.0)
- Newtonsoft.Json (>= 13.0.3)
NuGet packages
This package is not used by any NuGet packages.
GitHub repositories
This package is not used by any popular GitHub repositories.
Version | Downloads | Last Updated |
---|---|---|
1.5.0 | 211 | 8/7/2025 |
1.4.8 | 199 | 8/7/2025 |
1.4.6 | 209 | 8/7/2025 |
1.4.5 | 120 | 7/31/2025 |
1.4.2 | 194 | 6/23/2025 |
1.4.0 | 145 | 6/23/2025 |
1.3.8 | 145 | 6/21/2025 |
1.3.6 | 79 | 6/21/2025 |
1.3.5 | 96 | 6/20/2025 |
1.3.2 | 72 | 6/20/2025 |
1.3.1 | 74 | 6/20/2025 |
1.3.0 | 350 | 6/3/2025 |
1.2.8 | 133 | 6/3/2025 |
1.2.7 | 89 | 5/31/2025 |
1.2.6 | 64 | 5/31/2025 |
1.2.5 | 58 | 5/31/2025 |
1.2.3 | 67 | 5/30/2025 |
1.2.2 | 123 | 5/24/2025 |
1.2.0 | 300 | 5/13/2025 |
1.1.8 | 233 | 5/13/2025 |
1.1.6 | 180 | 5/12/2025 |
1.1.5 | 109 | 5/2/2025 |
1.1.4 | 125 | 4/29/2025 |
1.1.3 | 113 | 4/29/2025 |
1.1.2 | 118 | 4/29/2025 |
1.1.1 | 119 | 4/29/2025 |
1.1.0 | 152 | 4/23/2025 |
1.0.8 | 150 | 4/23/2025 |
1.0.6 | 136 | 4/18/2025 |