Connecting .NET to OpenAI with Microsoft.Extensions.AI

Wednesday, July 2, 2025

I recently integrated AI-powered property descriptions for a Spanish real‑estate portal. Using Microsoft.Extensions.AI and the OpenAI platform, we built a service that rewrites, cleans, and translates property descriptions in a scalable, maintainable way. Here's how it works — and how you can do something similar.

Models & Clients

Microsoft.Extensions.AI offers a high‑level IChatClient abstraction over provider SDKs such as OpenAI's. You first pick a model (for example "gpt-4.1-mini" for rewriting, or "gpt-4.1-nano" for translation), then wrap the SDK's chat client:

IChatClient client = new OpenAI.Chat.ChatClient(model, aiOptions.Value.OpenAiKey)
    .AsIChatClient();

This gives you a simple, DI‑friendly way to make chat calls.
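
In a typical ASP.NET Core app you register the client once at startup and inject IChatClient wherever it's needed. A minimal sketch, assuming builder is the WebApplicationBuilder and AiOptions is an options class bound from configuration (both names are illustrative):

using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Options;

// Register once; consumers depend only on the IChatClient abstraction.
builder.Services.AddSingleton<IChatClient>(sp =>
{
    var aiOptions = sp.GetRequiredService<IOptions<AiOptions>>();
    return new OpenAI.Chat.ChatClient("gpt-4.1-mini", aiOptions.Value.OpenAiKey)
        .AsIChatClient();
});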

Chat Messages & Roles

When calling GetResponseAsync, you supply a List<ChatMessage> — typically with two entries:

  • ChatRole.System: sets context and instructions ("You are a professional newspaper article writer...")
  • ChatRole.User: the actual user content or text to process

var messages = new List<ChatMessage> {
    new ChatMessage(ChatRole.System, prompt),
    new ChatMessage(ChatRole.User, inputText)
};

This structure cleanly separates instructions from content.

Temperature & Top‑P

In ChatOptions, you can tweak two key parameters that control the randomness and diversity of the model's output:

  • Temperature (0.0 – 2.0 in the OpenAI API, though values above 1.0 are rarely useful): Controls the randomness of the response.
    • Low values (e.g. 0.1 – 0.3) make the output more focused and deterministic—ideal for rewriting or summarizing.
    • High values (e.g. 0.7 – 0.9) produce more varied, creative output — better suited for brainstorming or open-ended tasks.
    • At 0.0, the model almost always picks the most likely next token at each step.
  • TopP (0.0 – 1.0): Enables nucleus sampling.
    • The model samples only from the smallest set of tokens whose cumulative probability reaches TopP.
    • Lower TopP values reduce the possibility of rare or unexpected outputs.
    • Useful in conjunction with Temperature — you usually tune one and fix the other.

Together, these control how "creative" or "conservative" the model is in its responses.

In our example we use conservative values:

new ChatOptions {
  Temperature = 0.2f,
  TopP = 0.2f,
  MaxOutputTokens = 2000
}

Low values keep descriptions factual and consistent.

Token Usage

  • MaxOutputTokens caps the number of generated tokens; a token is roughly ¾ of an English word, so 2000 tokens is about 1500 words.
  • Input tokens count toward your bill as well; overly long prompts inflate cost.
  • Monitor usage via model metrics or your OpenAI dashboard to optimize.
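
The response object reports actual usage, which makes per-call logging straightforward. A small sketch, reusing the messages and options from above and assuming an ILogger is available:

var response = await client.GetResponseAsync(messages, options);

// Usage is populated by the OpenAI provider; the individual counts
// are nullable and may be absent for some backends.
if (response.Usage is { } usage)
{
    logger.LogInformation(
        "Tokens used: input={Input}, output={Output}, total={Total}",
        usage.InputTokenCount, usage.OutputTokenCount, usage.TotalTokenCount);
}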

Minimal Implementation

IChatClient client = new OpenAI.Chat.ChatClient("gpt-4.1-mini", aiKey)
    .AsIChatClient();
 
var response = await client.GetResponseAsync(new[]{
    new ChatMessage(ChatRole.System, 
        @"You are a professional newspaper writer. 
        Clean all HTML and special characters. 
        Apply the correct formal tone..."),
    new ChatMessage(ChatRole.User, rawText)
}, new ChatOptions {
    Temperature = 0.2f,
    TopP = 0.2f,
    MaxOutputTokens = 2000
});
 
string cleanedText = response.Text;

Why It Works

  • Models: choose based on task complexity
  • Clients: simple, DI‑friendly usage
  • Roles: separate instructions from content
  • Tuning: temperature/TopP control creativity
  • Token limits: keep control on output and cost

Prompt Management & Testing

Prompts are currently embedded in C# strings. That's fine for small apps, but for larger systems:

  • Store prompts in files or configuration
  • Use templating to inject variables
  • Even better: add prompt versioning or A/B testing

This makes it easier to iterate, measure performance, and avoid regressions.
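
As a sketch of the first two points, a prompt template could live in appsettings.json and have its variables injected at runtime. The configuration key and placeholder syntax here are just one possible convention:

// appsettings.json (illustrative):
// "Prompts": { "Rewrite": "You are a professional newspaper writer. Write in {language}. ..." }

string template = configuration["Prompts:Rewrite"]
    ?? throw new InvalidOperationException("Prompt 'Rewrite' is not configured.");

// Naive placeholder substitution; a templating library or versioned
// prompt files would scale better.
string prompt = template.Replace("{language}", "Spanish");

var messages = new List<ChatMessage> {
    new ChatMessage(ChatRole.System, prompt),
    new ChatMessage(ChatRole.User, rawText)
};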

Model Choice Strategy

OpenAI offers a diverse lineup of models with different strengths and tradeoffs in capability, cost, and latency. Here's a breakdown of the most relevant options for backend integrations:

GPT‑4 Family

  • gpt-4: Multimodal model released in 2023. Strong reasoning and creative capabilities. High cost and latency.
  • gpt-4-turbo: A cheaper, faster variant of GPT-4 with a 128k context window. Ideal for production use where GPT-4 quality is needed at lower cost.
  • gpt-4o ("Omni"): Released in May 2024. Multimodal (text, vision, audio) with excellent performance and lower latency/cost than previous GPT-4 variants.
  • gpt-4o-mini: Launched July 2024. A lightweight version of GPT-4o, more capable than gpt-3.5-turbo while being cheaper.
  • gpt-4.1: Introduced in 2025. Improved reasoning and text generation versus earlier GPT-4 versions.
  • gpt-4.1-mini / gpt-4.1-nano: Lightweight versions for constrained use cases; they offer faster responses at lower cost.

"o" Reasoning Models

  • o1 / o1-mini: Introduced late 2024. Designed for reasoning-heavy tasks, competitive with GPT-4 for math/scientific reasoning.
  • o3 / o3-mini: Successors to o1. Further improved reasoning with better latency/performance tradeoffs.
  • o4-mini: Released in April 2025. Optimized for compact, low-latency deployments with chain-of-thought capabilities.

GPT‑3.5 Family

  • gpt-3.5-turbo: Cheap and fast. Lower reasoning and context handling vs GPT-4. Still useful for many general backend tasks.

Choosing the Right Model

Use case and recommended models:

  • Critical accuracy + context: gpt-4o, gpt-4.1, o3
  • Balanced cost/performance: gpt-4-turbo, gpt-4.1-mini, o1-mini, o4-mini
  • Fast, budget-friendly: gpt-3.5-turbo, o3-mini, gpt-4o-mini
  • Constrained or deterministic tasks: gpt-4.1-nano, gpt-4.1-mini, o4-mini

See OpenAI's model comparison for a deeper dive.
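
In our portal the model is chosen per task rather than globally; one way to express that is a small factory over the same IChatClient abstraction. A sketch, with hypothetical task names and a mapping that mirrors the recommendations above:

using Microsoft.Extensions.AI;

// Pick the cheapest model that is good enough for each task.
static IChatClient CreateClientFor(string task, string apiKey)
{
    string model = task switch
    {
        "rewrite"   => "gpt-4.1-mini",  // quality-sensitive rewriting
        "translate" => "gpt-4.1-nano",  // simple, high-volume translation
        _           => "gpt-4o-mini"    // sensible default
    };

    return new OpenAI.Chat.ChatClient(model, apiKey).AsIChatClient();
}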