I recently integrated AI-powered property descriptions for a Spanish real‑estate portal. Using Microsoft.Extensions.AI and the OpenAI platform, we built a service that rewrites, cleans, and translates property descriptions in a scalable, maintainable way. Here's how it works — and how you can do something similar.
Models & Clients
Microsoft.Extensions.AI offers a high-level IChatClient abstraction over the underlying OpenAI SDKs. You first specify a model, for example "gpt-4.1-mini" for rewriting, or "gpt-4.1-nano" for translation. Then you wrap it in:
IChatClient client = new OpenAI.Chat.ChatClient(model, aiOptions.Value.OpenAiKey)
.AsIChatClient();
This gives you a simple, DI‑friendly way to make chat calls.
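For illustration, here's a hedged sketch of wiring this into the DI container; the AiOptions options type (with its OpenAiKey property) mirrors the snippet above but is otherwise an assumed name:
using Microsoft.Extensions.AI;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Options;

// Register IChatClient as a singleton built from configured options.
// AiOptions/OpenAiKey are assumptions; substitute your own options type.
services.AddSingleton<IChatClient>(sp =>
{
    var aiOptions = sp.GetRequiredService<IOptions<AiOptions>>().Value;
    return new OpenAI.Chat.ChatClient("gpt-4.1-mini", aiOptions.OpenAiKey)
        .AsIChatClient();
});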
Chat Messages & Roles
When calling GetResponseAsync, you supply a List<ChatMessage>, typically with two entries:
- ChatRole.System: sets context and instructions ("You are a professional newspaper article writer...")
- ChatRole.User: the actual user content or text to process
var messages = new List<ChatMessage> {
new ChatMessage(ChatRole.System, prompt),
new ChatMessage(ChatRole.User, inputText)
};
This structure cleanly separates instructions from content.
Temperature & Top‑P
In ChatOptions, you can tweak two key parameters that control the randomness and diversity of the model's output:
- Temperature (0.0 – 1.0): Controls the randomness of the response.
- Low values (e.g. 0.1 – 0.3) make the output more focused and deterministic—ideal for rewriting or summarizing.
- High values (e.g. 0.7 – 0.9) produce more varied, creative output — better suited for brainstorming or open-ended tasks.
- At 0.0, the model greedily picks the most likely next token at each step.
- TopP (0.0 – 1.0): Enables nucleus sampling.
- The model samples only from the smallest set of tokens whose cumulative probability reaches TopP.
- Lower TopP values reduce the possibility of rare or unexpected outputs.
- Useful in conjunction with Temperature — you usually tune one and fix the other.
Together, these control how "creative" or "conservative" the model is in its responses.
In our example we use conservative values:
new ChatOptions {
Temperature = 0.2f,
TopP = 0.2f,
MaxOutputTokens = 2000
}
Low values keep descriptions factual and consistent.
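For contrast, a more creative configuration, per the ranges above, might raise Temperature and leave TopP wide open (values here are illustrative, not from our production setup):
new ChatOptions {
    Temperature = 0.8f,   // more varied, creative output
    TopP = 1.0f,          // tune Temperature, fix TopP
    MaxOutputTokens = 2000
}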
Token Usage
- MaxOutputTokens caps generated tokens (a token is roughly ¾ of an English word, so 2,000 tokens is about 1,500 words).
- Input tokens count toward your bill as well; overly long prompts inflate cost.
- Monitor usage via the response's usage metadata or your OpenAI dashboard to optimize (see the sketch below).
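As a rough sketch, you can read token counts straight off the response; the Usage property and count fields below reflect the UsageDetails type in Microsoft.Extensions.AI, but verify them against your package version:
var response = await client.GetResponseAsync(messages, options);

// Usage is reported by the provider and may be null.
if (response.Usage is { } usage)
{
    Console.WriteLine($"Input: {usage.InputTokenCount}, " +
                      $"Output: {usage.OutputTokenCount}, " +
                      $"Total: {usage.TotalTokenCount}");
}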
Minimal Implementation
using Microsoft.Extensions.AI;

// Wrap the OpenAI SDK client in the provider-agnostic IChatClient abstraction.
IChatClient client = new OpenAI.Chat.ChatClient("gpt-4.1-mini", aiKey)
    .AsIChatClient();

// The system message carries the instructions; the user message carries the raw input.
var response = await client.GetResponseAsync(new[] {
    new ChatMessage(ChatRole.System,
        @"You are a professional newspaper writer.
          Clean all HTML and special characters.
          Apply the correct formal tone..."),
    new ChatMessage(ChatRole.User, rawText)
}, new ChatOptions {
    Temperature = 0.2f,      // conservative: factual, consistent output
    TopP = 0.2f,
    MaxOutputTokens = 2000   // caps output length and cost
});

string cleanedText = response.Text;
Why It Works
- Models: choose based on task complexity
- Clients: simple, DI‑friendly usage
- Roles: separate instructions from content
- Tuning: temperature/TopP control creativity
- Token limits: keep control on output and cost
Prompt Management & Testing
Prompts are currently embedded in C# strings. That's fine for small apps, but for larger systems:
- Store prompts in files or configuration (see the sketch below)
- Use templating to inject variables
- Even better: prompt versioning or prompt A/B testing
This makes it easier to iterate, measure performance, and avoid regressions.
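As a minimal sketch of the file-based approach, assuming prompt templates stored as text files with {placeholder} tokens (the LoadPrompt helper and file layout are hypothetical):
using System.Collections.Generic;
using System.IO;

// Load a prompt template from disk and substitute {key} placeholders.
static string LoadPrompt(string name, IReadOnlyDictionary<string, string> variables)
{
    // e.g. Prompts/rewrite-description.txt, versioned alongside the code
    string template = File.ReadAllText(Path.Combine("Prompts", $"{name}.txt"));
    foreach (var (key, value) in variables)
        template = template.Replace("{" + key + "}", value);
    return template;
}

var prompt = LoadPrompt("rewrite-description",
    new Dictionary<string, string> { ["tone"] = "formal", ["language"] = "Spanish" });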
Model Choice Strategy
OpenAI offers a diverse lineup of models with different strengths and tradeoffs in capability, cost, and latency. Here's a breakdown of the most relevant options for backend integrations:
GPT‑4 Family
- gpt-4 Released in 2023. Strong reasoning and creative capabilities; high cost and latency.
- gpt-4-turbo A cheaper, faster variant of GPT-4 with a 128k context window. Ideal for production use where GPT-4 quality is needed at lower cost.
- gpt-4o ("Omni") Released in May 2024. Multimodal (text, vision, audio) with excellent performance and lower latency/cost than previous GPT-4 variants.
- gpt-4o-mini Launched July 2024. A lightweight version of GPT-4o, more capable than gpt-3.5-turbo while being cheaper.
- gpt-4.1 Introduced in 2025. Improved reasoning and text generation versus earlier GPT-4 versions.
- gpt-4.1-mini / gpt-4.1-nano Lightweight versions for constrained use cases—offer faster responses and lower cost.
"o" Reasoning Models
- o1 / o1-mini Introduced late 2024. Designed for reasoning-heavy tasks, competitive with GPT-4 for math/scientific reasoning.
- o3 / o3-mini Successors to o1. Further improved reasoning with better latency/performance tradeoffs.
- o4-mini Released in April 2025. Optimized for compact, low-latency deployments with chain-of-thought capabilities.
GPT‑3.5 Family
- gpt-3.5-turbo Cheap and fast. Lower reasoning and context handling vs GPT-4. Still useful for many general backend tasks.
Choosing the Right Model
| Use Case | Recommended Models |
|---|---|
| Critical accuracy + context | gpt-4o, gpt-4.1, o3 |
| Balanced cost/performance | gpt-4-turbo, gpt-4.1-mini, o1-mini, o4-mini |
| Fast, budget-friendly | gpt-3.5-turbo, o3-mini, gpt-4o-mini |
| Constrained or deterministic tasks | gpt-4.1-nano, gpt-4.1-mini, o4-mini |
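To make the table concrete, here's a hedged sketch of picking models per task in code; the TaskKind enum and the default choices are assumptions drawn from the table above:
using Microsoft.Extensions.AI;

// Hypothetical task categories mirroring the table.
enum TaskKind { CriticalAccuracy, Balanced, Budget, Deterministic }

static string PickModel(TaskKind task) => task switch
{
    TaskKind.CriticalAccuracy => "gpt-4.1",
    TaskKind.Balanced         => "gpt-4.1-mini",
    TaskKind.Budget           => "gpt-4o-mini",
    TaskKind.Deterministic    => "gpt-4.1-nano",
    _                         => "gpt-4.1-mini"
};

// e.g. rewriting gets a balanced model, translation a nano model
IChatClient rewriteClient = new OpenAI.Chat.ChatClient(PickModel(TaskKind.Balanced), aiKey)
    .AsIChatClient();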
See OpenAI's model comparison for a deeper dive.