Solution Atlas
Specialised · User story · Consultative playbook

We need a grounded LLM that answers from our product docs — not the open internet

A SaaS company's support team wants a grounded assistant that answers from product documentation, runbooks, and ticket history. The CISO insists on responsible-AI guardrails (content filtering, evaluation harnesses, and lineage) before anything goes live to customers.

Trigger: Support cost rising; product team has a clear use case.

Good outcome: Foundry deployment with AI Search RAG, evaluation pipeline, content filtering, and responsible-AI gates.
Diagnostic discovery

Signals this story fits

Observable cues that confirm the conversation belongs here.

  • Customer-support volume rising; product complexity growing
  • Product team has clear documentation surface (Confluence, SharePoint, internal wiki)
  • CISO insists on responsible-AI guardrails before any customer-facing launch
  • Existing Azure footprint and Azure OpenAI access already in place
  • Internal LLM experiments running without lineage or governance

Questions to ask

Open-ended, SPIN-style — each one has a reason it matters.

  1. What are your support team's top question categories, and which are documented?

    Why: Documented topics are the RAG sweet spot. Undocumented ones are a different kind of project (training-data work, not retrieval).

  2. Where do product docs live — SharePoint, Confluence, Notion, custom?

    Why: Determines the source-content indexing strategy and AI Search configuration (see the indexing sketch after this list).

  3. What's your RAG pipeline today, if any — notebook prototypes, ad-hoc tooling, nothing?

    Why: Surfaces maturity. Most customers are at the "notebook prototype" stage.

  4. How do you evaluate model output quality?

    Why: Without evaluation, scale is impossible. Surfaces the responsible-AI gap.

  5. What content filtering is configured on Azure OpenAI today?

    Why: Default filtering is rarely enough for customer-facing scenarios. Tests CISO readiness.

  6. Has Legal reviewed the proposed launch surface (internal beta vs customer-facing)?

    Why: Customer-facing AI carries materially different obligations.
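
Once the documentation surface is known (question 2), the indexing conversation can get concrete quickly. A minimal sketch of pushing pre-chunked doc content into Azure AI Search with the azure-search-documents Python SDK; the endpoint, index name, and field schema are illustrative assumptions, not prescriptions:

```python
# Sketch only: assumes an index named "product-docs" with id/title/content
# fields already exists. Endpoint and key are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="product-docs",                     # assumed index name
    credential=AzureKeyCredential("<admin-key>"),
)

# Pre-chunked documentation; in practice this comes from the
# SharePoint/Confluence/Notion extraction pipeline discussed above.
chunks = [
    {"id": "kb-001", "title": "Password reset", "content": "To reset a password ..."},
    {"id": "rb-042", "title": "Service restart runbook", "content": "If the service hangs ..."},
]

results = search_client.upload_documents(documents=chunks)
print(f"indexed: {sum(r.succeeded for r in results)}/{len(results)}")
```

The extraction and chunking step upstream of this call is where most of the real engagement effort sits; the upload itself is trivial once chunks exist.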

Baseline → target architecture

TOGAF-style gap framing — what we typically see today, and what the proposed end state looks like. The gap between them is the engagement.

Baseline architecture

Support team handling rising volume manually. Product documentation fragmented. Ad-hoc Azure OpenAI experiments running on default settings. No evaluation harness. No content filtering beyond defaults. No lineage of what content the model sees.

Typical concerns

  • Support cost rising faster than headcount budget
  • Documentation surface fragmented
  • AI experiments without responsible-AI guardrails
  • No evaluation framework for output quality
  • Customer-facing risk if launched without governance

Capability gaps

  • RAG pipeline with grounded retrieval
  • Evaluation harness with quality gates
  • Content filtering tuned to scenario
  • Identity-bound endpoint access
  • Lineage of grounding content

Target architecture

Azure AI Foundry hub with AI Search indexing product documentation. RAG pipeline grounded on indexed content. Evaluation harness with quality gates (relevance, faithfulness, hallucination rate). Content filtering tuned for customer-facing tier. Entra ID-bound access for internal users and (later) customer-facing endpoints. Optional Databricks for data engineering on ticket history.
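
To make the target shape concrete, a minimal sketch of the grounded query path, assuming the "product-docs" index above and an Azure OpenAI chat deployment (all names are placeholders; a real Foundry deployment wraps evaluation and filtering around this core):

```python
# Minimal grounded-retrieval sketch. Assumes the "product-docs" index and a
# chat deployment named "gpt-4o"; both names are illustrative.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

search = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="product-docs",
    credential=AzureKeyCredential("<query-key>"),
)
llm = AzureOpenAI(
    azure_endpoint="https://<aoai-resource>.openai.azure.com",
    api_key="<key>",
    api_version="2024-06-01",
)

def grounded_answer(question: str) -> str:
    # Retrieve the top matching doc chunks, then answer only from them.
    hits = search.search(question, top=3)
    context = "\n\n".join(hit["content"] for hit in hits)
    response = llm.chat.completions.create(
        model="gpt-4o",  # deployment name, an assumption
        messages=[
            {"role": "system", "content":
                "Answer only from the context below. If the context does not "
                "contain the answer, say so.\n\nContext:\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(grounded_answer("How do I rotate an API key?"))
```

The "answer only from the context" system instruction is what makes this grounded rather than open-internet; the evaluation harness then verifies that the instruction is actually being followed.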

Key capabilities

  • AI Search-backed RAG pipeline
  • Evaluation harness and quality gates
  • Content filtering tuned to scenario
  • Identity-bound endpoint access
  • Responsible AI build-time discipline

Enabling SKUs

Resolved in the ‘Recommended cards’ section below.

Architecture decisions

Each decision is offered as explicit options with trade-offs — Hohpe's “selling options” principle. A safe default is noted where one exists.

  1. Retrieval — Azure AI Search vs custom vector store

    Azure AI Search

    When it fits: Azure-native estate; native Foundry integration; mixed dense + keyword retrieval needed (see the hybrid-retrieval sketch after this list).

    Trade-offs: AI Search consumption cost at scale.

    Custom vector store (e.g. pgvector, Pinecone)

    When it fits: Existing vector-store investment; very specific retrieval patterns.

    Trade-offs: More integration work; less native Foundry tooling.

    Default recommendation: Azure AI Search for the first RAG workload; revisit only if scale demands it.

  2. Model substrate — Foundry-hosted vs Mosaic AI on Databricks

    Foundry-hosted

    When it fits: Azure-native; need broad model catalogue; pro-code AI engineering.

    Trade-offs: Two governance planes if Databricks also in use.

    Mosaic AI on Databricks

    When it fits: Data and ML workloads already on Databricks; lakehouse-native lineage needed.

    Trade-offs: Smaller GenAI ecosystem than Foundry.

    Default recommendation: Foundry where the data lives in Azure; Mosaic AI where the data is already in Databricks.

  3. Content filtering strictness — default vs custom

    Default

    When it fits: Internal-only beta; risk surface low.

    Trade-offs: Generic filtering may block legitimate domain queries.

    Custom (scenario-tuned)

    When it fits: Customer-facing launch; specific compliance requirements.

    Trade-offs: More tuning effort; needs iterative refinement.

    Default recommendation: Default for the internal beta; transition to custom before the customer-facing launch.
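
For Decision 1, the "mixed dense + keyword retrieval" point is worth making concrete. A sketch of a hybrid query against Azure AI Search, reusing the `search` and `llm` clients from the grounded-retrieval sketch above and assuming the index carries a vector field (here called content_vector) and an embedding deployment; both names are assumptions:

```python
# Hybrid retrieval sketch: one query runs a keyword leg and a dense-vector
# leg together, and AI Search fuses the two rankings (reciprocal rank
# fusion) before returning results.
from azure.search.documents.models import VectorizedQuery

def hybrid_search(question: str, k: int = 5):
    # Embed the question with Azure OpenAI (reusing the `llm` client above).
    embedding = llm.embeddings.create(
        model="text-embedding-3-small",   # deployment name, an assumption
        input=question,
    ).data[0].embedding

    # Keyword leg (search_text) + dense leg (vector_queries) in one call.
    return search.search(
        search_text=question,
        vector_queries=[VectorizedQuery(
            vector=embedding,
            k_nearest_neighbors=k,
            fields="content_vector",      # assumed vector field name
        )],
        top=k,
    )

for hit in hybrid_search("How do I rotate an API key?"):
    print(hit["title"])
```

Replicating this fusion on a custom vector store is exactly the "more integration work" trade-off the decision names.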

Low-risk trial — proof of value

45-day grounded-LLM prototype for support agents

Duration: 45 days (about six weeks)

Foundry hub provisioned. Product documentation indexed via AI Search. RAG endpoint live with content filtering. Evaluation harness covering relevance, faithfulness, hallucination rate. Internal beta to 20 support agents. Telemetry captured for grounding hit-rate, response time, escalation rate.
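
Foundry's evaluation SDK ships ready-made quality evaluators; as a vendor-neutral illustration of the gate mechanics only, here is a hand-rolled sketch using an LLM judge for faithfulness. Thresholds mirror the success criteria below; the judge prompt and deployment name are assumptions, and the `llm` client is reused from the grounded-retrieval sketch above:

```python
# Hand-rolled evaluation-gate sketch (Foundry's evaluation tooling offers
# richer, ready-made evaluators; this only illustrates the mechanics).

def is_faithful(question: str, answer: str, context: str) -> bool:
    """LLM-judge check: is the answer fully supported by the context?"""
    verdict = llm.chat.completions.create(
        model="gpt-4o",  # deployment name, an assumption
        messages=[{"role": "user", "content": (
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer: {answer}\n\n"
            "Is the answer fully supported by the context? Reply YES or NO."
        )}],
    ).choices[0].message.content.strip().upper()
    return verdict.startswith("YES")

def quality_gate(samples) -> bool:
    """samples: list of dicts with question/answer/context/relevant keys."""
    hallucinated = sum(
        not is_faithful(s["question"], s["answer"], s["context"])
        for s in samples
    )
    hallucination_rate = hallucinated / len(samples)
    relevance = sum(s["relevant"] for s in samples) / len(samples)
    print(f"relevance={relevance:.0%}  hallucination={hallucination_rate:.0%}")
    # Gates mirror the trial success criteria: >80% relevance, <5% hallucination.
    return relevance > 0.80 and hallucination_rate < 0.05
```

Running this daily over the beta's question log is what turns the success criteria from aspirations into a go/no-go gate.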

Success criteria

  • Response relevance score above 80% on evaluation harness
  • Hallucination rate below 5% on evaluation harness
  • Support-agent NPS on the tool above baseline
  • Zero content-filter false positives causing escalation gaps (see the telemetry sketch below)
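
The content-filter criterion is measurable from the client side: Azure OpenAI reports filter triggers as a 400 error with code content_filter, which the openai Python SDK raises as BadRequestError. A sketch of logging those events so suspected false positives surface in trial telemetry, reusing `grounded_answer` from the sketch above:

```python
# Sketch: counting content-filter triggers for later false-positive review.
import openai

filter_events = []

def answer_with_filter_telemetry(question: str):
    try:
        return grounded_answer(question)
    except openai.BadRequestError as err:
        # Azure OpenAI surfaces filtered prompts/completions as error code
        # "content_filter" on a 400 response.
        if getattr(err, "code", None) == "content_filter":
            filter_events.append(question)  # review later for false positives
            return None
        raise
```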

Investment: Foundry token consumption + AI Search capacity. Estimated ~€2–4k/month for the trial scope. No customer-facing launch decisions made during the trial.

Proof metrics

  • Relevance score, faithfulness, and hallucination rate tracked daily
  • Internal-agent NPS on the tool above baseline
  • Reduction in time-to-answer for indexed topics
  • Evaluation harness covering the top ten question categories

Recommended cards

The SKUs and capabilities most likely to be part of the solution, with the editorial rationale for each in the context of this story. Add the ones that fit your situation.
