Solution Atlas
Specialised · User story · Consultative playbook

We need a grounded LLM that answers from our product docs — not the open internet

A SaaS company's support team wants a grounded assistant that answers from product documentation, runbooks, and ticket history. The CISO insists on responsible-AI guardrails (content filtering, evaluation harnesses, and lineage) before anything goes live to customers.

Trigger: Support cost rising; product team has a clear use case.

Good outcome: Foundry deployment with AI Search RAG, evaluation pipeline, content filtering, and responsible-AI gates.
Diagnostic discovery

Signals this story fits

Observable cues that confirm the conversation belongs here.

  • Customer-support volume rising; product complexity growing
  • Product team has clear documentation surface (Confluence, SharePoint, internal wiki)
  • CISO insists on responsible-AI guardrails before any customer-facing launch
  • Existing Azure footprint and Azure OpenAI access already in place
  • Internal LLM experiments running without lineage or governance

Questions to ask

Open-ended, SPIN-style — each one has a reason it matters.

  1. What are your support team's top question categories, and which are documented?

    Why: Documented topics are the RAG sweet spot. Undocumented ones are a different kind of project (training-data work, not retrieval).

  2. Where do product docs live — SharePoint, Confluence, Notion, custom?

    Why: Determines the source-content indexing strategy and AI Search configuration (see the indexing sketch after this list).

  3. What's your RAG pipeline today, if any — notebook prototypes, ad-hoc tooling, nothing?

    Why: Surfaces maturity. Most customers are at the "notebook prototype" stage.

  4. How do you evaluate model output quality?

    Why: Without evaluation, scale is impossible. Surfaces the responsible-AI gap.

  5. What content filtering is configured on Azure OpenAI today?

    Why: Default filtering is rarely enough for customer-facing scenarios. Tests CISO readiness.

  6. Has Legal reviewed the proposed launch surface (internal beta vs customer-facing)?

    Why: Customer-facing AI carries materially different obligations.
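
Once the documentation surface is known (question 2), the indexing conversation can get concrete quickly. A minimal sketch of pushing pre-chunked doc content into Azure AI Search with the azure-search-documents Python SDK; the endpoint, index name, and field schema are illustrative assumptions, not prescriptions:

```python
# Sketch only: assumes an index named "product-docs" with id/title/content
# fields already exists. Endpoint and key are placeholders.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

search_client = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="product-docs",                     # assumed index name
    credential=AzureKeyCredential("<admin-key>"),
)

# Pre-chunked documentation; in practice this comes from the
# SharePoint/Confluence/Notion extraction pipeline discussed above.
chunks = [
    {"id": "kb-001", "title": "Password reset", "content": "To reset a password ..."},
    {"id": "rb-042", "title": "Service restart runbook", "content": "If the service hangs ..."},
]

results = search_client.upload_documents(documents=chunks)
print(f"indexed: {sum(r.succeeded for r in results)}/{len(results)}")
```

The extraction and chunking step upstream of this call is where most of the real engagement effort sits; the upload itself is trivial once chunks exist.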

Baseline → target architecture

TOGAF-style gap framing — what we typically see today, and what the proposed end state looks like. The gap between them is the engagement.

Baseline architecture

Support team handling rising volume manually. Product documentation fragmented. Ad-hoc Azure OpenAI experiments running on default settings. No evaluation harness. No content filtering beyond defaults. No lineage of what content the model sees.

Typical concerns

  • Support cost rising faster than headcount budget
  • Documentation surface fragmented
  • AI experiments without responsible-AI guardrails
  • No evaluation framework for output quality
  • Customer-facing risk if launched without governance

Capability gaps

  • RAG pipeline with grounded retrieval
  • Evaluation harness with quality gates
  • Content filtering tuned to scenario
  • Identity-bound endpoint access
  • Lineage of grounding content

Target architecture

Azure AI Foundry hub with AI Search indexing product documentation. RAG pipeline grounded on indexed content. Evaluation harness with quality gates (relevance, faithfulness, hallucination rate). Content filtering tuned for customer-facing tier. Entra ID-bound access for internal users and (later) customer-facing endpoints. Optional Databricks for data engineering on ticket history.
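
To make the target shape concrete, a minimal sketch of the grounded query path, assuming the "product-docs" index above and an Azure OpenAI chat deployment (all names are placeholders; a real Foundry deployment wraps evaluation and filtering around this core):

```python
# Minimal grounded-retrieval sketch. Assumes the "product-docs" index and a
# chat deployment named "gpt-4o"; both names are illustrative.
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

search = SearchClient(
    endpoint="https://<search-service>.search.windows.net",
    index_name="product-docs",
    credential=AzureKeyCredential("<query-key>"),
)
llm = AzureOpenAI(
    azure_endpoint="https://<aoai-resource>.openai.azure.com",
    api_key="<key>",
    api_version="2024-06-01",
)

def grounded_answer(question: str) -> str:
    # Retrieve the top matching doc chunks, then answer only from them.
    hits = search.search(question, top=3)
    context = "\n\n".join(hit["content"] for hit in hits)
    response = llm.chat.completions.create(
        model="gpt-4o",  # deployment name, an assumption
        messages=[
            {"role": "system", "content":
                "Answer only from the context below. If the context does not "
                "contain the answer, say so.\n\nContext:\n" + context},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(grounded_answer("How do I rotate an API key?"))
```

The "answer only from the context" system instruction is what makes this grounded rather than open-internet; the evaluation harness then verifies that the instruction is actually being followed.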

Key capabilities

  • AI Search-backed RAG pipeline
  • Evaluation harness and quality gates
  • Content filtering tuned to scenario
  • Identity-bound endpoint access
  • Responsible AI build-time discipline

Enabling SKUs

Resolved in the ‘Recommended cards’ section below.

Architecture decisions

Each decision is offered as explicit options with trade-offs — Hohpe's “selling options” principle. A safe default is noted where one exists.

  1. Retrieval — Azure AI Search vs custom vector store

    Azure AI Search

    When it fits: Azure-native estate; native Foundry integration; mixed dense + keyword retrieval needed (see the hybrid-retrieval sketch after this list).

    Trade-offs: AI Search consumption cost at scale.

    Custom vector store (e.g. pgvector, Pinecone)

    When it fits: Existing vector-store investment; very specific retrieval patterns.

    Trade-offs: More integration work; less native Foundry tooling.

    Default recommendation: Azure AI Search for the first RAG workload; revisit only if scale demands it.

  2. Model substrate — Foundry-hosted vs Mosaic AI on Databricks

    Foundry-hosted

    When it fits: Azure-native; need broad model catalogue; pro-code AI engineering.

    Trade-offs: Two governance planes if Databricks also in use.

    Mosaic AI on Databricks

    When it fits: Data and ML workloads already on Databricks; lakehouse-native lineage needed.

    Trade-offs: Smaller GenAI ecosystem than Foundry.

    Default recommendation: Foundry where the data lives in Azure; Mosaic AI where the data is already in Databricks.

  3. Content filtering strictness — default vs custom

    Default

    When it fits: Internal-only beta; risk surface low.

    Trade-offs: Generic filtering may block legitimate domain queries.

    Custom (scenario-tuned)

    When it fits: Customer-facing launch; specific compliance requirements.

    Trade-offs: More tuning effort; needs iterative refinement.

    Default recommendation: Default for the internal beta; transition to custom before the customer-facing launch.
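
For Decision 1, the "mixed dense + keyword retrieval" point is worth making concrete. A sketch of a hybrid query against Azure AI Search, reusing the `search` and `llm` clients from the grounded-retrieval sketch above and assuming the index carries a vector field (here called content_vector) and an embedding deployment; both names are assumptions:

```python
# Hybrid retrieval sketch: one query runs a keyword leg and a dense-vector
# leg together, and AI Search fuses the two rankings (reciprocal rank
# fusion) before returning results.
from azure.search.documents.models import VectorizedQuery

def hybrid_search(question: str, k: int = 5):
    # Embed the question with Azure OpenAI (reusing the `llm` client above).
    embedding = llm.embeddings.create(
        model="text-embedding-3-small",   # deployment name, an assumption
        input=question,
    ).data[0].embedding

    # Keyword leg (search_text) + dense leg (vector_queries) in one call.
    return search.search(
        search_text=question,
        vector_queries=[VectorizedQuery(
            vector=embedding,
            k_nearest_neighbors=k,
            fields="content_vector",      # assumed vector field name
        )],
        top=k,
    )

for hit in hybrid_search("How do I rotate an API key?"):
    print(hit["title"])
```

Replicating this fusion on a custom vector store is exactly the "more integration work" trade-off the decision names.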

Low-risk trial — proof of value

45-day grounded-LLM prototype for support agents

Duration: 45 days (about six weeks)

Foundry hub provisioned. Product documentation indexed via AI Search. RAG endpoint live with content filtering. Evaluation harness covering relevance, faithfulness, hallucination rate. Internal beta to 20 support agents. Telemetry captured for grounding hit-rate, response time, escalation rate.
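
Foundry's evaluation SDK ships ready-made quality evaluators; as a vendor-neutral illustration of the gate mechanics only, here is a hand-rolled sketch using an LLM judge for faithfulness. Thresholds mirror the success criteria below; the judge prompt and deployment name are assumptions, and the `llm` client is reused from the grounded-retrieval sketch above:

```python
# Hand-rolled evaluation-gate sketch (Foundry's evaluation tooling offers
# richer, ready-made evaluators; this only illustrates the mechanics).

def is_faithful(question: str, answer: str, context: str) -> bool:
    """LLM-judge check: is the answer fully supported by the context?"""
    verdict = llm.chat.completions.create(
        model="gpt-4o",  # deployment name, an assumption
        messages=[{"role": "user", "content": (
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer: {answer}\n\n"
            "Is the answer fully supported by the context? Reply YES or NO."
        )}],
    ).choices[0].message.content.strip().upper()
    return verdict.startswith("YES")

def quality_gate(samples) -> bool:
    """samples: list of dicts with question/answer/context/relevant keys."""
    hallucinated = sum(
        not is_faithful(s["question"], s["answer"], s["context"])
        for s in samples
    )
    hallucination_rate = hallucinated / len(samples)
    relevance = sum(s["relevant"] for s in samples) / len(samples)
    print(f"relevance={relevance:.0%}  hallucination={hallucination_rate:.0%}")
    # Gates mirror the trial success criteria: >80% relevance, <5% hallucination.
    return relevance > 0.80 and hallucination_rate < 0.05
```

Running this daily over the beta's question log is what turns the success criteria from aspirations into a go/no-go gate.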

Success criteria

  • Response relevance score above 80% on evaluation harness
  • Hallucination rate below 5% on evaluation harness
  • Support-agent NPS on the tool above baseline
  • Zero content-filter false positives causing escalation gaps (see the telemetry sketch below)
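
The content-filter criterion is measurable from the client side: Azure OpenAI reports filter triggers as a 400 error with code content_filter, which the openai Python SDK raises as BadRequestError. A sketch of logging those events so suspected false positives surface in trial telemetry, reusing `grounded_answer` from the sketch above:

```python
# Sketch: counting content-filter triggers for later false-positive review.
import openai

filter_events = []

def answer_with_filter_telemetry(question: str):
    try:
        return grounded_answer(question)
    except openai.BadRequestError as err:
        # Azure OpenAI surfaces filtered prompts/completions as error code
        # "content_filter" on a 400 response.
        if getattr(err, "code", None) == "content_filter":
            filter_events.append(question)  # review later for false positives
            return None
        raise
```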

Investment: Foundry token consumption + AI Search capacity. Estimated ~€2–4k/month for the trial scope. No customer-facing launch decisions made during the trial.

Proof metrics

  • Relevance score, faithfulness, and hallucination rate tracked daily
  • Internal-agent NPS on the tool above baseline
  • Reduction in time-to-answer for indexed topics
  • Evaluation harness covering the top ten question categories

Recommended cards

The SKUs and capabilities most likely to be part of the solution, with the editorial rationale for each in the context of this story. Add the ones that fit your situation.
