Solution Atlas
Everyday · User story · Consultative playbook

Our AI projects are blowing through budgets and we do not know which model is the cost driver

Multiple AI workloads on Azure OpenAI and Foundry are exceeding their budget allocations. The CFO can see the total but cannot attribute cost by model, prompt complexity, or team. Without granular visibility, optimisation is impossible.

Trigger
AI budget overrun; CFO demand for attribution.
Good outcome
Per-workload AI cost dashboard, token-level attribution, model-fit analysis per workload, monthly optimisation cadence.
Diagnostic discovery

Signals this story fits

Observable cues that confirm the conversation belongs here.

  • AI workload spend exceeded budget without warning
  • CFO can see the total OpenAI bill but cannot attribute by workload, model, or team
  • Token consumption growing faster than feature delivery
  • No model-fit analysis — every workload using the largest model "to be safe"
  • No monthly optimisation cadence on AI spend

Questions to ask

Open-ended, SPIN-style — each one has a reason it matters.

  1. How many AI workloads are running on Azure OpenAI or Foundry, and what is the budget for each?

    Why: Sizes the attribution scope.

    Listen for: “we know the total but not the breakdown” · “every product team has something” · “budgets were set before we knew the run rate”

  2. Walk me through the last AI budget overrun — what caused it, and when did you notice?

    Why: Surfaces the visibility gap.

  3. Are workloads tagged today — product, model, environment?

    Why: Drives the attribution capability. Without tags, Cost Management cannot break down AI spend (a tag-check sketch follows this list).

  4. Has anyone done model-fit analysis — is each workload on the right model for its task?

    Why: Surfaces the optimisation lever. Many workloads run on GPT-4 when GPT-3.5 would suffice; some run on the wrong embedding model entirely.

  5. What does the prompt and response pattern look like — short interactive, long batch, agentic?

    Why: Drives the optimisation strategy. Agentic workloads with multi-turn reasoning are the highest-cost surprise.

  6. Who owns AI cost — central FinOps, AI engineering, or the product teams?

    Why: Critical ownership question. Without a named owner, the run rate climbs.
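
A minimal tag-completeness check to make Question 3 concrete. It assumes you have exported an inventory with the Azure CLI (for example, `az resource list`); the required tag keys are illustrative, not a Microsoft standard.

```python
import json
import sys

# Illustrative tag taxonomy -- not a Microsoft standard; adjust to
# your organisation's own keys.
REQUIRED_TAGS = {"workload", "model", "product", "environment"}

# Expects the JSON output of, e.g.:
#   az resource list --resource-type Microsoft.CognitiveServices/accounts > inventory.json
with open(sys.argv[1]) as f:
    resources = json.load(f)

for res in resources:
    present = set((res.get("tags") or {}).keys())
    missing = REQUIRED_TAGS - present
    if missing:
        print(f"{res['name']}: missing {sorted(missing)}")
```

The output of this check is a useful discovery artefact in itself: the proportion of untagged AI resources is a direct measure of the attribution gap.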

Baseline → target architecture

TOGAF-style gap framing — what we typically see today, and what the proposed end state looks like. The gap between them is the engagement.

Baseline architecture

AI workloads on Azure OpenAI and Foundry deployed across product teams. Total spend visible at tenant level but not attributed by workload, model, or team. Tags inconsistent. No token-level metering by workload. Every workload chose the largest model "to be safe". No optimisation cadence. The CFO sees the bill but cannot challenge specific decisions.

Typical concerns

  • AI spend growing faster than feature delivery
  • Cannot attribute cost to specific workloads or teams
  • Model-fit analysis absent — overspecified models everywhere
  • No anomaly detection on AI consumption
  • No optimisation cadence

Capability gaps

  • Per-workload AI cost attribution
  • Token-level metering via Azure Monitor
  • Model-fit analysis per workload
  • AI cost dashboards consumable by product owners
  • Monthly AI optimisation cadence

Target architecture

Tag taxonomy enforced on AI resources — workload, model, product, environment. Azure Monitor diagnostic settings capture token-level telemetry per workload. Microsoft Cost Management produces per-workload AI cost views. Foundry hub surfaces model-fit analysis showing token consumption vs business outcome per workload. Power BI dashboard combines cost + tokens + outcome. Monthly AI optimisation cadence with product owners and AI engineering reviews model fit and prompt engineering levers.
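
To illustrate the dashboard's core join, here is a minimal pandas sketch that combines a per-workload cost extract with token telemetry and derives cost per million tokens. The column names and sample figures are assumptions for illustration; real Cost Management exports and Log Analytics extracts will differ.

```python
import pandas as pd

# Illustrative extracts -- real column names from Cost Management
# exports and Log Analytics queries will differ.
costs = pd.DataFrame({
    "workload": ["support-bot", "doc-search", "support-bot"],
    "cost_eur": [1200.0, 450.0, 980.0],
    "month":    ["2024-05", "2024-05", "2024-06"],
})
tokens = pd.DataFrame({
    "workload":          ["support-bot", "doc-search", "support-bot"],
    "prompt_tokens":     [40_000_000, 9_000_000, 31_000_000],
    "completion_tokens": [8_000_000, 2_000_000, 6_500_000],
    "month":             ["2024-05", "2024-05", "2024-06"],
})

# Join cost and token telemetry on the workload tag and month, then
# derive cost per million tokens -- the comparable unit the monthly
# optimisation cadence reviews.
view = costs.merge(tokens, on=["workload", "month"])
view["total_tokens"] = view["prompt_tokens"] + view["completion_tokens"]
view["eur_per_m_tokens"] = view["cost_eur"] / (view["total_tokens"] / 1_000_000)
print(view[["workload", "month", "cost_eur", "eur_per_m_tokens"]])
```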

Key capabilities

  • Tag taxonomy for AI resources
  • Token-level telemetry via Azure Monitor
  • Per-workload AI cost attribution (Cost Management)
  • Model-fit analysis (Foundry)
  • Monthly AI optimisation cadence

Enabling SKUs

Resolved in the ‘Recommended cards’ section below.

Architecture decisions

Each decision is offered as explicit options with trade-offs — Hohpe's “selling options” principle. A safe default is noted where one exists.

  Decision 1. Telemetry capture — Azure Monitor diagnostic settings vs custom logging vs Foundry-native metrics

    Azure Monitor diagnostic settings

    When it fits: Standard Azure OpenAI deployments; native diagnostic settings cover token telemetry.

    Trade-offs: Log Analytics consumption cost.

    Custom logging via application instrumentation

    When it fits: Workloads with bespoke metrics (latency per prompt type, business outcome per call).

    Trade-offs: Engineering effort; risk of drift.

    Foundry-native metrics

    When it fits: Workloads built on the Foundry hub; native evaluation and metering.

    Trade-offs: Foundry-only; not a fit for raw Azure OpenAI deployments.

    Default recommendation: Azure Monitor diagnostic settings as the baseline; Foundry-native metrics for workloads already on Foundry; custom logging only where business-outcome telemetry is missing (a minimal instrumentation sketch follows).
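
Where the custom-logging option is chosen, the instrumentation can stay thin. A minimal sketch using the `openai` Python package against an Azure OpenAI deployment: the endpoint, key, deployment name, and workload tag are placeholders, while the `usage` fields on the response are standard chat-completions output.

```python
import json
import time
from openai import AzureOpenAI

# Endpoint, key, and API version are placeholders for your environment.
client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
    api_key="YOUR-KEY",
    api_version="2024-06-01",
)

def chat_with_metering(workload: str, model: str, messages: list) -> str:
    """Call the deployment and emit one structured log line per request."""
    resp = client.chat.completions.create(model=model, messages=messages)
    # Structured log line: ship to Log Analytics, stdout, or any sink.
    print(json.dumps({
        "ts": time.time(),
        "workload": workload,  # the attribution tag
        "model": model,
        "prompt_tokens": resp.usage.prompt_tokens,
        "completion_tokens": resp.usage.completion_tokens,
    }))
    return resp.choices[0].message.content

# Example call -- the deployment name "gpt-4o-mini" is illustrative.
# chat_with_metering("support-bot", "gpt-4o-mini",
#                    [{"role": "user", "content": "Hi"}])
```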

  Decision 2. Optimisation lever priority — model right-sizing vs prompt engineering vs caching vs PTU commitment

    Model right-sizing

    When it fits: Most workloads default to the largest model; right-sizing usually yields a 30–60% saving.

    Trade-offs: Quality regression risk; needs an evaluation harness.

    Prompt engineering

    When it fits: Verbose prompts or repeated context; concise prompts and system-message optimisation help.

    Trade-offs: Engineering effort; small per-workload gains add up.

    Caching (semantic or exact)

    When it fits: High prompt repetition (FAQ, support, product info); a cache hit rate above 30% is feasible.

    Trade-offs: Cache invalidation complexity.

    PTU commitment (Provisioned Throughput Units)

    When it fits: Predictable steady-state workloads; the commitment trades flexibility for a lower unit cost.

    Trade-offs: Underutilisation risk if traffic drops.

    Default recommendation: Model right-sizing first (the largest lever); prompt engineering and caching in phase two; PTU only after steady-state demand is proven (a cost-comparison sketch follows).
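
To make the right-sizing lever concrete, a small sketch that compares the monthly cost of one workload on two models from its observed token volumes. The per-million-token prices are illustrative placeholders, not current Azure pricing.

```python
# Illustrative per-million-token prices (EUR) -- placeholders, not
# current Azure OpenAI pricing; substitute your negotiated rates.
PRICES = {
    "large-model": {"prompt": 5.00, "completion": 15.00},
    "small-model": {"prompt": 0.15, "completion": 0.60},
}

def monthly_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    p = PRICES[model]
    return (prompt_tokens * p["prompt"]
            + completion_tokens * p["completion"]) / 1_000_000

# One workload's observed monthly volume, taken from token telemetry.
prompt_t, completion_t = 40_000_000, 8_000_000

for model in PRICES:
    print(f"{model}: EUR {monthly_cost(model, prompt_t, completion_t):,.2f}/month")
# The delta is the right-sizing saving -- valid only if the evaluation
# harness confirms the smaller model holds quality for this workload.
```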

  Decision 3. Ownership model — central FinOps vs AI engineering vs product-team-led

    Central FinOps-led

    When it fits: The FinOps practice is mature; AI is one workload among many.

    Trade-offs: May lack AI-specific depth.

    AI engineering-led

    When it fits: The AI engineering function is mature; embeds FinOps practice inside the AI org.

    Trade-offs: May not connect to broader FinOps governance.

    Product-team-led with central FinOps support

    When it fits: Product teams own AI workloads; FinOps provides the dashboards and cadence.

    Trade-offs: Coordination overhead.

    Default recommendation: Product-team-led, with central FinOps providing dashboards and cadence; AI engineering supports the model-fit analysis.

Low-risk trial — proof of value

60-day AI cost attribution + model right-sizing for top 5 workloads

8 weeks

Tag taxonomy applied to AI resources for the trial workloads. Azure Monitor diagnostic settings live for token telemetry. Cost Management per-workload views configured. Power BI dashboard combining cost + tokens per workload. Foundry model-fit analysis run for each of the top 5 workloads. At least 2 workloads right-sized based on the analysis (typically GPT-4 → GPT-4o-mini or GPT-3.5). Monthly AI optimisation cadence designed with product owners and AI engineering.

Success criteria

  • Per-workload cost attribution live for top 5 workloads
  • Token telemetry visible per workload with no instrumentation gap
  • Two workloads right-sized with quality regression measured
  • 20%+ cost reduction for the right-sized workloads

Investment: Cost Management is included with Azure; the trial pays for Azure Monitor Log Analytics consumption plus the Foundry hub, estimated at ~€3–6k/month for the trial scope. Existing AI workloads continue running during the analysis.

Proof metrics

  • Per-workload AI cost attribution live for top 5 workloads
  • Cost reduction of 20%+ for right-sized workloads
  • Quality regression < 5% on the evaluation harness for right-sized workloads (a harness sketch follows this list)
  • Monthly AI optimisation cadence operational with named participants
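
A minimal sketch of the kind of evaluation harness the regression gate implies: run a fixed eval set through the current and the candidate model, score each answer, and compare pass rates. The `call_model` stub and the keyword scoring are placeholders for real inference calls and a real quality metric.

```python
# Hypothetical harness: EVAL_SET, call_model, and the keyword check
# are all placeholders; substitute real cases, real inference, and
# your own scoring.
EVAL_SET = [
    {"prompt": "How do I reset my password?", "must_contain": "reset"},
    {"prompt": "What is the refund window?",  "must_contain": "30 days"},
]

def call_model(model: str, prompt: str) -> str:
    # Stub standing in for a real call to each deployment; canned
    # answers just make the sketch executable end to end.
    canned = {
        "How do I reset my password?": "Use the reset link in settings.",
        "What is the refund window?":  "Refunds are accepted within 30 days.",
    }
    return canned[prompt]

def pass_rate(model: str) -> float:
    hits = sum(
        case["must_contain"].lower() in call_model(model, case["prompt"]).lower()
        for case in EVAL_SET
    )
    return hits / len(EVAL_SET)

def regression(current: str, candidate: str) -> float:
    """Positive value means the candidate is worse; gate at 0.05 (5%)."""
    return pass_rate(current) - pass_rate(candidate)

print(regression("large-model", "small-model"))
```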

Recommended cards

The SKUs and capabilities most likely to be part of the solution, with the editorial rationale for each in the context of this story. Add the ones that fit your situation.

Microsoft Cost Management

Why for this story

The source of truth for Azure spend including AI. Per-workload cost views and budget alerts come from Cost Management — the substrate the AI cost dashboard consumes.

Azure Monitor

Why for this story

Diagnostic settings on Azure OpenAI capture token-level telemetry. Without Azure Monitor, attribution stops at the resource level and the per-prompt cost story cannot be told.
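
For teams that want to pull that telemetry programmatically rather than through the portal, a hedged sketch using the `azure-monitor-query` package. The workspace ID is a placeholder, and the AzureDiagnostics table and column names in the KQL are assumptions; verify the exact schema your diagnostic settings emit.

```python
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

# KQL sketch -- the table and column names below are assumptions;
# check what your diagnostic settings actually emit.
QUERY = """
AzureDiagnostics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| summarize calls = count() by Resource, bin(TimeGenerated, 1d)
"""

response = client.query_workspace(
    workspace_id="YOUR-WORKSPACE-ID",  # placeholder
    query=QUERY,
    timespan=timedelta(days=30),
)
for table in response.tables:
    for row in table.rows:
        print(row)
```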
