Solution Atlas
Specialised · User story · Consultative playbook

Our ML team has outgrown notebooks and needs a proper lakehouse

A retail bank's ML team is running models in fragmented Databricks workspaces with no central governance. They need Unity Catalog, MLflow lifecycle management, and a path to production GenAI on the same data substrate.

Trigger
ML production incidents; regulator wants lineage and access auditing.
Good outcome
Unity Catalog tenant-wide, DBCU commitment sized to forecast, MLflow + Mosaic AI in production.
Diagnostic discovery

Signals this story fits

Observable cues that confirm the conversation belongs here.

  • ML team running on multiple fragmented Databricks workspaces
  • Regulator flagged model lineage and governance gaps
  • MLflow not in production; models managed informally
  • Mosaic AI or GenAI on the strategic roadmap
  • DBU costs surprising; no DBCU commitment

Questions to ask

Open-ended, SPIN-style — each one has a reason it matters.

  1. How many Databricks workspaces are live today, and who owns each?

    Why: Surfaces sprawl. Multi-workspace estates without Unity Catalog are the canonical baseline for this story.

  2. Is your governance plane workspace-scoped or Unity Catalog?

    Why: Unity Catalog is the prerequisite for tenant-wide governance, lineage, and fine-grained access.

  3. Where does your ML lifecycle live today: notebooks, MLflow, ad-hoc services?

    Why: Determines the maturity of the model registry and the retraining cadence.

  4. Have you priced Databricks Mosaic AI against Azure AI Foundry for the GenAI roadmap?

    Why: The Foundry vs Mosaic AI decision is about workload fit, not vendor preference. This surfaces whether the customer has done the analysis.

  5. What DBCU commitment have you made, if any?

    Why: A DBCU commitment typically yields a 20–40% discount on stable workloads; without one, the customer is paying retail. (A worked cost sketch follows this list.)

  6. What is your storage substrate: ADLS Gen2 only, or mixed?

    Why: Delta Lake lives on ADLS; a mixed substrate complicates the Unity Catalog rollout.
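
To make the retail-versus-committed gap from question 5 concrete, here is a minimal back-of-envelope sketch. The consumption volume, pay-as-you-go rate, and discount level are placeholder assumptions, not quoted prices.

    # Illustrative DBCU economics -- all figures are assumptions, not quotes.
    monthly_dbus = 120_000        # forecast stable consumption (assumed)
    payg_rate = 0.55              # $/DBU pay-as-you-go (placeholder)
    committed_discount = 0.30     # mid-point of the 20-40% band (assumed)

    retail_cost = monthly_dbus * payg_rate
    committed_cost = retail_cost * (1 - committed_discount)

    print(f"retail:    ${retail_cost:,.0f}/month")
    print(f"committed: ${committed_cost:,.0f}/month")
    print(f"saving:    ${retail_cost - committed_cost:,.0f}/month")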

Baseline → target architecture

TOGAF-style gap framing — what we typically see today, and what the proposed end state looks like. The gap between them is the engagement.

Baseline architecture

Multiple Databricks workspaces created per team. Workspace-scoped Hive metastore. MLflow used informally. No central model registry. DBU spend on pay-as-you-go rates. ADLS Gen2 as the storage substrate. Lineage informal.

Typical concerns

  • Fragmented governance across workspaces
  • Models in production without lineage or owner
  • DBU cost surprises from spot misuse and idle clusters
  • No drift detection
  • No defensible answer to "is this model still fit for purpose?"

Capability gaps

  • Unity Catalog as tenant-wide governance
  • MLflow as central model registry
  • Drift detection and retraining cadence
  • DBCU commitment discipline
  • Foundry vs Mosaic AI workload-fit decision

Target architecture

Unity Catalog rolled out tenant-wide as the governance plane. ADLS Gen2 as the storage substrate beneath Delta Lake. DBCU committed to forecast workloads. MLflow as the central model registry with drift detection automated. Mosaic AI for lakehouse-native GenAI. Foundry for non-Spark workloads and the broader Microsoft AI surface. Purview Data Governance federates Unity Catalog with Fabric and Snowflake.
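
As a sketch of what "Unity Catalog as the governance plane" means in practice, the following could run from a Databricks notebook (where `spark` is already in scope). The location, credential, catalog, schema, and group names are illustrative placeholders, not prescribed conventions.

    # Hedged sketch of the Unity Catalog target state; every name below
    # (location, credential, catalog, schema, groups) is a placeholder.
    spark.sql("""
        CREATE EXTERNAL LOCATION IF NOT EXISTS lakehouse_prod
        URL 'abfss://lakehouse@examplebank.dfs.core.windows.net/prod'
        WITH (STORAGE CREDENTIAL bank_adls_credential)
    """)

    spark.sql("CREATE CATALOG IF NOT EXISTS ml_prod")
    spark.sql("CREATE SCHEMA IF NOT EXISTS ml_prod.features")

    # Fine-grained, auditable grants replace workspace-scoped Hive permissions.
    spark.sql("GRANT USE CATALOG ON CATALOG ml_prod TO `ml-engineers`")
    spark.sql("GRANT SELECT ON SCHEMA ml_prod.features TO `risk-analysts`")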

Key capabilities

  • Unity Catalog tenant-wide
  • Central model registry + lineage
  • Drift detection and retraining cadence
  • DBCU commitment discipline
  • Mosaic AI / Foundry workload-fit

Enabling SKUs

Resolved in the ‘Recommended cards’ section below.

Architecture decisions

Each decision is offered as explicit options with trade-offs — Hohpe's “selling options” principle. A safe default is noted where one exists.

  1. GenAI platform: Mosaic AI on Databricks vs Azure AI Foundry

    Mosaic AI

    When it fits: Data and ML workloads already on Databricks; need governance + lineage on the same plane.

    Trade-offs: Smaller GenAI ecosystem than Azure OpenAI.

    Azure AI Foundry

    When it fits: Azure-native estate; non-Spark workloads; broader model catalogue needed.

    Trade-offs: Two governance planes if Databricks is also in use.

    Default recommendation: Mosaic AI where the data is already in Databricks; Foundry for the broader Microsoft AI surface.

  2. Governance plane: Unity Catalog primary vs Purview Data Governance primary

    Unity Catalog primary

    When it fits: Databricks-dominant estate; lineage primarily within Databricks.

    Trade-offs: Cross-platform federation requires Purview anyway.

    Purview Data Governance primary

    When it fits: Multi-platform estate (Databricks + Fabric + Snowflake).

    Trade-offs: Less native depth than Unity Catalog within Databricks itself.

    Default recommendation: Unity Catalog native, with Purview federating across platforms.

  3. DBCU commitment level: 1-year vs 3-year vs none

    3-year DBCU

    When it fits: Stable workload pattern; large estate; willingness to commit.

    Trade-offs: Larger commitment with less flexibility.

    1-year DBCU

    When it fits: Moderate predictability; growth phase.

    Trade-offs: Lower discount tier.

    None (pay-as-you-go)

    When it fits: Volatile workloads; small estate.

    Trade-offs: No discount; surprise bills.

    Default recommendation: 1-year DBCU sized to 70% of the forecast steady state, with pay-as-you-go for the variable tier. (A sizing sketch follows this list.)
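
One way to read "sized to 70% of forecast steady-state" as a procedure. The forecast figures are invented, and approximating the steady-state floor with a low percentile is one reasonable interpretation, not the only one.

    # Hedged sketch of sizing the 1-year commitment from a monthly DBU
    # forecast. All numbers are invented; "steady state" is approximated
    # as a low percentile so the commitment is consumed even in quiet months.
    forecast = [80_000, 95_000, 110_000, 90_000, 130_000, 105_000,
                98_000, 115_000, 88_000, 120_000, 102_000, 92_000]

    steady_state = sorted(forecast)[int(len(forecast) * 0.2)]  # ~20th percentile
    commitment = int(steady_state * 0.70)                      # the 70% rule

    print(f"steady-state floor: {steady_state:,} DBUs/month")
    print(f"commit to:          {commitment:,} DBUs/month (rest pay-as-you-go)")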

Low-risk trial — proof of value

8-week Unity Catalog rollout + MLflow + Mosaic AI POC

8 weeks

Unity Catalog enabled tenant-wide with the first workspace migrated. MLflow central registry stood up. One production model brought under lineage and drift detection. One Mosaic AI POC against a GenAI use case grounded on Unity Catalog data.
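
A minimal sketch of the central-registry step, assuming a Databricks runtime with MLflow and scikit-learn available. The toy model and the three-level model name are placeholders; Unity Catalog's registry requires a model signature, hence the infer_signature call.

    # Hedged sketch: register a model in the Unity Catalog model registry.
    import mlflow
    from mlflow.models import infer_signature
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # Point MLflow at the Unity Catalog registry, not the workspace registry.
    mlflow.set_registry_uri("databricks-uc")

    X, y = make_classification(n_samples=500, n_features=8, random_state=0)
    model = LogisticRegression().fit(X, y)

    with mlflow.start_run():
        mlflow.sklearn.log_model(
            model,
            "model",
            signature=infer_signature(X, model.predict(X)),
            registered_model_name="ml_prod.models.credit_risk",  # catalog.schema.model
        )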

Success criteria

  • Unity Catalog live with one workspace fully migrated
  • Central MLflow registry with one production model registered
  • Drift detection alerts produced for the trial model (see the drift sketch after this list)
  • Mosaic AI POC produces a working RAG endpoint against catalogued data
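
One concrete way to generate those drift alerts is a population stability index (PSI) check between the training baseline and recent scoring traffic. The metric choice, the synthetic data, and the 0.2 threshold below are illustrative, not prescribed.

    # Hedged sketch of a PSI-based drift check; data and threshold are invented.
    import numpy as np

    def psi(baseline, current, bins=10):
        """Population stability index over quantile bins of the baseline."""
        edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
        edges[0], edges[-1] = -np.inf, np.inf
        b = np.histogram(baseline, edges)[0] / len(baseline)
        c = np.histogram(current, edges)[0] / len(current)
        b, c = np.clip(b, 1e-6, None), np.clip(c, 1e-6, None)
        return float(np.sum((c - b) * np.log(c / b)))

    rng = np.random.default_rng(0)
    train_scores = rng.normal(0.0, 1.0, 10_000)   # training-time baseline
    live_scores = rng.normal(0.3, 1.1, 2_000)     # recent traffic, shifted

    value = psi(train_scores, live_scores)
    if value > 0.2:                               # common rule-of-thumb cutoff
        print(f"drift alert: PSI = {value:.3f}")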

Investment: DBU consumption only; commitment decisions deferred to month 3. Mosaic AI on existing DBU rates.

Proof metrics

  • Unity Catalog adoption % at trial end
  • Time-to-model-deployment for registered models
  • Drift alert quality (signal vs noise)
  • GenAI POC response quality and latency (see the retrieval sketch below)
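
To exercise the POC's retrieval path for the quality and latency metrics, something like the following could be used, assuming the databricks-vectorsearch client and a Delta Sync index with managed embeddings over a Unity Catalog table. The endpoint name, index name, columns, and query are placeholders.

    # Hedged sketch: query the POC's vector index; all names are placeholders.
    import time
    from databricks.vector_search.client import VectorSearchClient

    client = VectorSearchClient()  # auth picked up from the environment

    index = client.get_index(
        endpoint_name="lakehouse-vs-endpoint",       # assumed endpoint
        index_name="ml_prod.rag.policy_docs_index",  # assumed UC index
    )

    start = time.perf_counter()
    hits = index.similarity_search(
        query_text="What is the retention policy for closed accounts?",
        columns=["doc_id", "chunk_text"],
        num_results=5,
    )
    latency_ms = (time.perf_counter() - start) * 1000

    print(f"latency: {latency_ms:.0f} ms")
    for row in hits["result"]["data_array"]:
        print(row)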

Recommended cards

The SKUs and capabilities most likely to be part of the solution, with the editorial rationale for each in the context of this story. Add the ones that fit your situation.
