Models scattered across Databricks workspaces, notebooks, and custom services. MLflow used informally without a central registry. Drift detection ad hoc. Retraining manual and reactive. No documented lineage from training data to deployed model.
Typical concerns
- No defensible answer to "is this model still fit for purpose?"
- Model performance degrading silently
- Retraining triggered only when something breaks
- No model cards or attestations for regulators
- GenAI workloads adding to the sprawl
Capability gaps
- Central model registry
- Automated drift detection
- Retraining cadence with governance
- Model cards and lineage
- Responsible AI gates wired into the lifecycle
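
To make the "automated drift detection" gap concrete, here is a minimal sketch of one common approach: comparing a feature's production distribution against its training baseline with the Population Stability Index (PSI). Nothing here comes from a specific platform. The function names, the 10-bin layout, and the 0.2 review threshold (a widely quoted rule of thumb, not a standard) are all assumptions for illustration.

```python
import math
from typing import List, Sequence


def population_stability_index(
    baseline: Sequence[float], current: Sequence[float], bins: int = 10
) -> float:
    """PSI between a training baseline and current production values.

    Bins are equal-width over the baseline range (an assumption; quantile
    bins are another common choice). Out-of-range values are clamped
    into the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    if hi == lo:
        raise ValueError("baseline has no spread; PSI is undefined")
    width = (hi - lo) / bins

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor empty bins at a small epsilon so the log stays finite.
        return [max(c / len(values), 1e-6) for c in counts]

    p = proportions(baseline)
    q = proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def needs_retraining_review(psi: float, threshold: float = 0.2) -> bool:
    """Hypothetical governance gate: flag the model for review, not auto-retrain."""
    return psi >= threshold


# Illustrative data only: a uniform baseline and a production sample
# shifted into the upper half of the range.
baseline_scores = [i / 100 for i in range(100)]
shifted_scores = [0.5 + i / 200 for i in range(100)]

psi_same = population_stability_index(baseline_scores, baseline_scores)
psi_shifted = population_stability_index(baseline_scores, shifted_scores)
```

Run on a schedule per feature and per model, a check like this turns "retraining triggered only when something breaks" into a reviewable signal. The result would typically be logged alongside the registered model version so that the decision to retrain is part of the lineage.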