Solution Atlas
Specialised · User story · Consultative playbook

AKS is becoming the standard runtime and we need governance before it sprawls

Engineering teams are converging on Kubernetes as the application runtime. The platform team wants a paved-road AKS pattern with central security, networking, and policy before per-team clusters multiply beyond what platform engineering can support.

Trigger
Multiple AKS clusters spinning up without standards; the security team is concerned.
Good outcome
Central AKS platform with policy guardrails, namespace governance, and Defender for Containers baseline.
Diagnostic discovery

Signals this story fits

Observable cues that confirm the conversation belongs here.

  • Multiple AKS clusters spinning up per team without a central pattern
  • No central image registry policy or supply-chain scanning
  • Defender for Containers not deployed
  • Cluster networking varies — some public, some private, some mixed
  • Cost spikes from idle clusters; no attribution

Questions to ask

Open-ended, SPIN-style — each one has a reason it matters.

  1. How many AKS clusters are live today, and who provisioned each?

    Why: Surfaces the sprawl. Most customers cannot give a confident answer.

    Listen for: “varies” · “each team has its own” · “we are not sure”

  2. What is your cluster-creation pattern — Terraform, Bicep, console, scripts?

    Why: Determines whether a paved-road template would be adopted easily.

  3. How does cluster networking integrate with your landing zone or hub?

    Why: Networking is the most-divergent dimension across teams; surfaces the standardisation opportunity.

  4. What is your image registry and supply-chain posture?

    Why: ACR with vulnerability scanning is the canonical answer; surfaces whether it is in place.

  5. Who owns runtime SLOs for production AKS clusters today?

    Why: Often nobody; the platform-as-product question lives here.

  6. What is your Defender for Containers coverage?

    Why: Gauges container security posture; usually absent or partial.

Baseline → target architecture

TOGAF-style gap framing — what we typically see today, and what the proposed end state looks like. The gap between them is the engagement.

Baseline architecture

Per-team AKS clusters provisioned with inconsistent configuration. No central image registry policy. Cluster networking varies (public clusters, private clusters, mixed). Defender for Containers not in scope. Per-cluster cost attribution is manual. No paved-road template.

Typical concerns

  • Inconsistent security posture across clusters
  • Image supply chain unaudited
  • Network configuration drift
  • Production clusters without SLO owners
  • Cost growth without attribution

Capability gaps

  • Paved-road AKS template
  • Azure Policy at management-group level
  • ACR with vulnerability scanning
  • Defender for Containers tenant-wide
  • Cluster-cost attribution

Target architecture

Paved-road AKS pattern delivered via a central Bicep template — opinionated network integration with the landing-zone hub, ACR as the standard registry with vulnerability scanning, Azure Policy at the management-group level, Defender for Containers tenant-wide. Cost attribution per cluster via tag taxonomy. Platform team owns the runtime SLO baseline; project teams operate within it.
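
To make the paved road concrete, here is a minimal Bicep sketch of the kind of cluster such a template would stamp out — private API server, Azure Policy add-on for in-cluster guardrails, hub-integrated subnet, and cost-attribution tags. All names, parameters, sizes, and the API version are illustrative placeholders, not the delivered template; a production version would add node-pool sizing, workload identity, and diagnostics on top of this skeleton.

```bicep
// Minimal paved-road cluster sketch — illustrative only; names, sizes,
// tag keys, and the API version are placeholders, not the real template.
param clusterName string
param location string = resourceGroup().location
param spokeSubnetId string   // subnet in a spoke VNet peered to the landing-zone hub
param team string            // assumed cost-attribution tag keys
param costCentre string

resource aks 'Microsoft.ContainerService/managedClusters@2024-02-01' = {
  name: clusterName
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  tags: {
    team: team
    costCentre: costCentre
  }
  properties: {
    dnsPrefix: clusterName
    apiServerAccessProfile: {
      enablePrivateCluster: true    // private cluster by default (Decision 2 below)
    }
    agentPoolProfiles: [
      {
        name: 'system'
        mode: 'System'
        count: 3
        vmSize: 'Standard_D4s_v5'
        vnetSubnetID: spokeSubnetId // opinionated hub/spoke network integration
      }
    ]
    networkProfile: {
      networkPlugin: 'azure'
      networkPolicy: 'azure'
    }
    addonProfiles: {
      azurepolicy: {
        enabled: true               // Azure Policy guardrails inside the cluster
      }
    }
  }
}
```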

Key capabilities

  • Paved-road AKS template
  • Central image registry with vulnerability scanning
  • Tenant-wide container security posture
  • Policy-enforced cluster configuration
  • Cluster-cost attribution

Enabling SKUs

Resolved in the ‘Recommended cards’ section below.

Architecture decisions

Each decision is offered as explicit options with trade-offs — Hohpe's “selling options” principle. A safe default is noted where one exists.

  1. Decision 1: Cluster pattern — per-team clusters vs central multi-tenant clusters

    Per-team clusters (default)

    When it fits: Team autonomy prioritised; blast-radius isolation per team.

    Trade-offs: Cluster count grows with team count; cost overhead per cluster.

    Central multi-tenant clusters

    When it fits: Mature platform team; smaller engineering org; tight cost discipline.

    Trade-offs: Tenant isolation depends on namespaces + policy; a weaker boundary than per-cluster isolation.

    Default recommendation: Per-team clusters with a strong paved-road template. Move to multi-tenant only when the platform team has explicit capacity.

  2. Decision 2: Networking — public AKS endpoint vs private cluster

    Public endpoint with authorized IP ranges

    When it fits: Lower complexity; engineering convenience.

    Trade-offs: Larger attack surface; auditors may demand private clusters.

    Private cluster (no public endpoint)

    When it fits: Regulated or auditor-pressured estates; consistent with landing-zone Zero Trust.

    Trade-offs: Engineering complexity; a jump host or VPN is required for kubectl.

    Default recommendation: Private cluster as the default; public endpoint only for explicit, justified exceptions.

  3. Decision 3: Image registry — ACR vs Harbor vs external (ECR, GHCR)

    ACR (Azure Container Registry)

    When it fits: Azure-native estates; native Defender integration; consistent with the rest of the estate.

    Trade-offs: Multi-cloud workloads carry separate ACR or replication overhead.

    Harbor (self-hosted)

    When it fits: Multi-cloud workloads; existing Harbor investment.

    Trade-offs: Self-hosting overhead; less native integration.

    GHCR (GitHub Container Registry)

    When it fits: GitHub-native pipelines; a small or developer-led estate.

    Trade-offs: Less mature than ACR for production-scale scanning.

    Default recommendation: ACR for Azure-resident workloads — a minimal provisioning sketch follows this list; consider Harbor or GHCR only for explicit multi-cloud or developer-experience reasons.
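
As a companion to the ACR default, a minimal provisioning sketch — Premium SKU (needed for private endpoints and geo-replication), admin user disabled, public network access off, matching the private-cluster posture. Names and the API version are placeholders; note that image vulnerability scanning is switched on through Defender for Containers at subscription scope, not on the registry resource itself.

```bicep
// Illustrative ACR baseline — names and API version are placeholders.
// Vulnerability scanning comes from Defender for Containers at the
// subscription level, not from a registry property.
param registryName string
param location string = resourceGroup().location

resource acr 'Microsoft.ContainerRegistry/registries@2023-07-01' = {
  name: registryName
  location: location
  sku: {
    name: 'Premium'                  // Premium enables private endpoints and geo-replication
  }
  properties: {
    adminUserEnabled: false          // force Entra ID / managed-identity auth
    publicNetworkAccess: 'Disabled'  // consistent with the private-cluster default
  }
}
```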

Low-risk trial — proof of value

6-week AKS paved-road pattern + onboard 3 existing clusters

Duration: 6 weeks

Paved-road Bicep template authored with policy guardrails. ACR provisioned with vulnerability scanning. Defender for Containers enabled tenant-wide. Three existing clusters refactored or rebuilt against the paved-road pattern. Cost attribution via tag taxonomy validated end-to-end.
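
One likely shape for the “Defender for Containers enabled tenant-wide” step: the plan is set per subscription via the Microsoft.Security/pricings resource, so tenant-wide coverage means repeating the assignment (or driving it through Azure Policy) across all subscriptions. A minimal sketch, with the API version as a placeholder:

```bicep
// Enables the Defender for Containers plan for one subscription.
// Tenant-wide coverage = this resource in every subscription,
// typically rolled out via an Azure Policy DeployIfNotExists effect.
targetScope = 'subscription'

resource defenderForContainers 'Microsoft.Security/pricings@2023-01-01' = {
  name: 'Containers'            // plan name is fixed by the provider
  properties: {
    pricingTier: 'Standard'     // 'Free' would switch the plan off
  }
}
```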

Success criteria

  • Paved-road template published with documentation
  • Three clusters onboarded with policy compliance above 90%
  • Defender for Containers producing actionable signal
  • Image vulnerability scanning live with triage cadence

Investment: Defender for Containers per-vCore; ACR consumption. Estimated ~€2–3k/month for the trial scope. Existing clusters untouched until refactored against the template.

Proof metrics

  • Cluster policy-compliance score above 90% on paved-road clusters (see the assignment sketch below)
  • Image vulnerability scanning hit rate (legitimate findings, not noise)
  • Cost attribution per cluster operational
  • Platform-team time saved on new-cluster provisioning
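
The compliance score presumes guardrails assigned at management-group scope, as the target architecture describes. A minimal assignment sketch follows; policySetDefinitionId is a placeholder for whichever built-in or custom AKS initiative the paved road standardises on.

```bicep
// Management-group-scoped assignment of an AKS baseline initiative.
// policySetDefinitionId is a placeholder — substitute the chosen
// built-in or custom initiative.
targetScope = 'managementGroup'

param policySetDefinitionId string

resource aksBaseline 'Microsoft.Authorization/policyAssignments@2022-06-01' = {
  name: 'aks-paved-road-baseline'
  properties: {
    displayName: 'AKS paved-road baseline'
    policyDefinitionId: policySetDefinitionId
    enforcementMode: 'Default'   // use 'DoNotEnforce' for an audit-only rollout
  }
}
```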

Recommended cards

The SKUs and capabilities most likely to be part of the solution, with the editorial rationale for each in the context of this story. Add the ones that fit your situation.
