Why Platform Engineering Exists
Every engineering organisation eventually hits the same ceiling. The infrastructure grows complex, on-call rotations get overloaded, and developer teams spend an increasing fraction of their time on undifferentiated heavy lifting — provisioning databases, configuring CI pipelines, debugging Kubernetes YAML, managing secrets. The "you build it, you run it" mandate delivered ownership but also fragmentation: a hundred teams reinventing the same patterns, each slightly differently, each accumulating its own toil.
Platform engineering is the discipline of building the internal product that eliminates that toil. The platform team operates like a product team — with customers who are developers, a roadmap driven by developer pain, and success measured by adoption and time-to-production, not by ticket volume. The output is an Internal Developer Platform (IDP): a curated set of self-service capabilities that let teams provision infrastructure, deploy services, and operate systems without needing to become infrastructure experts.
Cognitive Load Reduction
The primary value of a good IDP is not automation — it is reducing the cognitive load on development teams. Developers should understand their service's SLOs, not the intricacies of Kubernetes PodDisruptionBudgets or Terraform provider version pinning.
Paved Roads, Not Guardrails
The platform provides golden paths — opinionated, well-lit routes to production — not rigid walls. Teams can deviate from golden paths when they need to, but deviating should feel like leaving a paved road: possible, but noticeably harder.
Platform as Internal Product
Platform teams that treat their work as infrastructure projects fail. Successful platform teams have a product mindset: they talk to users (developers), measure adoption, prioritise based on impact, and ship incrementally — treating developer experience as seriously as user experience.
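Adoption, the platform's headline product metric, can be computed straight from the Software Catalog. A minimal sketch in Python (the `scaffolded` tag is an assumed convention your templates would stamp into catalog-info.yaml, not a Backstage built-in):

```python
from typing import Iterable


def golden_path_adoption(entities: Iterable[dict]) -> float:
    """Fraction of catalog Components created via a golden-path template.

    ASSUMPTION: templated services carry a "scaffolded" tag; substitute
    whatever marker your templates actually write into catalog-info.yaml.
    """
    components = [e for e in entities if e.get("kind") == "Component"]
    if not components:
        return 0.0
    scaffolded = sum(
        1 for e in components
        if "scaffolded" in e.get("metadata", {}).get("tags", [])
    )
    return scaffolded / len(components)


# Example against a hypothetical catalog export:
entities = [
    {"kind": "Component", "metadata": {"tags": ["scaffolded", "golang"]}},
    {"kind": "Component", "metadata": {"tags": ["legacy"]}},
    {"kind": "API", "metadata": {"tags": []}},
]
print(golden_path_adoption(entities))  # 0.5
```

Tracked over time, this single number tells the platform team whether the golden paths are actually being chosen.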
IDP Architecture: Layers and Capabilities
An IDP is not a single tool. It is a coherent set of capabilities spread across multiple layers. One widely cited layering (popularised by Humanitec's platform reference architecture) splits an IDP into five planes: developer portal, deployment automation, dynamic configuration management, infrastructure orchestration, and monitoring. In practice, most organisations build these layers by composing open-source and managed tools rather than building from scratch.
# A typical IDP technology stack
#
# Layer 1 — Developer Portal
# Backstage (open-source, Spotify) https://backstage.io
# Port https://getport.io
# Cortex https://cortex.io
#
# Layer 2 — Service Templates / Scaffolding
# Backstage Software Templates cookiecutter-style, Nunjucks templating
# Cookiecutter https://cookiecutter.readthedocs.io
# Copier https://copier.readthedocs.io
#
# Layer 3 — Infrastructure Self-Service
# Terraform + Atlantis GitOps-driven apply
# Crossplane Kubernetes-native IaC
# AWS Service Catalog / GCP Config managed service catalogs
#
# Layer 4 — CI/CD as Platform Service
# GitHub Actions reusable workflows .github/workflows/*.yml
# Tekton (Kubernetes-native CI) https://tekton.dev
# ArgoCD (GitOps CD) https://argoproj.github.io/cd
#
# Layer 5 — Observability Platform
# Grafana + Prometheus + Loki open-source LGTM stack
# Datadog / New Relic managed alternatives
# OpenTelemetry Collector telemetry pipeline
#
# Glue layer — Platform API / Orchestrator
# Humanitec Platform Orchestrator managed
# Kratix open-source promise-based
# Score workload spec abstraction
Building the Developer Portal with Backstage
Backstage is the most widely adopted open-source developer portal framework. It provides a Software Catalog (service registry), TechDocs (documentation-as-code), Software Templates (scaffolding), and a plugin ecosystem covering Kubernetes, CI/CD, cost, and more. The catalog is the core: every service, API, data pipeline, and library lives there, with ownership, SLO status, and links to runbooks.
# Bootstrap a new Backstage app
npx @backstage/create-app@latest --path my-backstage
# Project structure
my-backstage/
├── packages/
│ ├── app/ # Frontend React application
│ │ └── src/
│ │ ├── App.tsx # Plugin registration
│ │ └── components/ # Custom UI overrides
│ └── backend/ # Node.js backend
│ └── src/
│ ├── index.ts # Backend plugin registration
│ └── plugins/ # Custom backend plugins
├── app-config.yaml # Main configuration
├── app-config.production.yaml # Production overrides
└── catalog-info.yaml # Root catalog entry

# app-config.yaml — Backstage configuration
app:
title: ACME Developer Portal
baseUrl: https://backstage.acme.internal
backend:
baseUrl: https://backstage.acme.internal
database:
client: pg
connection:
host: ${POSTGRES_HOST}
port: ${POSTGRES_PORT}
user: ${POSTGRES_USER}
password: ${POSTGRES_PASSWORD}
database: backstage
# --- Software Catalog ---
catalog:
# Rules: only allow specific entity kinds from specific locations
rules:
- allow: [Component, API, Resource, Location, Template, Group, User, System, Domain]
# Locations: where to discover catalog entities
locations:
# Org structure (teams, users)
- type: url
target: https://github.com/acme/backstage-catalog/blob/main/org.yaml
rules:
- allow: [Group, User]
# All service catalog-info.yaml files across the org
- type: github-discovery
target: https://github.com/acme/*/blob/main/catalog-info.yaml
# Software Templates for scaffolding
- type: url
target: https://github.com/acme/backstage-catalog/blob/main/templates/all-templates.yaml
rules:
- allow: [Template]
# --- GitHub integration ---
integrations:
github:
- host: github.com
token: ${GITHUB_TOKEN}
# --- Auth (GitHub OAuth) ---
auth:
environment: production
providers:
github:
production:
clientId: ${GITHUB_CLIENT_ID}
clientSecret: ${GITHUB_CLIENT_SECRET}
# --- Kubernetes plugin ---
kubernetes:
serviceLocatorMethod:
type: multiTenant
clusterLocatorMethods:
- type: config
clusters:
- url: https://k8s-prod.acme.internal
name: production
authProvider: serviceAccount
serviceAccountToken: ${K8S_SA_TOKEN}
skipTLSVerify: false
caData: ${K8S_CA_DATA}
# --- TechDocs ---
techdocs:
builder: external # docs built in CI, not by Backstage
generator:
runIn: docker
publisher:
type: googleGcs
googleGcs:
bucketName: acme-techdocs
credentials: ${GOOGLE_APPLICATION_CREDENTIALS}

# catalog-info.yaml — a microservice entry in the Software Catalog
# Place this file in the root of every service repository
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
name: order-service
title: Order Service
description: Manages order creation, payment processing, and fulfillment tracking
annotations:
# GitHub Actions CI link
github.com/project-slug: acme/order-service
# ArgoCD app link
argocd/app-name: order-service-production
# Kubernetes namespace
backstage.io/kubernetes-namespace: orders
# PagerDuty service ID
pagerduty.com/service-id: P1234AB
# TechDocs site
backstage.io/techdocs-ref: dir:.
tags:
- orders
- payments
- golang
links:
- url: https://grafana.acme.internal/d/order-service
title: Grafana Dashboard
icon: dashboard
- url: https://acme.pagerduty.com/service-directory/P1234AB
title: PagerDuty
icon: alert
spec:
type: service
lifecycle: production
owner: group:order-team
system: commerce-platform
# APIs this service provides
providesApis:
- order-api
# APIs this service consumes
consumesApis:
- payment-api
- catalog-api
# Infrastructure resources it depends on
dependsOn:
- resource:orders-postgres
- resource:orders-redis

Golden Paths: Software Templates and Scaffolding
Golden paths are opinionated, pre-approved routes from idea to production. They encode your organisation's best practices — language choice, testing framework, CI configuration, observability instrumentation, security scanning — into a repeatable template that developers can invoke in minutes. Backstage's Software Templates are the primary mechanism for delivering golden paths: a YAML template definition drives a wizard UI, collects developer input, and triggers a series of actions — creating a repository, rendering files, opening a PR, registering the service in the catalog.
# templates/go-microservice/template.yaml
# Backstage Software Template — creates a new Go microservice
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
name: go-microservice
title: Go Microservice
description: Production-ready Go service with CI/CD, observability, and catalog registration
tags:
- golang
- microservice
- recommended
spec:
owner: group:platform-team
type: service
# --- Step 1: Collect input from the developer ---
parameters:
- title: Service Details
required: [name, description, owner]
properties:
name:
title: Service Name
type: string
description: "Lowercase, hyphen-separated (e.g. order-service)"
pattern: "^[a-z][a-z0-9-]{2,40}$"
description:
title: Description
type: string
maxLength: 200
owner:
title: Owner Team
type: string
description: "Team responsible for this service"
ui:field: OwnerPicker
ui:options:
catalogFilter:
kind: Group
- title: Infrastructure
properties:
database:
title: Database
type: string
enum: [none, postgres, mysql]
default: none
cache:
title: Cache
type: string
enum: [none, redis]
default: none
cloud:
title: Cloud Region
type: string
enum: [us-east-1, eu-west-1, ap-southeast-1]
default: us-east-1
# --- Step 2: Actions to perform ---
steps:
# Fetch and render the template files
- id: fetch-template
name: Render service template
action: fetch:template
input:
url: ./skeleton
values:
name: ${{ parameters.name }}
description: ${{ parameters.description }}
owner: ${{ parameters.owner }}
database: ${{ parameters.database }}
cache: ${{ parameters.cache }}
cloud: ${{ parameters.cloud }}
# Create the GitHub repository
- id: create-repo
name: Create GitHub repository
action: publish:github
input:
repoUrl: github.com?owner=acme&repo=${{ parameters.name }}
description: ${{ parameters.description }}
defaultBranch: main
gitAuthorName: platform-bot
gitAuthorEmail: platform@acme.com
repoVisibility: private
topics:
- microservice
- golang
# Register in Backstage catalog
- id: register-catalog
name: Register in Software Catalog
action: catalog:register
input:
repoContentsUrl: ${{ steps['create-repo'].output.repoContentsUrl }}
catalogInfoPath: /catalog-info.yaml
# Provision infrastructure via Terraform (calls a platform API)
- id: provision-infra
name: Provision infrastructure
action: http:backstage:request
input:
method: POST
path: /api/platform/provision
body:
serviceName: ${{ parameters.name }}
database: ${{ parameters.database }}
cache: ${{ parameters.cache }}
cloud: ${{ parameters.cloud }}
owner: ${{ parameters.owner }}
# --- Step 3: Output links for the developer ---
output:
links:
- title: Repository
url: ${{ steps['create-repo'].output.remoteUrl }}
- title: Catalog Entry
icon: catalog
entityRef: ${{ steps['register-catalog'].output.entityRef }}
- title: CI/CD Pipeline
url: ${{ steps['create-repo'].output.remoteUrl }}/actions

# templates/go-microservice/skeleton/catalog-info.yaml
# Template file rendered with values from the developer's input
# ${{ values.name }}, ${{ values.owner }}, etc. are Nunjucks expressions
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
name: ${{ values.name }}
description: ${{ values.description }}
annotations:
github.com/project-slug: acme/${{ values.name }}
backstage.io/techdocs-ref: dir:.
tags:
- golang
spec:
type: service
lifecycle: experimental
owner: group:${{ values.owner }}
system: platform
---
# templates/go-microservice/skeleton/.github/workflows/ci.yml
# Golden path CI: lint, test, build, scan, push
name: CI
on:
push:
branches: [main]
pull_request:
env:
IMAGE: ghcr.io/acme/${{ values.name }}
jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with: { go-version: "1.22" }
- run: go vet ./...
- uses: golangci/golangci-lint-action@v6
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with: { go-version: "1.22" }
- run: go test -race -coverprofile=coverage.out ./...
- uses: codecov/codecov-action@v4
build-and-push:
runs-on: ubuntu-latest
needs: [lint, test]
if: github.ref == 'refs/heads/main'
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@v4
- uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- uses: docker/build-push-action@v5
with:
push: true
tags: ${{ env.IMAGE }}:${{ github.sha }},${{ env.IMAGE }}:latest
security-scan:
runs-on: ubuntu-latest
needs: build-and-push
steps:
- uses: aquasecurity/trivy-action@master # pin to a release tag rather than @master in real use
with:
image-ref: ${{ env.IMAGE }}:${{ github.sha }}
exit-code: "1"
severity: CRITICAL,HIGH
Self-Service Infrastructure: Terraform and Crossplane
Infrastructure self-service means a developer can provision a PostgreSQL database, an S3 bucket, or a Kubernetes namespace without opening a ticket with the platform team. There are two dominant approaches: GitOps Terraform (developers submit PRs to a Terraform repository, Atlantis applies them) and Crossplane (Kubernetes-native IaC where developers create Kubernetes Custom Resources that the platform operator fulfills).
Atlantis makes Terraform collaborative and auditable: every PR gets a terraform plan comment, and atlantis apply runs after approval. Developers interact via GitHub; Atlantis holds the credentials and state. This is often the path of least resistance for teams already using Terraform. Crossplane is a better fit when the platform is already Kubernetes-native — developers create a RDSInstance manifest and the Crossplane AWS provider provisions the actual RDS database, with status surfaced back through the Kubernetes API.
# Crossplane: platform team defines Composite Resource Definitions (XRDs)
# Developers use simple Claim resources; the platform handles the details
# --- XRD: Platform team defines what a PostgresDatabase looks like ---
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
name: xpostgresdatabases.platform.acme.io
spec:
group: platform.acme.io
names:
kind: XPostgresDatabase
plural: xpostgresdatabases
claimNames:
kind: PostgresDatabase # What developers use
plural: postgresdatabases
versions:
- name: v1alpha1
served: true
referenceable: true
schema:
openAPIV3Schema:
type: object
properties:
spec:
type: object
required: [size, engine]
properties:
size:
type: string
enum: [small, medium, large] # Abstracts instance types
description: "small=db.t3.micro, medium=db.t3.medium, large=db.r6g.large"
engine:
type: string
enum: [postgres14, postgres15, postgres16]
multiAz:
type: boolean
default: false
backupRetentionDays:
type: integer
default: 7
minimum: 1
maximum: 35
---
# Composition: translates developer intent into AWS resources
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
name: postgres-aws
spec:
compositeTypeRef:
apiVersion: platform.acme.io/v1alpha1
kind: XPostgresDatabase
resources:
- name: rds-instance
base:
apiVersion: rds.aws.upbound.io/v1beta1
kind: Instance
spec:
forProvider:
region: us-east-1
skipFinalSnapshot: false
storageEncrypted: true
autoMinorVersionUpgrade: true
deletionProtection: true
      patches:
        # Map the developer's "size" to an actual instance class
        - type: FromCompositeFieldPath
          fromFieldPath: spec.size
          toFieldPath: spec.forProvider.instanceClass
          transforms:
            - type: map
              map:
                small: db.t3.micro
                medium: db.t3.medium
                large: db.r6g.large
        - type: FromCompositeFieldPath
          fromFieldPath: spec.backupRetentionDays
          toFieldPath: spec.forProvider.backupRetentionPeriod
        - type: FromCompositeFieldPath
          fromFieldPath: spec.multiAz
          toFieldPath: spec.forProvider.multiAz
---
# Developer Claim: what developers actually write
# This goes into the team's namespace, not platform namespace
apiVersion: platform.acme.io/v1alpha1
kind: PostgresDatabase
metadata:
name: orders-db
namespace: order-service
spec:
size: medium
engine: postgres16
multiAz: true
backupRetentionDays: 14
# Connection secret auto-created in the same namespace
writeConnectionSecretToRef:
name: orders-db-connection

# Atlantis setup for GitOps Terraform self-service
# atlantis.yaml — repository-level configuration
version: 3
projects:
- name: databases
dir: modules/databases
workspace: production
autoplan:
when_modified: ["*.tf", "*.tfvars", "../**/*.tf"] # paths are relative to this project's dir
apply_requirements:
- approved # At least one approval required
- mergeable # Branch must be up to date
- name: networking
dir: modules/networking
workspace: production
apply_requirements:
- approved
- mergeable
---
# modules/databases/rds/main.tf
# Platform-owned Terraform module; developers reference it
variable "service_name" {
type = string
description = "Name of the owning service (used for naming and tagging)"
}
variable "size" {
type = string
default = "small"
validation {
condition = contains(["small", "medium", "large"], var.size)
error_message = "size must be small, medium, or large"
}
}
variable "engine_version" {
type = string
default = "16.2"
}
locals {
instance_class_map = {
small = "db.t3.micro"
medium = "db.t3.medium"
large = "db.r6g.large"
}
}
resource "aws_db_instance" "this" {
identifier = "${var.service_name}-db"
engine = "postgres"
engine_version = var.engine_version
instance_class = local.instance_class_map[var.size]
allocated_storage = 20
storage_type = "gp3"
storage_encrypted = true
db_name = replace(var.service_name, "-", "_")
username = "admin"
password = random_password.master.result
backup_retention_period = 14
delete_automated_backups = false
deletion_protection = true
skip_final_snapshot = false
# NOTE: timestamp() is re-evaluated on every plan, so this resource always shows
# a diff; consider a static identifier or ignore_changes for this attribute
final_snapshot_identifier = "${var.service_name}-db-final-${formatdate("YYYYMMDD", timestamp())}"
multi_az = var.size != "small"
tags = {
Service = var.service_name
ManagedBy = "terraform"
Environment = "production"
}
}
resource "random_password" "master" {
length = 32
special = true
}
# Store credentials in AWS Secrets Manager
resource "aws_secretsmanager_secret" "db_credentials" {
name = "${var.service_name}/db/credentials"
lifecycle {
prevent_destroy = true
}
}
resource "aws_secretsmanager_secret_version" "db_credentials" {
secret_id = aws_secretsmanager_secret.db_credentials.id
secret_string = jsonencode({
host = aws_db_instance.this.address
port = aws_db_instance.this.port
dbname = aws_db_instance.this.db_name
username = aws_db_instance.this.username
password = random_password.master.result
})
})

Measuring Developer Experience: DORA and SPACE
Developer experience cannot be measured by the number of tickets the platform team closes. The industry has converged on two complementary frameworks: DORA metrics (deployment frequency, lead time, change failure rate, time to restore) measure delivery performance; the SPACE framework (Satisfaction, Performance, Activity, Communication, Efficiency) captures dimensions that DORA misses — notably developer satisfaction and collaboration quality.
DORA metrics can be extracted from CI/CD systems automatically. The following example collects them from GitHub Actions and GitHub PR data, then exposes them as Prometheus metrics for Grafana dashboards.
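Deployment frequency and lead time fall out of CI data directly; change failure rate first needs a working definition of "failure". A self-contained sketch, assuming the latest GitHub deployment status of `failure` or `error` marks a change failure (adapt to however your organisation records rollbacks and hotfixes):

```python
def change_failure_rate(deploy_outcomes) -> float:
    """CFR = failed deployments / total deployments over a window.

    Each element is the latest deployment-status state ("success",
    "failure", "error", ...); treating failure/error as a change
    failure is an assumed convention, not a GitHub-defined one.
    """
    outcomes = list(deploy_outcomes)
    if not outcomes:
        return 0.0
    failed = sum(1 for state in outcomes if state in ("failure", "error"))
    return failed / len(outcomes)


print(change_failure_rate(["success", "failure", "success", "success"]))  # 0.25
```

Gather the per-deployment outcomes the same way the collector walks `repo.get_deployments(environment="production")`, then set the `CHANGE_FAILURE_RATE` gauge with the result.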
# dora-collector/main.py
# Collects DORA metrics from GitHub and exposes them via Prometheus
# pip install PyGithub prometheus-client python-dateutil
import time
import datetime
from github import Github
from prometheus_client import start_http_server, Gauge, Histogram
DEPLOYMENT_FREQUENCY = Gauge(
"dora_deployment_frequency_per_day",
"Deployments to production per day (7d rolling average)",
["repo"],
)
LEAD_TIME_HOURS = Histogram(
"dora_lead_time_hours",
"Hours from first commit to production deployment",
["repo"],
buckets=[1, 4, 8, 24, 48, 72, 168, 336, float("inf")],
)
CHANGE_FAILURE_RATE = Gauge(
"dora_change_failure_rate",
"Fraction of deployments that caused a rollback or hotfix (30d window)",
["repo"],
)
MTTR_HOURS = Gauge(
"dora_mean_time_to_restore_hours",
"Mean hours to restore service after a production failure (30d window)",
["repo"],
)
def collect_deployment_frequency(repo, since: datetime.datetime) -> float:
"""Count successful production deployments in the last 7 days."""
deployments = repo.get_deployments(environment="production")
count = sum(
1
for d in deployments
if d.created_at > since and any(
s.state == "success" for s in d.get_statuses()
)
)
days = (datetime.datetime.utcnow() - since).days or 1
return count / days
def collect_lead_time(repo, since: datetime.datetime):
"""
Lead time = time from first commit in a PR to production deployment.
Approximated via PR merge time vs deployment time.
"""
pulls = repo.get_pulls(state="closed", base="main", sort="updated",
direction="desc")
for pr in pulls:
if pr.merged_at and pr.merged_at > since:
# Find the deployment that followed this merge
deployments = repo.get_deployments(
environment="production",
sha=pr.merge_commit_sha,
)
for dep in deployments:
statuses = list(dep.get_statuses())
success = next((s for s in statuses if s.state == "success"), None)
if success and pr.created_at:
lead_hours = (success.created_at - pr.created_at).total_seconds() / 3600
LEAD_TIME_HOURS.labels(repo=repo.name).observe(lead_hours)
def main():
import os
g = Github(os.environ["GITHUB_TOKEN"])
repos = os.environ["GITHUB_REPOS"].split(",") # "acme/order-service,acme/catalog"
start_http_server(8080)
print("DORA metrics server running on :8080/metrics")
while True:
seven_days_ago = datetime.datetime.utcnow() - datetime.timedelta(days=7)
for repo_name in repos:
repo = g.get_repo(repo_name.strip())
freq = collect_deployment_frequency(repo, seven_days_ago)
DEPLOYMENT_FREQUENCY.labels(repo=repo.name).set(freq)
collect_lead_time(repo, seven_days_ago)
time.sleep(300) # refresh every 5 minutes
if __name__ == "__main__":
main()

# Grafana dashboard panel — DORA metrics overview
# Import into Grafana or provision via dashboard-as-code
# (strip the //-comments first: they annotate DORA bands, but JSON itself doesn't allow comments)
{
"panels": [
{
"title": "Deployment Frequency (deployments/day, 7d avg)",
"type": "stat",
"targets": [
{
"expr": "avg(dora_deployment_frequency_per_day)",
"legendFormat": "Org average"
}
],
"thresholds": {
"steps": [
{ "value": null, "color": "red" }, // < 1/week: Low
{ "value": 0.14, "color": "yellow" }, // 1/week–1/day: Medium
{ "value": 1, "color": "green" } // > 1/day: High (Elite)
]
}
},
{
"title": "Lead Time Distribution (hours)",
"type": "histogram",
"targets": [
{
"expr": "histogram_quantile(0.50, rate(dora_lead_time_hours_bucket[30d]))",
"legendFormat": "p50"
},
{
"expr": "histogram_quantile(0.95, rate(dora_lead_time_hours_bucket[30d]))",
"legendFormat": "p95"
}
]
}
]
}

CI/CD as a Platform Service: Reusable Workflows
Rather than each team maintaining its own CI configuration from scratch, the platform team publishes reusable GitHub Actions workflows from a central repository. Teams call these workflows with a single uses: line; the platform team upgrades the shared workflow (adding a new SAST scanner, bumping a build tool version) and the change propagates across all services automatically. This is the CI/CD equivalent of a golden path.
# .github/workflows/go-service-ci.yml
# Platform-owned reusable workflow for Go microservices
# Stored in: github.com/acme/platform-workflows/.github/workflows/go-service-ci.yml
name: Go Service CI (Reusable)
on:
workflow_call:
inputs:
go-version:
type: string
default: "1.22"
image-name:
type: string
required: true
run-integration-tests:
type: boolean
default: false
secrets:
REGISTRY_TOKEN:
required: true
SONAR_TOKEN:
required: false
outputs:
image-digest:
description: "SHA digest of the pushed image"
value: ${{ jobs.build.outputs.digest }}
jobs:
lint:
name: Lint
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version: ${{ inputs.go-version }}
cache: true
- uses: golangci/golangci-lint-action@v6
with:
version: v1.57
test:
name: Test
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version: ${{ inputs.go-version }}
cache: true
- name: Run unit tests
run: go test -race -coverprofile=coverage.out ./...
- name: Integration tests
if: ${{ inputs.run-integration-tests }}
run: go test -tags integration ./...
- uses: codecov/codecov-action@v4
sast:
name: SAST + Secrets Scan
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with:
fetch-depth: 0 # full history for gitleaks
- name: Run Gitleaks (secrets scan)
uses: gitleaks/gitleaks-action@v2
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Run Semgrep (SAST)
uses: semgrep/semgrep-action@v1
with:
config: "p/golang p/owasp-top-ten"
build:
name: Build & Push Image
runs-on: ubuntu-latest
needs: [lint, test, sast]
if: github.ref == 'refs/heads/main'
outputs:
digest: ${{ steps.push.outputs.digest }}
steps:
- uses: actions/checkout@v4
- uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.REGISTRY_TOKEN }}
- id: push
uses: docker/build-push-action@v5
with:
push: true
tags: ghcr.io/acme/${{ inputs.image-name }}:${{ github.sha }}
cache-from: type=gha
cache-to: type=gha,mode=max
trivy:
name: Container Scan
runs-on: ubuntu-latest
needs: build
steps:
- uses: aquasecurity/trivy-action@master # pin to a release tag rather than @master in real use
with:
image-ref: ghcr.io/acme/${{ inputs.image-name }}:${{ github.sha }}
exit-code: "1"
severity: CRITICAL,HIGH
ignore-unfixed: true

# Service team's .github/workflows/ci.yml
# Teams call the platform's reusable workflow with one line
name: CI
on:
push:
branches: [main]
pull_request:
jobs:
ci:
uses: acme/platform-workflows/.github/workflows/go-service-ci.yml@v2.3.0 # pin by tag, not @main
with:
go-version: "1.22"
image-name: order-service
run-integration-tests: true
secrets:
REGISTRY_TOKEN: ${{ secrets.GITHUB_TOKEN }}
deploy:
name: Deploy to Production
runs-on: ubuntu-latest
needs: ci
if: github.ref == 'refs/heads/main'
steps:
# GitOps deploy: update image digest in ArgoCD app repo
- name: Update image tag
uses: actions/github-script@v7
with:
github-token: ${{ secrets.GITOPS_TOKEN }}
script: |
const { data } = await github.rest.repos.getContent({
owner: 'acme',
repo: 'gitops-config',
path: 'apps/order-service/values.yaml',
});
const content = Buffer.from(data.content, 'base64').toString();
const updated = content.replace(
/tag: .*/,
`tag: "${{ github.sha }}"`
);
await github.rest.repos.createOrUpdateFileContents({
owner: 'acme',
repo: 'gitops-config',
path: 'apps/order-service/values.yaml',
message: `chore: deploy order-service@${{ github.sha }}`,
content: Buffer.from(updated).toString('base64'),
sha: data.sha,
});

Developer Experience Surveys and Feedback Loops
Quantitative metrics tell you what changed; surveys tell you why and what to build next. The McKinsey Developer Velocity Index and the GitHub Good Day Project both find that developer satisfaction and perceived productivity are highly correlated with tool quality and process clarity — not just deployment frequency. Run a quarterly developer experience survey using the SPACE framework dimensions as question categories. Publish results transparently, and let the survey results drive the platform team's roadmap.
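Turning survey exports into trackable numbers is mechanical. A sketch that averages 1-5 Likert answers per SPACE dimension (the `dimension.topic` question-ID naming scheme and the response shape are assumptions; adapt to your survey tool's export format):

```python
from collections import defaultdict
from statistics import mean

SPACE_DIMENSIONS = ["satisfaction", "performance", "activity",
                    "communication", "efficiency"]


def space_scores(responses: list[dict]) -> dict[str, float]:
    """Average 1-5 Likert answers per SPACE dimension across respondents.

    Each response maps question IDs like "satisfaction.tools" to an int
    score; the ID scheme is an assumed convention for this sketch.
    """
    by_dim: dict[str, list[int]] = defaultdict(list)
    for resp in responses:
        for question, score in resp.items():
            dim = question.split(".", 1)[0]
            if dim in SPACE_DIMENSIONS:
                by_dim[dim].append(score)
    return {d: round(mean(scores), 2) for d, scores in by_dim.items()}


responses = [
    {"satisfaction.tools": 4, "efficiency.flow": 2},
    {"satisfaction.tools": 5, "efficiency.flow": 3},
]
print(space_scores(responses))  # {'satisfaction': 4.5, 'efficiency': 2.5}
```

Publishing these per-dimension scores quarter over quarter, alongside the DORA dashboards, gives the platform roadmap its "why".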
IDP Production Checklist
Software Catalog is the source of truth
Every production service has a catalog-info.yaml with accurate ownership, SLO links, and on-call contacts. Stale catalog entries erode trust and kill adoption — automate staleness detection.
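A sketch of that automation, run as a scheduled job against a catalog export: it flags entries missing ownership or a required annotation (the required-annotation list is an assumption; define your own catalog contract):

```python
# Annotations every production service must carry; an assumed contract,
# not a Backstage requirement.
REQUIRED_ANNOTATIONS = ["github.com/project-slug", "pagerduty.com/service-id"]


def lint_catalog_entry(entity: dict) -> list[str]:
    """Return a list of problems for one catalog entity (empty = healthy)."""
    problems = []
    if not entity.get("spec", {}).get("owner"):
        problems.append("missing spec.owner")
    annotations = entity.get("metadata", {}).get("annotations", {})
    for key in REQUIRED_ANNOTATIONS:
        if key not in annotations:
            problems.append(f"missing annotation {key}")
    return problems


entry = {"metadata": {"annotations": {"github.com/project-slug": "acme/x"}},
         "spec": {"owner": "group:order-team"}}
print(lint_catalog_entry(entry))  # ['missing annotation pagerduty.com/service-id']
```

Surface the failures back to owning teams (a Slack report or a Backstage badge) rather than silently deleting entries.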
Golden paths cover the 80% case
Templates should handle the most common service archetypes (web API, worker, data pipeline). Don't try to template every edge case — make escaping the golden path explicit and documented, not impossible.
Infrastructure self-service is approved asynchronously
Developers open a PR (Terraform) or apply a manifest (Crossplane); the platform applies after review or automatically for low-risk resources. No Slack DMs, no tickets, no waiting for a human to provision a database.
DORA metrics are tracked per team, not just org-wide
Org-wide averages hide struggling teams. Track DORA metrics per service owner group — make the data visible in Backstage so teams can self-diagnose. Don't use them for performance reviews.
Reusable workflows are versioned and pinned
Teams reference platform workflows by tag (v2.3.0), not by @main. Breaking changes to the platform CI increment the major version; teams upgrade on their own schedule.
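A CI check can enforce the pinning rule. A sketch that flags `uses:` references to the platform workflow repository pointing at a branch rather than a `vX.Y.Z` tag (the `acme/platform-workflows` name follows the examples in this article; the semver pattern is an assumption):

```python
import re

# Matches "uses: acme/platform-workflows/<path>@<ref>"; refs like v2.3.0
# pass, branch refs like "main" are flagged.
USES_RE = re.compile(r"uses:\s*(acme/platform-workflows/\S+)@(\S+)")
SEMVER_RE = re.compile(r"^v\d+\.\d+\.\d+$")


def unpinned_workflow_refs(workflow_yaml: str) -> list[str]:
    """Return platform-workflow references not pinned to a vX.Y.Z tag."""
    bad = []
    for match in USES_RE.finditer(workflow_yaml):
        path, ref = match.groups()
        if not SEMVER_RE.match(ref):
            bad.append(f"{path}@{ref}")
    return bad


ci = """
jobs:
  ci:
    uses: acme/platform-workflows/.github/workflows/go-service-ci.yml@main
"""
print(unpinned_workflow_refs(ci))
# ['acme/platform-workflows/.github/workflows/go-service-ci.yml@main']
```

Run it over every repository's `.github/workflows/*.yml` in a scheduled audit job and open issues for offenders.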
Platform team runs on-call for the platform, not for services
When a platform capability (the catalog, the scaffolder, Atlantis) breaks, the platform team is paged. When a service breaks, the service team is paged. Clear escalation paths prevent the platform team from becoming a catch-all support desk.
Backstage is behind auth — not open to the internet
Backstage's admin APIs and plugin backends can be sensitive. Use your IdP (Okta, GitHub SSO) for authentication and authorise catalog mutations to service owners only.
Developer experience is measured quarterly
Run SPACE-based surveys quarterly. Publish results and a public response from the platform team — what you're doing about the top 3 pain points. This builds trust and drives adoption.