Platform Engineering · Developer Experience · IDP · Backstage · Golden Paths · DevOps

Platform Engineering and Developer Experience — IDP Design, Golden Paths, and Self-Service

A practical guide to platform engineering and developer experience: designing an Internal Developer Platform (IDP) with Backstage and Port, building golden path software templates, self-service infrastructure with Terraform/Atlantis and Crossplane, measuring DevEx with DORA and SPACE metrics, and delivering CI/CD as a reusable platform service.

2026-05-07

Why Platform Engineering Exists

Every engineering organisation eventually hits the same ceiling. The infrastructure grows complex, oncall rotations get overloaded, and developer teams spend an increasing fraction of their time on undifferentiated heavy lifting — provisioning databases, configuring CI pipelines, debugging Kubernetes YAML, managing secrets. The "you build it, you run it" mandate delivered ownership but also fragmentation: a hundred teams reinventing the same patterns, each slightly differently, each accumulating its own toil.

Platform engineering is the discipline of building the internal product that eliminates that toil. The platform team operates like a product team — with customers who are developers, a roadmap driven by developer pain, and success measured by adoption and time-to-production, not by ticket volume. The output is an Internal Developer Platform (IDP): a curated set of self-service capabilities that let teams provision infrastructure, deploy services, and operate systems without needing to become infrastructure experts.

Cognitive Load Reduction

The primary value of a good IDP is not automation — it is reducing the cognitive load on development teams. Developers should understand their service's SLOs, not the intricacies of Kubernetes PodDisruptionBudgets or Terraform provider version pinning.

Paved Roads, Not Guardrails

The platform provides golden paths — opinionated, well-lit routes to production — not rigid walls. Teams can deviate from golden paths when they need to, but deviating should feel like leaving a paved road: possible, but noticeably harder.

Platform as Internal Product

Platform teams that treat their work as infrastructure projects fail. Successful platform teams have a product mindset: they talk to users (developers), measure adoption, prioritise based on impact, and ship incrementally — treating developer experience as seriously as user experience.

IDP Architecture: Layers and Capabilities

An IDP is not a single tool. It is a coherent set of capabilities spread across multiple layers. The Humanitec Platform Orchestrator model describes five layers: developer portal, deployment automation, dynamic configuration management, infrastructure orchestration, and monitoring. In practice, most organisations build these layers by composing open-source and managed tools rather than building from scratch.

# A typical IDP technology stack
#
# Layer 1 — Developer Portal
#   Backstage (open-source, Spotify)   https://backstage.io
#   Port                               https://getport.io
#   Cortex                             https://cortex.io
#
# Layer 2 — Service Templates / Scaffolding
#   Backstage Software Templates       cookiecutter-style with nunjucks
#   Cookiecutter                       https://cookiecutter.readthedocs.io
#   Copier                             https://copier.readthedocs.io
#
# Layer 3 — Infrastructure Self-Service
#   Terraform + Atlantis               GitOps-driven apply
#   Crossplane                         Kubernetes-native IaC
#   AWS Service Catalog / GCP Config   managed service catalogs
#
# Layer 4 — CI/CD as Platform Service
#   GitHub Actions reusable workflows  .github/workflows/*.yml
#   Tekton (Kubernetes-native CI)      https://tekton.dev
#   ArgoCD (GitOps CD)                 https://argoproj.github.io/cd
#
# Layer 5 — Observability Platform
#   Grafana + Prometheus + Loki        open-source LGTM stack
#   Datadog / New Relic                managed alternatives
#   OpenTelemetry Collector            telemetry pipeline
#
# Glue layer — Platform API / Orchestrator
#   Humanitec Platform Orchestrator    managed
#   Kratix                             open-source promise-based
#   Score                              workload spec abstraction

Note

The most common mistake in IDP design is starting with tooling instead of capabilities. Before evaluating Backstage vs Port vs Cortex, list the top 10 developer pain points — ideally from a survey or from analysing Slack/Jira patterns. Build the IDP to address those specific pains. Tools should serve the capability, not define it.
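One lightweight way to turn that survey or ticket data into a ranked top-10 is a frequency-times-severity score. The sketch below is illustrative only — the pain-point names, severity weights, and scoring model are assumptions, not a standard methodology:

```python
# Hypothetical sketch: rank developer pain points by frequency x severity
# so the IDP roadmap targets the biggest sources of toil first.
# The pain points and weights below are invented example data.

from collections import Counter

def rank_pain_points(responses, severity):
    """responses: pain-point names mentioned in survey free-text or Jira
    tags; severity: dict mapping pain point -> 1-5 impact score."""
    freq = Counter(responses)
    scored = {p: freq[p] * severity.get(p, 1) for p in freq}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

responses = [
    "provisioning-databases", "ci-debugging", "provisioning-databases",
    "k8s-yaml", "ci-debugging", "provisioning-databases", "secrets-management",
]
severity = {"provisioning-databases": 4, "ci-debugging": 3,
            "k8s-yaml": 5, "secrets-management": 5}

for pain, score in rank_pain_points(responses, severity):
    print(f"{pain}: {score}")
```

However the score is computed, the point is to make prioritisation explicit and reviewable rather than anecdotal.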

Building the Developer Portal with Backstage

Backstage is the open-source framework most organisations choose as their developer portal. It provides a Software Catalog (service registry), TechDocs (documentation-as-code), Software Templates (scaffolding), and a plugin ecosystem covering Kubernetes, CI/CD, cost, and more. The catalog is the core: every service, API, data pipeline, and library lives there, with ownership, SLO status, and links to runbooks.

# Bootstrap a new Backstage app
npx @backstage/create-app@latest --path my-backstage

# Project structure
my-backstage/
├── packages/
│   ├── app/                  # Frontend React application
│   │   └── src/
│   │       ├── App.tsx       # Plugin registration
│   │       └── components/   # Custom UI overrides
│   └── backend/              # Node.js backend
│       └── src/
│           ├── index.ts      # Backend plugin registration
│           └── plugins/      # Custom backend plugins
├── app-config.yaml           # Main configuration
├── app-config.production.yaml # Production overrides
└── catalog-info.yaml         # Root catalog entry
# app-config.yaml — Backstage configuration

app:
  title: ACME Developer Portal
  baseUrl: https://backstage.acme.internal

backend:
  baseUrl: https://backstage.acme.internal
  database:
    client: pg
    connection:
      host: ${POSTGRES_HOST}
      port: ${POSTGRES_PORT}
      user: ${POSTGRES_USER}
      password: ${POSTGRES_PASSWORD}
      database: backstage

# --- Software Catalog ---
catalog:
  # Rules: only allow specific entity kinds from specific locations
  rules:
    - allow: [Component, API, Resource, Location, Template, Group, User, System, Domain]

  # Locations: where to discover catalog entities
  locations:
    # Org structure (teams, users)
    - type: url
      target: https://github.com/acme/backstage-catalog/blob/main/org.yaml
      rules:
        - allow: [Group, User]

    # All service catalog-info.yaml files across the org
    - type: github-discovery
      target: https://github.com/acme/*/blob/main/catalog-info.yaml

    # Software Templates for scaffolding
    - type: url
      target: https://github.com/acme/backstage-catalog/blob/main/templates/all-templates.yaml
      rules:
        - allow: [Template]

# --- GitHub integration ---
integrations:
  github:
    - host: github.com
      token: ${GITHUB_TOKEN}

# --- Auth (GitHub OAuth) ---
auth:
  environment: production
  providers:
    github:
      production:
        clientId: ${GITHUB_CLIENT_ID}
        clientSecret: ${GITHUB_CLIENT_SECRET}

# --- Kubernetes plugin ---
kubernetes:
  serviceLocatorMethod:
    type: multiTenant
  clusterLocatorMethods:
    - type: config
      clusters:
        - url: https://k8s-prod.acme.internal
          name: production
          authProvider: serviceAccount
          serviceAccountToken: ${K8S_SA_TOKEN}
          skipTLSVerify: false
          caData: ${K8S_CA_DATA}

# --- TechDocs ---
techdocs:
  builder: external           # docs built in CI, not by Backstage
  generator:
    runIn: docker
  publisher:
    type: googleGcs
    googleGcs:
      bucketName: acme-techdocs
      credentials: ${GOOGLE_APPLICATION_CREDENTIALS}
# catalog-info.yaml — a microservice entry in the Software Catalog
# Place this file in the root of every service repository

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: order-service
  title: Order Service
  description: Manages order creation, payment processing, and fulfillment tracking
  annotations:
    # GitHub Actions CI link
    github.com/project-slug: acme/order-service
    # ArgoCD app link
    argocd/app-name: order-service-production
    # Kubernetes namespace
    backstage.io/kubernetes-namespace: orders
    # PagerDuty service ID
    pagerduty.com/service-id: P1234AB
    # TechDocs site
    backstage.io/techdocs-ref: dir:.
  tags:
    - orders
    - payments
    - golang
  links:
    - url: https://grafana.acme.internal/d/order-service
      title: Grafana Dashboard
      icon: dashboard
    - url: https://acme.pagerduty.com/service-directory/P1234AB
      title: PagerDuty
      icon: alert
spec:
  type: service
  lifecycle: production
  owner: group:order-team
  system: commerce-platform
  # APIs this service provides
  providesApis:
    - order-api
  # APIs this service consumes
  consumesApis:
    - payment-api
    - catalog-api
  # Infrastructure resources it depends on
  dependsOn:
    - resource:orders-postgres
    - resource:orders-redis

Golden Paths: Software Templates and Scaffolding

Golden paths are opinionated, pre-approved routes from idea to production. They encode your organisation's best practices — language choice, testing framework, CI configuration, observability instrumentation, security scanning — into a repeatable template that developers can invoke in minutes. Backstage's Software Templates are the primary mechanism for delivering golden paths: a YAML template definition drives a wizard UI, collects developer input, and triggers a series of actions — creating a repository, rendering files, opening a PR, registering the service in the catalog.

# templates/go-microservice/template.yaml
# Backstage Software Template — creates a new Go microservice

apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: go-microservice
  title: Go Microservice
  description: Production-ready Go service with CI/CD, observability, and catalog registration
  tags:
    - golang
    - microservice
    - recommended
spec:
  owner: group:platform-team
  type: service

  # --- Step 1: Collect input from the developer ---
  parameters:
    - title: Service Details
      required: [name, description, owner]
      properties:
        name:
          title: Service Name
          type: string
          description: "Lowercase, hyphen-separated (e.g. order-service)"
          pattern: "^[a-z][a-z0-9-]{2,40}$"
        description:
          title: Description
          type: string
          maxLength: 200
        owner:
          title: Owner Team
          type: string
          description: "Team responsible for this service"
          ui:field: OwnerPicker
          ui:options:
            catalogFilter:
              kind: Group

    - title: Infrastructure
      properties:
        database:
          title: Database
          type: string
          enum: [none, postgres, mysql]
          default: none
        cache:
          title: Cache
          type: string
          enum: [none, redis]
          default: none
        cloud:
          title: Cloud Region
          type: string
          enum: [us-east-1, eu-west-1, ap-southeast-1]
          default: us-east-1

  # --- Step 2: Actions to perform ---
  steps:
    # Fetch and render the template files
    - id: fetch-template
      name: Render service template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          description: ${{ parameters.description }}
          owner: ${{ parameters.owner }}
          database: ${{ parameters.database }}
          cache: ${{ parameters.cache }}
          cloud: ${{ parameters.cloud }}

    # Create the GitHub repository
    - id: create-repo
      name: Create GitHub repository
      action: publish:github
      input:
        repoUrl: github.com?owner=acme&repo=${{ parameters.name }}
        description: ${{ parameters.description }}
        defaultBranch: main
        gitAuthorName: platform-bot
        gitAuthorEmail: platform@acme.com
        repoVisibility: private
        topics:
          - microservice
          - golang

    # Register in Backstage catalog
    - id: register-catalog
      name: Register in Software Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps['create-repo'].output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml

    # Provision infrastructure via Terraform (calls a platform API)
    # Note: http:backstage:request comes from the community http-request
    # scaffolder module, not the core scaffolder
    - id: provision-infra
      name: Provision infrastructure
      action: http:backstage:request
      input:
        method: POST
        path: /api/platform/provision
        body:
          serviceName: ${{ parameters.name }}
          database: ${{ parameters.database }}
          cache: ${{ parameters.cache }}
          cloud: ${{ parameters.cloud }}
          owner: ${{ parameters.owner }}

  # --- Step 3: Output links for the developer ---
  output:
    links:
      - title: Repository
        url: ${{ steps['create-repo'].output.remoteUrl }}
      - title: Catalog Entry
        icon: catalog
        entityRef: ${{ steps['register-catalog'].output.entityRef }}
      - title: CI/CD Pipeline
        url: ${{ steps['create-repo'].output.remoteUrl }}/actions
# templates/go-microservice/skeleton/catalog-info.yaml
# Template file rendered with values from the developer's input
# ${{ values.name }}, ${{ values.owner }}, etc. are Nunjucks expressions

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: ${{ values.name }}
  description: ${{ values.description }}
  annotations:
    github.com/project-slug: acme/${{ values.name }}
    backstage.io/techdocs-ref: dir:.
  tags:
    - golang
spec:
  type: service
  lifecycle: experimental
  owner: group:${{ values.owner }}
  system: platform

---
# templates/go-microservice/skeleton/.github/workflows/ci.yml
# Golden path CI: lint, test, build, scan, push
# NB: fetch:template also uses ${{ }} as its Nunjucks delimiter, so GitHub
# expressions like ${{ github.sha }} must be escaped (or this file excluded
# via copyWithoutTemplating) — only ${{ values.* }} should be rendered.

name: CI

on:
  push:
    branches: [main]
  pull_request:

env:
  IMAGE: ghcr.io/acme/${{ values.name }}

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with: { go-version: "1.22" }
      - run: go vet ./...
      - uses: golangci/golangci-lint-action@v6

  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with: { go-version: "1.22" }
      - run: go test -race -coverprofile=coverage.out ./...
      - uses: codecov/codecov-action@v4

  build-and-push:
    runs-on: ubuntu-latest
    needs: [lint, test]
    if: github.ref == 'refs/heads/main'
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: ${{ env.IMAGE }}:${{ github.sha }},${{ env.IMAGE }}:latest

  security-scan:
    runs-on: ubuntu-latest
    needs: build-and-push
    steps:
      - uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.IMAGE }}:${{ github.sha }}
          exit-code: "1"
          severity: CRITICAL,HIGH

Note

Golden path templates should encode security as a default, not an afterthought. Every template should, at minimum, include: static analysis (golangci-lint, ESLint, Ruff), dependency vulnerability scanning (Trivy, Snyk), secrets scanning (Gitleaks, TruffleHog), and SBOM generation. Security teams get consistent enforcement across every new service; developers get it for free.

Self-Service Infrastructure: Terraform and Crossplane

Infrastructure self-service means a developer can provision a PostgreSQL database, an S3 bucket, or a Kubernetes namespace without opening a ticket with the platform team. There are two dominant approaches: GitOps Terraform (developers submit PRs to a Terraform repository, Atlantis applies them) and Crossplane (Kubernetes-native IaC where developers create Kubernetes Custom Resources that the platform operator fulfills).

Atlantis makes Terraform collaborative and auditable: every PR gets a terraform plan comment, and atlantis apply runs after approval. Developers interact via GitHub; Atlantis holds the credentials and state. This is often the path of least resistance for teams already using Terraform. Crossplane is a better fit when the platform is already Kubernetes-native — developers create a claim manifest (for example, a PostgresDatabase custom resource) and Crossplane's AWS provider provisions the actual RDS database, with status surfaced back through the Kubernetes API.

# Crossplane: platform team defines Composite Resource Definitions (XRDs)
# Developers use simple Claim resources; the platform handles the details

# --- XRD: Platform team defines what a PostgresDatabase looks like ---
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: xpostgresdatabases.platform.acme.io
spec:
  group: platform.acme.io
  names:
    kind: XPostgresDatabase
    plural: xpostgresdatabases
  claimNames:
    kind: PostgresDatabase          # What developers use
    plural: postgresdatabases
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required: [size, engine]
              properties:
                size:
                  type: string
                  enum: [small, medium, large]    # Abstracts instance types
                  description: "small=db.t3.micro, medium=db.t3.medium, large=db.r6g.large"
                engine:
                  type: string
                  enum: [postgres14, postgres15, postgres16]
                multiAz:
                  type: boolean
                  default: false
                backupRetentionDays:
                  type: integer
                  default: 7
                  minimum: 1
                  maximum: 35

---
# Composition: translates developer intent into AWS resources
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: postgres-aws
spec:
  compositeTypeRef:
    apiVersion: platform.acme.io/v1alpha1
    kind: XPostgresDatabase
  resources:
    - name: rds-instance
      base:
        apiVersion: rds.aws.upbound.io/v1beta1
        kind: Instance
        spec:
          forProvider:
            region: us-east-1
            skipFinalSnapshot: false
            storageEncrypted: true
            autoMinorVersionUpgrade: true
            deletionProtection: true
      patches:
        # Map the developer's abstract "size" to a concrete instance class
        - type: FromCompositeFieldPath
          fromFieldPath: spec.size
          toFieldPath: spec.forProvider.instanceClass
          transforms:
            - type: map
              map:
                small: db.t3.micro
                medium: db.t3.medium
                large: db.r6g.large
        - fromFieldPath: spec.backupRetentionDays
          toFieldPath: spec.forProvider.backupRetentionPeriod
        - fromFieldPath: spec.multiAz
          toFieldPath: spec.forProvider.multiAz

---
# Developer Claim: what developers actually write
# This goes into the team's namespace, not platform namespace
apiVersion: platform.acme.io/v1alpha1
kind: PostgresDatabase
metadata:
  name: orders-db
  namespace: order-service
spec:
  size: medium
  engine: postgres16
  multiAz: true
  backupRetentionDays: 14
  # Connection secret auto-created in the same namespace
  writeConnectionSecretToRef:
    name: orders-db-connection
# Atlantis setup for GitOps Terraform self-service

# atlantis.yaml — repository-level configuration
version: 3
projects:
  - name: databases
    dir: modules/databases
    workspace: production
    autoplan:
      when_modified: ["*.tf", "*.tfvars", "../modules/**/*.tf"]
    apply_requirements:
      - approved          # At least one approval required
      - mergeable         # Branch must be up to date

  - name: networking
    dir: modules/networking
    workspace: production
    apply_requirements:
      - approved
      - mergeable

---
# modules/databases/rds/main.tf
# Platform-owned Terraform module; developers reference it

variable "service_name" {
  type        = string
  description = "Name of the owning service (used for naming and tagging)"
}

variable "size" {
  type    = string
  default = "small"
  validation {
    condition     = contains(["small", "medium", "large"], var.size)
    error_message = "size must be small, medium, or large"
  }
}

variable "engine_version" {
  type    = string
  default = "16.2"
}

locals {
  instance_class_map = {
    small  = "db.t3.micro"
    medium = "db.t3.medium"
    large  = "db.r6g.large"
  }
}

resource "aws_db_instance" "this" {
  identifier        = "${var.service_name}-db"
  engine            = "postgres"
  engine_version    = var.engine_version
  instance_class    = local.instance_class_map[var.size]
  allocated_storage = 20
  storage_type      = "gp3"
  storage_encrypted = true

  db_name  = replace(var.service_name, "-", "_")
  username = "admin"
  password = random_password.master.result

  backup_retention_period   = 14
  delete_automated_backups  = false
  deletion_protection       = true
  skip_final_snapshot       = false
  final_snapshot_identifier = "${var.service_name}-db-final-${formatdate("YYYYMMDD", timestamp())}"

  multi_az = var.size != "small"

  tags = {
    Service     = var.service_name
    ManagedBy   = "terraform"
    Environment = "production"
  }
}

resource "random_password" "master" {
  length  = 32
  special = true
}

# Store credentials in AWS Secrets Manager
resource "aws_secretsmanager_secret" "db_credentials" {
  name = "${var.service_name}/db/credentials"

  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_secretsmanager_secret_version" "db_credentials" {
  secret_id = aws_secretsmanager_secret.db_credentials.id
  secret_string = jsonencode({
    host     = aws_db_instance.this.address
    port     = aws_db_instance.this.port
    dbname   = aws_db_instance.this.db_name
    username = aws_db_instance.this.username
    password = random_password.master.result
  })
}

Measuring Developer Experience: DORA and SPACE

Developer experience is not measurable by how many tickets the platform team closed. The industry has converged on two complementary frameworks: DORA metrics (deployment frequency, lead time, change failure rate, time to restore) measure delivery performance; the SPACE framework (Satisfaction, Performance, Activity, Communication, Efficiency) captures dimensions that DORA misses — notably developer satisfaction and collaboration quality.

DORA metrics can be extracted from CI/CD systems automatically. The following example collects them from GitHub Actions and GitHub PR data, then exposes them as Prometheus metrics for Grafana dashboards.

# dora-collector/main.py
# Collects DORA metrics from GitHub and exposes them via Prometheus
# pip install PyGithub prometheus-client python-dateutil

import time
import datetime
from github import Github
from prometheus_client import start_http_server, Gauge, Histogram

DEPLOYMENT_FREQUENCY = Gauge(
    "dora_deployment_frequency_per_day",
    "Deployments to production per day (7d rolling average)",
    ["repo"],
)
LEAD_TIME_HOURS = Histogram(
    "dora_lead_time_hours",
    "Hours from first commit to production deployment",
    ["repo"],
    buckets=[1, 4, 8, 24, 48, 72, 168, 336, float("inf")],
)
CHANGE_FAILURE_RATE = Gauge(
    "dora_change_failure_rate",
    "Fraction of deployments that caused a rollback or hotfix (30d window)",
    ["repo"],
)
MTTR_HOURS = Gauge(
    "dora_mean_time_to_restore_hours",
    "Mean hours to restore service after a production failure (30d window)",
    ["repo"],
)

def collect_deployment_frequency(repo, since: datetime.datetime) -> float:
    """Count successful production deployments in the last 7 days."""
    # PyGithub returns timezone-aware datetimes; normalise `since` to match
    if since.tzinfo is None:
        since = since.replace(tzinfo=datetime.timezone.utc)
    deployments = repo.get_deployments(environment="production")
    count = sum(
        1
        for d in deployments
        if d.created_at > since and any(
            s.state == "success" for s in d.get_statuses()
        )
    )
    days = (datetime.datetime.now(datetime.timezone.utc) - since).days or 1
    return count / days

def collect_lead_time(repo, since: datetime.datetime):
    """
    Lead time = time from first commit to production deployment.
    Approximated here as deployment success time minus PR creation time.
    """
    # PyGithub returns timezone-aware datetimes; normalise `since` to match
    if since.tzinfo is None:
        since = since.replace(tzinfo=datetime.timezone.utc)
    pulls = repo.get_pulls(state="closed", base="main", sort="updated",
                           direction="desc")
    for pr in pulls:
        if pr.merged_at and pr.merged_at > since:
            # Find the deployment that followed this merge
            deployments = repo.get_deployments(
                environment="production",
                sha=pr.merge_commit_sha,
            )
            for dep in deployments:
                statuses = list(dep.get_statuses())
                success = next((s for s in statuses if s.state == "success"), None)
                if success and pr.created_at:
                    lead_hours = (success.created_at - pr.created_at).total_seconds() / 3600
                    LEAD_TIME_HOURS.labels(repo=repo.name).observe(lead_hours)

def main():
    import os
    g = Github(os.environ["GITHUB_TOKEN"])
    repos = os.environ["GITHUB_REPOS"].split(",")   # "acme/order-service,acme/catalog"

    start_http_server(8080)
    print("DORA metrics server running on :8080/metrics")

    while True:
        seven_days_ago = datetime.datetime.utcnow() - datetime.timedelta(days=7)
        for repo_name in repos:
            repo = g.get_repo(repo_name.strip())
            freq = collect_deployment_frequency(repo, seven_days_ago)
            DEPLOYMENT_FREQUENCY.labels(repo=repo.name).set(freq)
            collect_lead_time(repo, seven_days_ago)

        time.sleep(300)   # refresh every 5 minutes

if __name__ == "__main__":
    main()
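The CHANGE_FAILURE_RATE and MTTR_HOURS gauges above are declared but never populated. One hedged approach is to attribute each production incident to the deployment that immediately preceded it — assuming incident start/resolve timestamps are available from a source such as PagerDuty. The 24-hour attribution window and the data shapes below are illustrative assumptions, not part of the collector above:

```python
# Sketch: compute change failure rate and MTTR from deployment timestamps
# and incident windows. The 24h attribution window is an assumption.
import datetime

def change_failure_rate(deploy_times, incident_starts, window_hours=24):
    """Fraction of deployments followed by an incident within window_hours.
    Attributes each incident to the closest preceding deployment."""
    if not deploy_times:
        return 0.0
    failed = set()
    window = datetime.timedelta(hours=window_hours)
    for inc in incident_starts:
        preceding = [d for d in deploy_times if d <= inc <= d + window]
        if preceding:
            failed.add(max(preceding))  # closest preceding deployment
    return len(failed) / len(deploy_times)

def mean_time_to_restore(incidents):
    """incidents: list of (started_at, resolved_at) pairs; returns hours."""
    if not incidents:
        return 0.0
    total = sum((end - start).total_seconds() for start, end in incidents)
    return total / len(incidents) / 3600

utc = datetime.timezone.utc
deploys = [datetime.datetime(2026, 5, 1, 9, tzinfo=utc),
           datetime.datetime(2026, 5, 2, 9, tzinfo=utc)]
incidents = [(datetime.datetime(2026, 5, 1, 11, tzinfo=utc),
              datetime.datetime(2026, 5, 1, 14, tzinfo=utc))]

print(change_failure_rate(deploys, [s for s, _ in incidents]))  # → 0.5
print(mean_time_to_restore(incidents))                          # → 3.0
```

The resulting values can then be fed into the corresponding per-repo Prometheus gauges on the same refresh loop.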
# Grafana dashboard panel — DORA metrics overview
# Import this JSON into Grafana or provision via dashboard-as-code

{
  "panels": [
    {
      "title": "Deployment Frequency (deployments/day, 7d avg)",
      "type": "stat",
      "targets": [
        {
          "expr": "avg(dora_deployment_frequency_per_day)",
          "legendFormat": "Org average"
        }
      ],
      "thresholds": {
        "steps": [
          { "value": null, "color": "red" },
          { "value": 0.14, "color": "yellow" },
          { "value": 1, "color": "green" }
        ]
      },
      "description": "DORA bands: red = below ~1/week (Low), yellow = ~1/week to 1/day (Medium), green = above 1/day (High/Elite)"
    },
    {
      "title": "Lead Time Distribution (hours)",
      "type": "histogram",
      "targets": [
        {
          "expr": "histogram_quantile(0.50, rate(dora_lead_time_hours_bucket[30d]))",
          "legendFormat": "p50"
        },
        {
          "expr": "histogram_quantile(0.95, rate(dora_lead_time_hours_bucket[30d]))",
          "legendFormat": "p95"
        }
      ]
    }
  ]
}
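For per-team reporting it can help to collapse the four metrics into a single band. The thresholds below approximate the bands published in DORA State of DevOps reports, but exact cutoffs vary between report years — treat them as illustrative defaults, and the weakest-dimension summary as one possible convention (the reports themselves cluster teams statistically):

```python
# Sketch: map the four DORA metrics to a performance band.
# Thresholds are approximations of published DORA bands, not canonical.
BANDS = ["Low", "Medium", "High", "Elite"]

def dora_band(deploys_per_day, lead_time_hours, cfr, mttr_hours):
    scores = [
        # Deployment frequency: >1/day, weekly-daily, monthly-weekly, less
        3 if deploys_per_day >= 1 else
        2 if deploys_per_day >= 1 / 7 else
        1 if deploys_per_day >= 1 / 30 else 0,
        # Lead time: under a day, a week, a month, more
        3 if lead_time_hours <= 24 else
        2 if lead_time_hours <= 168 else
        1 if lead_time_hours <= 720 else 0,
        # Change failure rate: <=5%, <=15%, <=30%, more
        3 if cfr <= 0.05 else 2 if cfr <= 0.15 else 1 if cfr <= 0.30 else 0,
        # Time to restore: under an hour, a day, a week, more
        3 if mttr_hours <= 1 else
        2 if mttr_hours <= 24 else
        1 if mttr_hours <= 168 else 0,
    ]
    return BANDS[min(scores)]  # summarise by the weakest dimension

print(dora_band(2.0, 12, 0.04, 0.5))  # → Elite
print(dora_band(0.2, 40, 0.10, 4))    # → High
```

Surfacing the band per service-owner group in Backstage keeps the data diagnostic rather than punitive.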

CI/CD as a Platform Service: Reusable Workflows

Rather than each team maintaining its own CI configuration from scratch, the platform team publishes reusable GitHub Actions workflows from a central repository. Teams call these workflows with a single uses: line; the platform team upgrades the shared workflow (adding a new SAST scanner, bumping a build tool version) and the change propagates across all services automatically. This is the CI/CD equivalent of a golden path.

# .github/workflows/go-service-ci.yml
# Platform-owned reusable workflow for Go microservices
# Stored in: github.com/acme/platform-workflows/.github/workflows/go-service-ci.yml

name: Go Service CI (Reusable)

on:
  workflow_call:
    inputs:
      go-version:
        type: string
        default: "1.22"
      image-name:
        type: string
        required: true
      run-integration-tests:
        type: boolean
        default: false
    secrets:
      REGISTRY_TOKEN:
        required: true
      SONAR_TOKEN:
        required: false
    outputs:
      image-digest:
        description: "SHA digest of the pushed image"
        value: ${{ jobs.build.outputs.digest }}

jobs:
  lint:
    name: Lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: ${{ inputs.go-version }}
          cache: true
      - uses: golangci/golangci-lint-action@v6
        with:
          version: v1.57

  test:
    name: Test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
        with:
          go-version: ${{ inputs.go-version }}
          cache: true
      - name: Run unit tests
        run: go test -race -coverprofile=coverage.out ./...
      - name: Integration tests
        if: ${{ inputs.run-integration-tests }}
        run: go test -tags integration ./...
      - uses: codecov/codecov-action@v4

  sast:
    name: SAST + Secrets Scan
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0    # full history for gitleaks
      - name: Run Gitleaks (secrets scan)
        uses: gitleaks/gitleaks-action@v2
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
      - name: Run Semgrep (SAST)
        uses: semgrep/semgrep-action@v1
        with:
          config: "p/golang p/owasp-top-ten"

  build:
    name: Build & Push Image
    runs-on: ubuntu-latest
    needs: [lint, test, sast]
    if: github.ref == 'refs/heads/main'
    outputs:
      digest: ${{ steps.push.outputs.digest }}
    steps:
      - uses: actions/checkout@v4
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.REGISTRY_TOKEN }}
      - id: push
        uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/acme/${{ inputs.image-name }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  trivy:
    name: Container Scan
    runs-on: ubuntu-latest
    needs: build
    steps:
      - uses: aquasecurity/trivy-action@master
        with:
          image-ref: ghcr.io/acme/${{ inputs.image-name }}:${{ github.sha }}
          exit-code: "1"
          severity: CRITICAL,HIGH
          ignore-unfixed: true
# Service team's .github/workflows/ci.yml
# Teams call the platform's reusable workflow with one line

name: CI

on:
  push:
    branches: [main]
  pull_request:

jobs:
  ci:
    # Pin to a released tag, not @main, so platform upgrades are opt-in
    uses: acme/platform-workflows/.github/workflows/go-service-ci.yml@v2.3.0
    with:
      go-version: "1.22"
      image-name: order-service
      run-integration-tests: true
    secrets:
      REGISTRY_TOKEN: ${{ secrets.GITHUB_TOKEN }}

  deploy:
    name: Deploy to Production
    runs-on: ubuntu-latest
    needs: ci
    if: github.ref == 'refs/heads/main'
    steps:
      # GitOps deploy: update image digest in ArgoCD app repo
      - name: Update image tag
        uses: actions/github-script@v7
        with:
          github-token: ${{ secrets.GITOPS_TOKEN }}
          script: |
            const { data } = await github.rest.repos.getContent({
              owner: 'acme',
              repo: 'gitops-config',
              path: 'apps/order-service/values.yaml',
            });
            const content = Buffer.from(data.content, 'base64').toString();
            const updated = content.replace(
              /tag: .*/,
              `tag: "${{ github.sha }}"`
            );
            await github.rest.repos.createOrUpdateFileContents({
              owner: 'acme',
              repo: 'gitops-config',
              path: 'apps/order-service/values.yaml',
              message: `chore: deploy order-service@${{ github.sha }}`,
              content: Buffer.from(updated).toString('base64'),
              sha: data.sha,
            });

Developer Experience Surveys and Feedback Loops

Quantitative metrics tell you what changed; surveys tell you why and what to build next. The McKinsey Developer Velocity Index and the GitHub Good Day Project both find that developer satisfaction and perceived productivity are highly correlated with tool quality and process clarity — not just deployment frequency. Run a quarterly developer experience survey using the SPACE framework dimensions as question categories. Publish results transparently, and let the survey results drive the platform team's roadmap.

Note

The single most valuable DevEx survey question: "How many hours per week do you spend on tasks that are not directly related to building your product?" This operationalises cognitive load and toil in terms developers understand immediately. Benchmark it quarterly and use it as the headline metric for the platform team's OKR. When the number drops, the platform is working.
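A hedged sketch of how raw answers to that question could be rolled up into a per-team headline number (the team names and figures are invented):

```python
# Illustrative sketch: aggregate survey answers to "hours/week on
# non-product work" into a per-team median. Example data is made up.

import statistics
from collections import defaultdict

def toil_hours_by_team(responses):
    """responses: list of (team, hours_per_week_on_non_product_work)."""
    by_team = defaultdict(list)
    for team, hours in responses:
        by_team[team].append(hours)
    # Median resists the one respondent who answers "40"
    return {team: statistics.median(hours) for team, hours in by_team.items()}

responses = [
    ("orders", 12), ("orders", 8), ("orders", 15),
    ("catalog", 4), ("catalog", 6),
]
print(toil_hours_by_team(responses))  # → {'orders': 12, 'catalog': 5.0}
```

Tracking the same question quarterly, with the same wording, is what makes the trend meaningful.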

IDP Production Checklist

Software Catalog is the source of truth

Every production service has a catalog-info.yaml with accurate ownership, SLO links, and on-call contacts. Stale catalog entries erode trust and kill adoption — automate staleness detection.

Golden paths cover the 80% case

Templates should handle the most common service archetypes (web API, worker, data pipeline). Don't try to template every edge case — make escaping the golden path explicit and documented, not impossible.

Infrastructure self-service is approved asynchronously

Developers open a PR (Terraform) or apply a manifest (Crossplane); the platform applies after review or automatically for low-risk resources. No Slack DMs, no tickets, no waiting for a human to provision a database.

DORA metrics are tracked per team, not just org-wide

Org-wide averages hide struggling teams. Track DORA metrics per service owner group — make the data visible in Backstage so teams can self-diagnose. Don't use them for performance reviews.

Reusable workflows are versioned and pinned

Teams reference platform workflows by tag (v2.3.0), not by @main. Breaking changes to the platform CI increment the major version; teams upgrade on their own schedule.

Platform team runs oncall for the platform, not for services

When a platform capability (the catalog, the scaffolder, Atlantis) breaks, the platform team is paged. When a service breaks, the service team is paged. Clear escalation paths prevent the platform team from becoming a catch-all support desk.

Backstage is behind auth — not open to the internet

Backstage's admin APIs and plugin backends can be sensitive. Use your IdP (Okta, GitHub SSO) for authentication and authorise catalog mutations to service owners only.

Developer experience is measured quarterly

Run SPACE-based surveys quarterly. Publish results and a public response from the platform team — what you're doing about the top 3 pain points. This builds trust and drives adoption.
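Several of the checklist items above can be enforced mechanically. As one example, the staleness detection mentioned in the first item might look like the sketch below — it assumes entities have been fetched from Backstage's catalog REST API (GET /api/catalog/entities), and the required-annotation list is an illustrative policy, not a Backstage default:

```python
# Sketch of automated catalog hygiene: flag entities missing the ownership
# and operational annotations this checklist requires. The annotation list
# is an example policy — adjust it to your organisation's standards.

REQUIRED_ANNOTATIONS = [
    "github.com/project-slug",
    "pagerduty.com/service-id",
    "backstage.io/techdocs-ref",
]

def audit_entity(entity):
    """Return a list of problems for one catalog entity dict."""
    problems = []
    meta = entity.get("metadata", {})
    spec = entity.get("spec", {})
    if not spec.get("owner"):
        problems.append("missing spec.owner")
    annotations = meta.get("annotations", {})
    for key in REQUIRED_ANNOTATIONS:
        if key not in annotations:
            problems.append(f"missing annotation {key}")
    return problems

entity = {
    "metadata": {"name": "order-service",
                 "annotations": {"github.com/project-slug": "acme/order-service"}},
    "spec": {"owner": "group:order-team"},
}
print(audit_entity(entity))
```

Run something like this on a schedule and open issues against owning teams, so catalog drift is caught before it erodes trust in the portal.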

Work with us

Building an Internal Developer Platform or improving developer experience across your engineering organisation?

We design and implement Internal Developer Platforms — from Software Catalog setup in Backstage and golden-path scaffolding templates to self-service Terraform/Crossplane infrastructure, reusable CI/CD workflows, and DORA metric dashboards. Let’s talk.

Get in touch
