What Are Data Contracts and Why OpenAPI?
A data contract is a machine-readable agreement between a data producer and its consumers that specifies the schema, semantics, SLAs, and ownership of a data asset. Without contracts, downstream teams discover breaking changes at runtime — a renamed field in a JSON response silently corrupts a dashboard, a removed endpoint breaks a nightly pipeline, or a type change from string to integer surfaces only in production errors at 3 AM.
OpenAPI 3.x (formerly Swagger) has emerged as the dominant contract format for HTTP APIs, and some teams also use it to describe webhook and batch data feeds. Its schema language (a JSON Schema subset in 3.0, full JSON Schema 2020-12 in 3.1) gives you field types, required constraints, enum sets, format annotations, and discriminators: everything needed to describe both request and response shapes. Crucially, the ecosystem around OpenAPI is mature: Spectral lints OpenAPI documents and oasdiff detects breaking changes between two of them. Pact complements both with consumer-driven contract testing, although it works from its own pact files rather than from the OpenAPI document. If your organization already uses Avro or Protobuf schema registries for Kafka, OpenAPI fills the complementary role for REST and webhook surfaces.
Schema Enforcement
Validate every request and response against the OpenAPI spec at the middleware layer. Reject malformed payloads before they corrupt downstream consumers.
Breaking Change Detection
Automated CI gates compare the new spec to the baseline and block merges that introduce breaking changes without a major version bump.
Consumer-Driven Testing
Each consumer publishes the exact fields it needs. Producers run consumer pact files in CI before every release — no integration environment required.
Structuring an OpenAPI Document as a Data Contract
A production-grade OpenAPI contract goes beyond just listing endpoints. It encodes ownership metadata via the x- extension namespace, SLA guarantees, deprecation timelines, and explicit backward compatibility promises. The info block becomes a contract header; the components/schemas section is the authoritative type registry.
# openapi.yaml - data contract for the Orders API v2
openapi: "3.1.0"
info:
  title: Orders API
  version: "2.3.1" # semver: MAJOR.MINOR.PATCH
  description: >
    Authoritative schema for the Orders domain. Consumers must pin to a
    MAJOR version. MINOR and PATCH releases are backward-compatible.
  contact:
    name: Orders Team
    email: orders-team@company.com
  x-contract:
    owner: orders-team
    domain: commerce
    sla:
      availability: "99.9%"
      latency_p99_ms: 200
      freshness_minutes: 5
    breaking-change-policy: "major version bump required"
    deprecation-notice-days: 90
servers:
  - url: https://api.company.com/orders/v2
    description: Production
  - url: https://api-staging.company.com/orders/v2
    description: Staging
paths:
  /orders:
    get:
      operationId: listOrders
      summary: List orders for a customer
      parameters:
        - name: customer_id
          in: query
          required: true
          schema:
            type: string
            format: uuid
        - name: status
          in: query
          schema:
            type: string
            enum: [pending, confirmed, shipped, delivered, cancelled]
        - name: page_size
          in: query
          schema:
            type: integer
            minimum: 1
            maximum: 100
            default: 20
      responses:
        "200":
          description: Paginated list of orders
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/OrderListResponse"
        "400":
          $ref: "#/components/responses/ValidationError"
        "401":
          $ref: "#/components/responses/Unauthorized"
components:
  schemas:
    OrderListResponse:
      type: object
      required: [data, pagination]
      properties:
        data:
          type: array
          items:
            $ref: "#/components/schemas/Order"
        pagination:
          $ref: "#/components/schemas/Pagination"
    Order:
      type: object
      required: [order_id, customer_id, status, total_amount, currency, created_at]
      properties:
        order_id:
          type: string
          format: uuid
          description: Immutable unique identifier for the order.
        customer_id:
          type: string
          format: uuid
        status:
          type: string
          enum: [pending, confirmed, shipped, delivered, cancelled]
          x-contract-note: "enum is append-only; new values are non-breaking"
        total_amount:
          # Minor currency units are whole numbers, so this is an integer
          type: integer
          format: int64
          minimum: 0
          description: Order total in minor currency units (e.g. cents).
        currency:
          type: string
          pattern: "^[A-Z]{3}$"
          description: ISO 4217 currency code.
        created_at:
          type: string
          format: date-time
          description: RFC 3339 timestamp of order creation (UTC).
        shipping_address:
          # Optional but never null. The 3.0-era "nullable" keyword no longer
          # exists in OpenAPI 3.1, so it is not used here.
          $ref: "#/components/schemas/Address"
          x-added-in: "2.1.0"
    Address:
      type: object
      required: [street, city, country_code]
      properties:
        street: { type: string }
        city: { type: string }
        postcode: { type: string }
        country_code:
          type: string
          pattern: "^[A-Z]{2}$"
    Pagination:
      type: object
      required: [total, page, page_size, has_next]
      properties:
        total: { type: integer, minimum: 0 }
        page: { type: integer, minimum: 1 }
        page_size: { type: integer, minimum: 1, maximum: 100 }
        has_next: { type: boolean }
  responses:
    ValidationError:
      description: Request failed schema validation
      content:
        application/json:
          schema:
            type: object
            required: [error, details]
            properties:
              error: { type: string }
              details: { type: array, items: { type: string } }
    Unauthorized:
      description: Missing or invalid authentication token
Note the x-added-in extension on shipping_address. Consumers pinned to an older MINOR version will still receive this new optional field, so they must tolerate unknown properties; that clause is worth stating explicitly in the info.x-contract block. Strict consumers that reject unknown fields will break on your MINOR releases. Under this contract that is their bug, not yours, but document it.
Schema Enforcement — Validation Middleware and Linting
Declaring a contract means nothing without enforcement. Two layers work together: Spectral linting at commit time catches authoring mistakes before the spec reaches CI, and middleware validation at request time rejects non-conforming payloads from both producers and consumers.
# .spectral.yaml - lint rules for your OpenAPI contracts
extends: ["spectral:oas"]
rules:
  # Every operation must have an operationId for code generation stability.
  # operation-operationId is a built-in spectral:oas rule; this raises its severity.
  operation-operationId: error
  # Response schemas must use $ref, not inline definitions
  no-inline-schema:
    description: "Inline schemas are forbidden; use $ref to components/schemas"
    severity: warn
    resolved: false # lint the unresolved document, otherwise every $ref is already inlined
    given: "$.paths[*][*].responses[*].content[*].schema"
    then:
      function: schema
      functionOptions:
        schema:
          type: object
          properties:
            $ref:
              type: string
          required: [$ref]
  # Contract owner extension is mandatory
  contract-owner-required:
    description: "info.x-contract.owner must be set"
    severity: error
    given: "$.info"
    then:
      function: schema
      functionOptions:
        schema:
          type: object
          required: [x-contract]
          properties:
            x-contract:
              type: object
              required: [owner]
  # Deprecated operations must have a sunset date
  deprecated-must-have-sunset:
    description: "Deprecated operations must include x-sunset date"
    severity: warn
    given: "$.paths[*][?(@.deprecated == true)]"
    then:
      field: x-sunset
      function: truthy
# Run via: npx @stoplight/spectral-cli lint openapi.yaml --ruleset .spectral.yaml
# Python FastAPI - request + response validation against OpenAPI spec
# Uses openapi-core for spec-driven validation independent of your framework.
# Install: pip install openapi-core fastapi uvicorn pyyaml
# Assumes openapi-core >= 0.19 (OpenAPI class; the Starlette wrappers take the raw body bytes).
# middleware/contract_validator.py
import logging
from pathlib import Path

import yaml
from openapi_core import OpenAPI
from openapi_core.contrib.starlette import StarletteOpenAPIRequest, StarletteOpenAPIResponse
from openapi_core.exceptions import OpenAPIError
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request
from starlette.responses import JSONResponse, Response

logger = logging.getLogger(__name__)


class ContractValidationMiddleware(BaseHTTPMiddleware):
    def __init__(self, app, spec_path: str = "openapi.yaml"):
        super().__init__(app)
        spec = yaml.safe_load(Path(spec_path).read_text())
        self.openapi = OpenAPI.from_dict(spec)

    async def dispatch(self, request: Request, call_next):
        # Validate the incoming request; the body is passed explicitly as bytes
        openapi_request = StarletteOpenAPIRequest(request, body=await request.body())
        try:
            self.openapi.validate_request(openapi_request)
        except OpenAPIError as exc:
            logger.warning("Request contract violation: %s %s: %s",
                           request.method, request.url.path, exc)
            # "details" is a list of strings, as the ValidationError schema requires
            return JSONResponse(
                status_code=400,
                content={"error": "contract_violation", "details": [str(exc)]},
            )
        response = await call_next(request)
        # Validate outgoing responses in non-production environments only
        if getattr(request.app.state, "env", "production") == "production":
            return response
        # call_next returns a streaming response, so buffer the body before validating
        body = b"".join([chunk async for chunk in response.body_iterator])
        try:
            self.openapi.validate_response(
                openapi_request, StarletteOpenAPIResponse(response, data=body))
        except OpenAPIError as exc:
            logger.error("Response contract violation: %s %s → %s: %s",
                         request.method, request.url.path,
                         response.status_code, exc)
            # In staging: block the response and alert
            return JSONResponse(
                status_code=500,
                content={"error": "response_contract_violation", "details": [str(exc)]},
            )
        # Rebuild the response, because the original body iterator is now consumed
        return Response(content=body, status_code=response.status_code,
                        headers=dict(response.headers), media_type=response.media_type)


# main.py
import os

from fastapi import FastAPI
from middleware.contract_validator import ContractValidationMiddleware

app = FastAPI()
# APP_ENV is an assumed variable name; the middleware reads app.state.env
app.state.env = os.environ.get("APP_ENV", "production")
app.add_middleware(ContractValidationMiddleware, spec_path="openapi.yaml")


@app.get("/orders")
async def list_orders(customer_id: str, status: str | None = None):
    # Your handler: the middleware guarantees the request is contract-valid
    ...
Versioning Strategy — Semantic Versioning for APIs
API versioning with semver means consumers can reason about upgrade risk before they read a changelog. The rule is simple but the classification of changes is where teams struggle. Breaking changes — anything that can cause a correctly-written consumer to start failing — require a MAJOR bump. Non-breaking additions are MINOR. Bug fixes and documentation updates are PATCH.
# Classifying API changes for semver
#
# ── BREAKING (requires MAJOR bump) ──────────────────────────────────────
# - Remove any field from a response schema
# - Remove any endpoint or HTTP method
# - Rename a field (even with a deprecation comment)
# - Change a field type: string → integer, object → array
# - Add a new REQUIRED field to a request body
# - Narrow an enum: remove an existing enum value from a response field
# - Change semantics: reverse pagination direction, change sort default
# - Remove or rename an operationId (breaks code-generated clients)
# - Make an optional request parameter required
#
# ── NON-BREAKING (MINOR bump) ──────────────────────────────────────────
# - Add a new optional field to a response (consumers must tolerate extras)
# - Add a new optional query parameter
# - Add a new endpoint or operation
# - Widen an enum: add a new value to a response field
# - Add a new HTTP method to an existing path
# - Relax a request constraint: raise maximum, lower minimum (relaxing a response constraint can break consumers)
# - Add a new error response status code
#
# ── PATCH (no consumer impact) ─────────────────────────────────────────
# - Fix a description or example
# - Add or fix a format annotation that doesn't change validation
# - Tighten a pattern that was previously too permissive (if already enforced)
# - Add x- extension metadata
# Version routing in nginx - serve /v1 and /v2 simultaneously during migration
server {
    location /orders/v1/ {
        proxy_pass http://orders-service-v1:8000/;
        add_header X-API-Version "1" always;
        add_header Deprecation "true" always;
        add_header Sunset "Thu, 31 Dec 2026 23:59:59 GMT" always;
    }
    location /orders/v2/ {
        proxy_pass http://orders-service-v2:8000/;
        add_header X-API-Version "2" always;
    }
}
The same discipline applies to event-driven systems. If your team publishes Kafka events, the schema versioning concepts from Kafka Schema Registry with Avro map directly: BACKWARD_TRANSITIVE compatibility in the Schema Registry enforces the same non-breaking rules as a MINOR semver bump in OpenAPI.
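The classification rules listed earlier in this section can be turned into a small lookup that computes the next version number. The sketch below is illustrative only: the change labels and the `next_version` helper are hypothetical names, not part of any tool mentioned here, and unknown labels are treated as breaking on the assumption that an unclassified change should be reviewed rather than waved through.

```python
from enum import IntEnum


class Bump(IntEnum):
    PATCH = 0
    MINOR = 1
    MAJOR = 2


# Hypothetical change labels mapped to the bump each one requires.
CHANGE_BUMPS = {
    "response_field_removed": Bump.MAJOR,
    "endpoint_removed": Bump.MAJOR,
    "field_type_changed": Bump.MAJOR,
    "required_request_field_added": Bump.MAJOR,
    "response_enum_value_removed": Bump.MAJOR,
    "optional_response_field_added": Bump.MINOR,
    "optional_query_param_added": Bump.MINOR,
    "endpoint_added": Bump.MINOR,
    "description_fixed": Bump.PATCH,
    "extension_metadata_added": Bump.PATCH,
}


def required_bump(changes):
    """Return the largest bump required by any change (unknown labels count as MAJOR)."""
    return max((CHANGE_BUMPS.get(c, Bump.MAJOR) for c in changes), default=Bump.PATCH)


def next_version(current, changes):
    """Compute the next MAJOR.MINOR.PATCH string for a list of change labels."""
    major, minor, patch = (int(part) for part in current.split("."))
    bump = required_bump(changes)
    if bump is Bump.MAJOR:
        return f"{major + 1}.0.0"
    if bump is Bump.MINOR:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

For example, `next_version("2.3.1", ["optional_response_field_added", "description_fixed"])` yields a MINOR bump, because the largest required bump wins.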
Breaking Change Detection with oasdiff
oasdiff is a Go CLI that compares two OpenAPI specifications and classifies each difference by severity (ERR, WARN, or INFO), using rules broadly in line with the semver table above. Running it as a CI gate prevents accidental breaking changes from merging without a reviewed major version decision.
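To make the idea of a breaking-change diff concrete, here is a minimal sketch that compares two response schemas held as plain dictionaries. It is not how oasdiff is implemented and covers only three cases (removed properties, changed types, removed enum values); it ignores `$ref`, arrays, and request-side rules. The function name and message format are assumptions made for illustration.

```python
def breaking_changes(old, new, path=""):
    """List breaking differences between two response schemas (plain dicts)."""
    label = path or "<root>"
    problems = []
    if old.get("type") != new.get("type"):
        problems.append(
            f"{label}: type changed from {old.get('type')} to {new.get('type')}"
        )
        return problems  # nothing below is comparable once the type differs
    if "enum" in old and "enum" in new:
        for value in sorted(set(old["enum"]) - set(new["enum"])):
            problems.append(f"{label}: enum value {value!r} removed")
    new_props = new.get("properties", {})
    for name, old_schema in old.get("properties", {}).items():
        child = f"{path}/{name}" if path else name
        if name not in new_props:
            problems.append(f"{child}: property removed")
        else:
            problems.extend(breaking_changes(old_schema, new_props[name], child))
    return problems
```

Properties that exist only in the new schema are deliberately not reported, since adding an optional response field is a MINOR change under the rules above.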
# Install oasdiff (Go binary, no runtime dependencies)
go install github.com/tufin/oasdiff@latest
# or via Homebrew
brew install tufin/tufin/oasdiff
# Compare the current spec to the last released version.
# --fail-on ERR makes the command exit with code 1 if breaking changes are found
oasdiff breaking --fail-on ERR openapi-v2.3.0.yaml openapi-v2.4.0-draft.yaml
# Example output:
# GET /orders response 200 body property 'data/items/status' removed enum value 'processing'
# GET /orders response 200 body property 'data/items/order_ref' removed
# Stricter gate: also fail on WARN-level changes, not only ERR-level ones
oasdiff breaking --fail-on WARN openapi-v2.3.0.yaml openapi-v2.4.0-draft.yaml
# Output as JSON for programmatic processing.
# The JSON "level" field is numeric (3 = ERR, 2 = WARN); confirm against your pinned oasdiff version
oasdiff breaking --format json openapi-v2.3.0.yaml openapi-v2.4.0-draft.yaml | jq '.[] | select(.level == 3)'
# Full changelog (all changes, classified)
oasdiff changelog openapi-v2.3.0.yaml openapi-v2.4.0-draft.yaml
# Diff as flat list, useful for PR description generation
oasdiff diff --format text openapi-v2.3.0.yaml openapi-v2.4.0-draft.yaml
# .github/workflows/contract-check.yml - CI gate for OpenAPI contracts
name: Contract Check
on:
  pull_request:
    paths:
      - "openapi/**"
      - "openapi.yaml"
jobs:
  breaking-change-check:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write # needed to post the PR comment below
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Install oasdiff
        run: |
          curl -sSfL https://raw.githubusercontent.com/tufin/oasdiff/main/install.sh | sh
          echo "${HOME}/.oasdiff/bin" >> ${GITHUB_PATH}
      - name: Fetch baseline spec from main branch
        run: |
          git show origin/main:openapi.yaml > openapi-baseline.yaml
      - name: Lint new spec with Spectral
        run: npx @stoplight/spectral-cli lint openapi.yaml --ruleset .spectral.yaml
      - name: Check for breaking changes
        id: breaking
        run: |
          set +e
          # --fail-on ERR is what makes oasdiff exit non-zero on breaking changes
          oasdiff breaking --fail-on ERR openapi-baseline.yaml openapi.yaml --format json > breaking.json
          EXIT_CODE=$?
          echo "exit_code=${EXIT_CODE}" >> ${GITHUB_OUTPUT}
          set -e
      - name: Post breaking changes as PR comment
        if: steps.breaking.outputs.exit_code != '0'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const breaking = JSON.parse(fs.readFileSync('breaking.json', 'utf8'));
            const body = [
              '## ⛔ Breaking API Contract Changes Detected',
              '',
              'This PR introduces the following breaking changes:',
              '',
              ...breaking.map(c => '- **' + c.id + '**: ' + c.text),
              '',
              'If this is intentional, bump the MAJOR version in `info.version` and add a migration guide.',
            ].join('\n');
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
              body,
            });
      - name: Fail if breaking changes without major version bump
        if: steps.breaking.outputs.exit_code != '0'
        run: |
          BASELINE_MAJOR=$(yq '.info.version' openapi-baseline.yaml | cut -d. -f1)
          NEW_MAJOR=$(yq '.info.version' openapi.yaml | cut -d. -f1)
          if [ "${BASELINE_MAJOR}" = "${NEW_MAJOR}" ]; then
            echo "Breaking changes found without a MAJOR version bump. Failing."
            exit 1
          fi
          echo "MAJOR version bumped from ${BASELINE_MAJOR} to ${NEW_MAJOR}; breaking changes are allowed."
      - name: Generate changelog artifact
        run: oasdiff changelog openapi-baseline.yaml openapi.yaml > CHANGELOG.md
      - uses: actions/upload-artifact@v4
        with:
          name: api-changelog
          path: CHANGELOG.md
Consumer-Driven Contract Testing with Pact
oasdiff tells you what changed in the spec. Pact tells you which consumer actually uses each field. Consumer-driven testing inverts the normal flow: each consumer writes a test that records exactly what it sends and expects, producing a pact file. Providers verify all published pact files before deploying. When the Orders API drops the currency field, the Billing Service pact immediately fails on the Orders provider side — before a single byte of production traffic is affected.
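The core idea, that each consumer declares the fields it depends on and the provider checks every declaration before release, can be sketched without any Pact machinery. This is a conceptual illustration only, not Pact's API: the `broken_consumers` helper and the `shipping-service` consumer are hypothetical, and real pacts record full request and response interactions rather than bare field names.

```python
# Fields the Orders API currently returns for each order.
PROVIDER_FIELDS = {
    "order_id", "customer_id", "status", "total_amount", "currency", "created_at",
}

# Hypothetical consumer declarations: the fields each consumer actually reads.
CONSUMER_NEEDS = {
    "billing-service": {"order_id", "total_amount", "currency"},
    "shipping-service": {"order_id", "status"},
}


def broken_consumers(provider_fields, consumer_needs):
    """Map each consumer to the fields it needs that the provider no longer offers."""
    return {
        name: sorted(needs - provider_fields)
        for name, needs in consumer_needs.items()
        if needs - provider_fields
    }
```

Calling `broken_consumers(PROVIDER_FIELDS - {"currency"}, CONSUMER_NEEDS)` reports only the billing service, which is the information a spec diff alone cannot give you.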
# pip install pact-python
# Written against the pact-python 1.x/2.x Consumer/Provider API.
# consumer_test.py - Billing Service defines what it needs from Orders API
import os

import pytest
from pact import Consumer, EachLike, Provider

from billing.orders_client import OrdersClient

# A well-formed UUID, because the contract declares customer_id as format: uuid
CUSTOMER_ID = "abc12300-0000-4000-8000-000000000000"


@pytest.fixture(scope="session")
def pact():
    pact = Consumer("billing-service").has_pact_with(
        Provider("orders-api"),
        pact_dir="./pacts",
        publish_to_broker=True,
        broker_base_url="https://pact-broker.company.com",
        broker_token=os.environ["PACT_BROKER_TOKEN"],
    )
    pact.start_service()
    yield pact
    pact.stop_service()


def test_get_orders_for_billing(pact):
    # Consumer declares: I send this request and expect this response
    (pact
     .given("customer abc123 has 2 orders")
     .upon_receiving("a request to list orders for billing")
     .with_request(
         method="GET",
         path="/orders",
         query={"customer_id": CUSTOMER_ID},
         headers={"Authorization": "Bearer token123"},
     )
     .will_respond_with(
         status=200,
         body={
             "data": EachLike({
                 "order_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
                 "customer_id": CUSTOMER_ID,
                 "status": "confirmed",
                 # billing only cares about total_amount and currency
                 "total_amount": 4995,
                 "currency": "USD",
                 "created_at": "2026-06-19T10:00:00Z",
             }),
             "pagination": {
                 "total": 2,
                 "page": 1,
                 "page_size": 20,
                 "has_next": False,
             },
         },
     ))
    with pact:
        client = OrdersClient(base_url=pact.uri)
        orders = client.list_orders(customer_id=CUSTOMER_ID)
    assert len(orders) > 0
    assert orders[0]["currency"] == "USD"


# ── Provider verification (run on the Orders service CI) ─────────────────
# provider_test.py
import os

from pact import Verifier


def test_provider_honors_consumer_pacts():
    verifier = Verifier(
        provider="orders-api",
        provider_base_url="http://localhost:8000",
    )
    return_code, _logs = verifier.verify_with_broker(
        broker_url="https://pact-broker.company.com",
        broker_token=os.environ["PACT_BROKER_TOKEN"],
        publish_verification_results=True,
        publish_version=os.environ["GIT_SHA"],
        enable_pending=True,  # pacts this provider has never verified successfully do not fail the build
    )
    assert return_code == 0, "Provider failed consumer pact verification"
Contract Registry, Governance, and Discovery
At scale, contracts need a home beyond a Git repo. A contract registry — whether a dedicated tool like Backstage or a curated S3 + API — makes contracts discoverable, linkable from dashboards, and referenceable from CI. The same principle applies when your team also manages batch data contracts: the schema validation patterns from Great Expectations and dbt complement OpenAPI contracts by covering the data-at-rest surface, while OpenAPI governs the data-in-motion surface.
# Backstage catalog-info.yaml - register your API contract as a Backstage entity
apiVersion: backstage.io/v1alpha1
kind: API
metadata:
  name: orders-api
  title: Orders API
  description: Authoritative schema for the Orders domain
  tags:
    - rest
    - commerce
    - data-contract
  annotations:
    backstage.io/techdocs-ref: dir:.
    github.com/project-slug: company/orders-service
    pagerduty.com/service-id: PXYZ123
  links:
    - url: https://pact-broker.company.com/pacts/provider/orders-api
      title: Consumer Pacts
    - url: https://api.company.com/orders/v2/docs
      title: Live API Docs
spec:
  type: openapi
  lifecycle: production
  owner: group:orders-team
  definition:
    $text: ./openapi.yaml
---
#!/usr/bin/env bash
# Automated spec publishing script (run in CI after merge to main)
set -euo pipefail
VERSION=$(yq '.info.version' openapi.yaml)
MAJOR=$(echo "${VERSION}" | cut -d. -f1)
SPEC_KEY="contracts/orders-api/${VERSION}/openapi.yaml"
LATEST_URI="s3://company-contracts/contracts/orders-api/v${MAJOR}/latest/openapi.yaml"
# Fetch the previously published spec as the changelog baseline
# (falls back to the new spec on the first publish of a major version)
PREV_SPEC=$(mktemp)
aws s3 cp "${LATEST_URI}" "${PREV_SPEC}" || cp openapi.yaml "${PREV_SPEC}"
# Archive immutable version snapshot to S3
aws s3 cp openapi.yaml "s3://company-contracts/${SPEC_KEY}" --content-type "application/yaml" --metadata "team=orders-team,published-at=$(date -u +%Y-%m-%dT%H:%M:%SZ)"
# Update the "latest" pointer for the current major version
aws s3 cp openapi.yaml "${LATEST_URI}" --content-type "application/yaml"
echo "Published orders-api@${VERSION} to contract registry"
# Notify Slack #api-contracts channel (the count is the number of changelog lines)
CHANGE_COUNT=$(oasdiff changelog "${PREV_SPEC}" openapi.yaml | wc -l | tr -d ' ')
curl -s -X POST "${SLACK_WEBHOOK_URL}" -H "Content-Type: application/json" -d "{\"text\": \"orders-api v${VERSION} published to contract registry: ${CHANGE_COUNT} changes\"}"
Deprecation Workflow — Sunset Headers and Migration Windows
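Deprecating a MAJOR version is a coordination problem. Consumers need enough notice to migrate, and producers need to know when they can safely decommission the old version. The HTTP Sunset response header (RFC 8594) and the Deprecation response header (RFC 9745) are the standard signaling mechanism. RFC 9745 defines the Deprecation value as a timestamp of the form @<unix-seconds>; the "true" value used in the examples here follows earlier drafts of that specification.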
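On the consumer side, the Sunset header is only useful if something reads it. The sketch below shows one way a client could turn the header into a remaining-days figure using only the standard library. The `days_until_sunset` helper is a hypothetical name, and the headers are modeled as a plain dict for simplicity, whereas real HTTP client libraries usually expose a case-insensitive mapping.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def days_until_sunset(headers, now=None):
    """Return whole days until the Sunset date, or None if the header is absent."""
    sunset = headers.get("Sunset")
    if sunset is None:
        return None
    # Sunset carries an HTTP-date, e.g. "Thu, 31 Dec 2026 23:59:59 GMT"
    sunset_at = parsedate_to_datetime(sunset)
    now = now or datetime.now(timezone.utc)
    return (sunset_at - now).days
```

A client could log a warning whenever this value drops below its own migration threshold, which gives consumers an early signal without relying on the producer's alerting.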
# FastAPI - inject deprecation headers on v1 routes
from fastapi import FastAPI
from prometheus_client import Counter

app_v1 = FastAPI(title="Orders API v1 (deprecated)")
SUNSET_DATE = "Thu, 31 Dec 2026 23:59:59 GMT"


@app_v1.middleware("http")
async def add_deprecation_headers(request, call_next):
    response = await call_next(request)
    response.headers["Deprecation"] = "true"
    response.headers["Sunset"] = SUNSET_DATE
    response.headers["Link"] = (
        '<https://api.company.com/orders/v2>; rel="successor-version"'
    )
    return response


# Prometheus metric to track v1 usage - alert when consumers haven't migrated
v1_requests = Counter(
    "orders_api_v1_requests_total",
    "HTTP requests to the deprecated v1 endpoint",
    ["consumer", "endpoint"],
)


@app_v1.middleware("http")
async def track_v1_usage(request, call_next):
    consumer = request.headers.get("X-Consumer-ID", "unknown")
    v1_requests.labels(
        consumer=consumer,
        endpoint=request.url.path,
    ).inc()
    return await call_next(request)


# Alert rule (enable it 30 days before sunset): fires if any consumer called v1 in the last 24h
# - alert: DeprecatedAPIStillInUse
#   expr: increase(orders_api_v1_requests_total[24h]) > 0
#   labels:
#     severity: warning
#   annotations:
#     summary: "Consumer {{ $labels.consumer }} still calling deprecated orders-api v1"
#     sunset: "2026-12-31"
Production Checklist
Store the OpenAPI contract in the same repo as the producer code — drift between contract and implementation is the primary failure mode.
Run Spectral linting in pre-commit hooks so authoring mistakes surface before CI. Gate PR merges on a clean Spectral run.
Pin oasdiff to an exact version in CI. Breaking-change classification rules change across minor oasdiff releases.
Keep N-1 MAJOR versions alive for a minimum of 90 days post-deprecation — Sunset headers must give consumers a realistic migration window.
Add the Deprecation and Sunset response headers to all v(N-1) responses as soon as v(N) ships — not on the sunset date itself.
Instrument v(N-1) traffic with a Prometheus counter labeled by consumer ID. Alert ops when consumers haven't migrated within 60 days of the sunset date.
Validate responses in staging on every deploy. A contract violation in staging is a bug; the same violation in production is an incident.
Publish consumer pact files to a centralized Pact Broker, not to the provider repo. Consumers own their expectations; providers verify them.
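Use enable_pending=True in Pact provider verification so that a newly published consumer pact cannot fail the provider build before the provider has had a chance to support it. A pact stops being pending automatically once the provider has verified it successfully; from then on, a failure breaks the build as usual.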
Generate SDK clients from the OpenAPI spec (openapi-generator) and publish them as versioned packages. Type-safe clients catch breaking changes at compile time.