Why the API Gateway Became a Critical Control Plane
As systems decompose into microservices, the entry point where external traffic meets internal services becomes one of the highest-leverage locations in your architecture. A well-designed API gateway is not just a reverse proxy — it is the place where you enforce authentication, apply rate limits, shape traffic, observe every request, and route to the right service version without touching application code.
The patterns that make gateways production-grade are often underspecified. Most documentation covers basic routing. This article covers what actually matters in production: rate limiting algorithms, edge authentication with JWTs and API keys, circuit breaking, canary deployments, request/response transformation, and the operational model for tools like Kong Gateway, AWS API Gateway, and Envoy Proxy.
| Tool | Model | Config | Best for |
|---|---|---|---|
| Kong Gateway | OSS / Enterprise | Admin API / declarative YAML | Self-hosted, plugin ecosystem |
| AWS API Gateway | Managed SaaS | CloudFormation / CDK | Serverless Lambda backends |
| Envoy / Istio | OSS sidecar / mesh | xDS API / CRDs | Kubernetes, service mesh, mTLS |
| Traefik | OSS / Enterprise | Labels / CRDs / static config | Docker & Kubernetes auto-discovery |
Rate Limiting — Algorithms That Actually Matter in Production
Rate limiting at the gateway prevents upstream services from being overwhelmed, enforces per-customer quotas, and is the first line of defence against credential stuffing and scraping. The algorithm you choose has a significant impact on how fair the limiting feels to legitimate clients and on how bursty traffic is handled.
Token Bucket
The token bucket algorithm maintains a bucket of tokens that refills at a constant rate (e.g., 100 tokens/second). Each request consumes one token. If the bucket is empty, the request is rejected with HTTP 429. The bucket has a maximum capacity — burst headroom above the steady-state rate. This is the most widely implemented algorithm because it naturally allows short bursts while enforcing a long-term average.
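As a concrete illustration, here is a minimal in-process token bucket in Python. The class and parameter names are our own; a production gateway implements the same logic against shared storage (e.g., Redis) so that all replicas draw from one bucket:

```python
import time

class TokenBucket:
    """Minimal token bucket: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate              # steady-state refill rate (tokens/second)
        self.capacity = capacity      # burst headroom above the steady-state rate
        self.tokens = capacity        # start full so initial bursts are allowed
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Lazy refill: add tokens proportional to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False    # caller should respond with HTTP 429
```

A bucket with `rate=100, capacity=200` enforces a long-term average of 100 requests/second while tolerating a 200-request burst, which is exactly the burst-plus-average behavior described above.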
# Kong Gateway — rate limiting plugin via declarative config (deck)
# File: kong.yaml
services:
- name: user-api
url: http://user-service:8080
routes:
- name: user-api-route
paths:
- /api/v1/users
plugins:
- name: rate-limiting
config:
minute: 60 # 60 requests per minute steady-state
hour: 1000 # 1000 requests per hour cap
policy: redis # use Redis for distributed rate limiting
redis_host: redis
redis_port: 6379
redis_password: ${REDIS_PASSWORD}
limit_by: consumer # per authenticated consumer, not per IP
hide_client_headers: false
# Response headers added:
# X-RateLimit-Limit-Minute: 60
# X-RateLimit-Remaining-Minute: 45
# X-RateLimit-Reset-Minute: 1714089600
Sliding Window Counter
A fixed window counter (e.g., "100 requests per minute, resetting at :00") has a well-known edge case: a client can send 100 requests at :59 and 100 more at :01, effectively 200 requests in two seconds without triggering the limit. Sliding window counters fix this by computing the rate over a rolling window. The implementation uses two adjacent window counts weighted by the fraction of the current window that has elapsed.
import time
import redis
class SlidingWindowRateLimiter:
"""
Sliding window rate limiter backed by Redis.
Uses two fixed-window counters and weights them by elapsed fraction.
"""
def __init__(self, redis_client: redis.Redis, limit: int, window_seconds: int):
self.redis = redis_client
self.limit = limit
self.window = window_seconds
def is_allowed(self, key: str) -> tuple[bool, dict]:
now = time.time()
current_window = int(now // self.window)
previous_window = current_window - 1
elapsed_fraction = (now % self.window) / self.window
current_key = f"rl:{key}:{current_window}"
previous_key = f"rl:{key}:{previous_window}"
pipe = self.redis.pipeline()
pipe.get(current_key)
pipe.get(previous_key)
current_count_raw, previous_count_raw = pipe.execute()
current_count = int(current_count_raw or 0)
previous_count = int(previous_count_raw or 0)
# Weighted estimate: previous window contributes (1 - elapsed) fraction
estimated_count = previous_count * (1 - elapsed_fraction) + current_count
if estimated_count >= self.limit:
reset_at = (current_window + 1) * self.window
return False, {
"limit": self.limit,
"remaining": 0,
"reset": int(reset_at),
}
# Increment current window counter
pipe = self.redis.pipeline()
pipe.incr(current_key)
pipe.expire(current_key, self.window * 2) # keep for 2 windows
pipe.execute()
remaining = max(0, self.limit - int(estimated_count) - 1)
return True, {
"limit": self.limit,
"remaining": remaining,
"reset": int((current_window + 1) * self.window),
}
# FastAPI middleware usage
from fastapi import FastAPI, Request, HTTPException
from fastapi.responses import JSONResponse
app = FastAPI()
limiter = SlidingWindowRateLimiter(
redis_client=redis.Redis(host="redis", port=6379),
limit=100,
window_seconds=60,
)
@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
# Use authenticated user ID if available, fall back to IP
client_key = request.headers.get("X-Consumer-ID") or request.client.host
allowed, headers = limiter.is_allowed(client_key)
if not allowed:
return JSONResponse(
status_code=429,
content={"error": "rate_limit_exceeded", "retry_after": headers["reset"]},
headers={
"X-RateLimit-Limit": str(headers["limit"]),
"X-RateLimit-Remaining": "0",
"X-RateLimit-Reset": str(headers["reset"]),
"Retry-After": str(headers["reset"] - int(time.time())),
},
)
response = await call_next(request)
response.headers["X-RateLimit-Limit"] = str(headers["limit"])
response.headers["X-RateLimit-Remaining"] = str(headers["remaining"])
response.headers["X-RateLimit-Reset"] = str(headers["reset"])
    return response
Authentication at the Edge — JWT Validation and API Keys
Authenticating at the gateway — before requests ever reach upstream services — eliminates auth duplication across every microservice and creates a single enforcement point. Two patterns dominate: JWT validation for user-facing APIs with short-lived tokens, and API key authentication for machine-to-machine integrations.
JWT Validation at the Gateway
The gateway validates the JWT signature against a public key (fetched from a JWKS endpoint), checks expiry and required claims, and forwards the decoded claims as request headers to upstream services. Upstream services trust the gateway — they do not re-validate the token. This pattern keeps cryptographic operations centralized and lets services focus on business logic.
# Kong JWT plugin — declarative configuration
# Validates RS256 tokens issued by your identity provider
plugins:
- name: jwt
config:
uri_param_names: [] # do not accept JWT in query params
cookie_names: [] # do not accept JWT in cookies
header_names:
- Authorization # Bearer token in Authorization header
claims_to_verify:
- exp # verify expiry
- nbf # verify not-before (if present)
key_claim_name: iss # use iss claim to look up the public key
secret_is_base64: false
anonymous: null # null = reject unauthenticated requests
run_on_preflight: true
---
# Consumer with associated RSA public key (JWKS-based setup)
# In production, use the OIDC plugin or an external JWKS URL instead
consumers:
- username: my-service
jwt_secrets:
- key: "https://auth.example.com" # matches iss claim in JWT
algorithm: RS256
rsa_public_key: |
-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA...
      -----END PUBLIC KEY-----
# Nginx / OpenResty — inline JWT validation with lua-resty-jwt
# Alternative to a full gateway when you need custom claim routing
location /api/ {
access_by_lua_block {
local jwt = require "resty.jwt"
local cjson = require "cjson"
local auth_header = ngx.req.get_headers()["Authorization"]
if not auth_header or not auth_header:match("^Bearer ") then
ngx.status = 401
ngx.header["WWW-Authenticate"] = 'Bearer realm="api"'
ngx.say(cjson.encode({error = "missing_token"}))
return ngx.exit(401)
end
local token = auth_header:sub(8) -- strip "Bearer "
local verified = jwt:verify(
os.getenv("JWT_PUBLIC_KEY"),
token,
{
lifetime_grace_period = 30, -- 30s clock skew tolerance
valid_issuers = {"https://auth.example.com"},
valid_audiences = {"api.example.com"},
}
)
if not verified.verified then
ngx.status = 401
ngx.say(cjson.encode({error = "invalid_token", detail = verified.reason}))
return ngx.exit(401)
end
-- Forward decoded claims to upstream
ngx.req.set_header("X-Consumer-ID", verified.payload.sub)
ngx.req.set_header("X-Consumer-Roles", table.concat(verified.payload.roles or {}, ","))
ngx.req.set_header("X-Tenant-ID", verified.payload.tenant_id)
-- Remove raw Authorization header before forwarding (optional)
-- ngx.req.clear_header("Authorization")
}
proxy_pass http://upstream_service;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
}
API Key Authentication
API keys are long-lived opaque tokens suitable for server-to-server integrations where short-lived JWTs would add unnecessary complexity. The key is stored hashed in the gateway's database (never in plaintext), and the gateway performs a lookup on every request. For high-throughput APIs, cache the lookup result in Redis with a short TTL (30–60 seconds) to avoid database hotspots.
# Generating and storing API keys securely — Python example
import hashlib
import json
import secrets
from datetime import datetime, UTC
def generate_api_key() -> tuple[str, str]:
"""
Returns (raw_key, hashed_key).
raw_key is shown to the user once, hashed_key is stored.
"""
raw = secrets.token_urlsafe(32) # 256 bits of entropy
hashed = hashlib.sha256(raw.encode()).hexdigest()
return raw, hashed
def create_api_key_for_service(service_name: str, scopes: list[str], db) -> str:
raw_key, hashed_key = generate_api_key()
db.execute(
"""
INSERT INTO api_keys (key_hash, service_name, scopes, created_at, last_used_at)
VALUES ($1, $2, $3, $4, NULL)
""",
hashed_key, service_name, scopes, datetime.now(UTC),
)
# Return raw key — this is the ONLY time the client sees it
return raw_key
# Gateway middleware — validate incoming API key
async def validate_api_key(request: Request, db, cache: redis.Redis) -> dict:
    api_key = (
        request.headers.get("X-API-Key")
        # Query-string fallback is convenient for testing, but keys in query
        # params leak into access logs; prefer header-only in production.
        or request.query_params.get("api_key")
    )
if not api_key:
raise HTTPException(status_code=401, detail="missing_api_key")
key_hash = hashlib.sha256(api_key.encode()).hexdigest()
# Check Redis cache first
cached = await cache.get(f"apikey:{key_hash}")
if cached:
return json.loads(cached)
# Fall back to database
row = await db.fetchrow(
"SELECT service_name, scopes, revoked_at FROM api_keys WHERE key_hash = $1",
key_hash,
)
if not row or row["revoked_at"] is not None:
raise HTTPException(status_code=401, detail="invalid_api_key")
service_data = {"service": row["service_name"], "scopes": row["scopes"]}
# Cache for 60 seconds — short enough to respect revocations quickly
await cache.setex(f"apikey:{key_hash}", 60, json.dumps(service_data))
await db.execute(
"UPDATE api_keys SET last_used_at = NOW() WHERE key_hash = $1", key_hash
)
    return service_data
Circuit Breaking — Failing Fast Before Cascading Failures
When an upstream service becomes slow or unhealthy and there is no circuit breaking, the gateway keeps forwarding requests: threads accumulate waiting for responses, connection pools exhaust, and the failure cascades back through every caller. Circuit breaking interrupts this by tracking failure rates and temporarily rejecting requests to unhealthy upstreams, giving them time to recover.
The classic circuit breaker state machine has three states: Closed (normal operation, failures tracked), Open (requests rejected immediately, upstream gets recovery time), and Half-Open (a probe request is allowed through — if it succeeds, the circuit closes; if it fails, it re-opens).
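That state machine is small enough to sketch directly. The Python below is illustrative (class names and thresholds are ours, not any particular gateway's API); real gateways implement the equivalent natively:

```python
import time
from enum import Enum

class State(Enum):
    CLOSED = "closed"        # normal operation, failures tracked
    OPEN = "open"            # reject immediately, give the upstream recovery time
    HALF_OPEN = "half_open"  # allow a single probe request through

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.consecutive_failures = 0
        self.state = State.CLOSED
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state is State.OPEN:
            if time.monotonic() - self.opened_at >= self.recovery_timeout:
                self.state = State.HALF_OPEN   # let one probe through
                return True
            return False                       # fail fast, no upstream call
        return True                            # CLOSED or HALF_OPEN

    def record_success(self) -> None:
        self.consecutive_failures = 0
        self.state = State.CLOSED              # probe succeeded, resume normal flow

    def record_failure(self) -> None:
        self.consecutive_failures += 1
        if (self.state is State.HALF_OPEN
                or self.consecutive_failures >= self.failure_threshold):
            self.state = State.OPEN            # trip, or re-trip after a failed probe
            self.opened_at = time.monotonic()
```

The gateway calls `allow_request()` before forwarding and `record_success()` / `record_failure()` after each upstream response; everything rejected while Open gets an immediate 503 without touching the upstream.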
# Envoy circuit breaker configuration — cluster-level
# Applied to all requests routed to the upstream cluster
clusters:
- name: user_service
type: STRICT_DNS
load_assignment:
cluster_name: user_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: user-service
port_value: 8080
circuit_breakers:
thresholds:
- priority: DEFAULT
max_connections: 100 # max concurrent TCP connections
max_pending_requests: 50 # max requests queued while waiting for connection
max_requests: 200 # max concurrent active requests
max_retries: 3 # max concurrent retries in flight
track_remaining: true # expose stats on remaining capacity
# Outlier detection — ejects unhealthy hosts from the load balancer
outlier_detection:
consecutive_5xx: 5 # eject after 5 consecutive 5xx responses
interval: 10s # evaluation interval
base_ejection_time: 30s # minimum ejection duration
max_ejection_percent: 50 # never eject more than 50% of hosts
success_rate_minimum_hosts: 3 # need at least 3 hosts to compute success rate
success_rate_request_volume: 100
success_rate_stdev_factor: 1900
# Upstream connection timeout settings
connect_timeout: 2s
upstream_connection_options:
tcp_keepalive:
keepalive_probes: 3
keepalive_time: 30
        keepalive_interval: 5
# Kong circuit breaker via the upstream health checks config
# Passive health checks detect failures from actual traffic
upstreams:
- name: user-service-upstream
algorithm: round-robin
healthchecks:
passive:
healthy:
http_statuses: [200, 201, 202, 204]
successes: 3 # 3 consecutive successes re-marks target healthy
unhealthy:
http_statuses: [429, 500, 502, 503, 504]
http_failures: 5 # 5 failures marks target unhealthy
tcp_failures: 2
timeouts: 3
active:
healthy:
interval: 10 # probe every 10s when healthy
http_statuses: [200, 204]
successes: 2
unhealthy:
interval: 5 # probe every 5s when unhealthy
http_statuses: [429, 500, 503]
http_failures: 3
http_path: /health
timeout: 2
concurrency: 5
targets:
- target: user-service-1:8080
weight: 100
- target: user-service-2:8080
    weight: 100
Canary Deployments and Traffic Shaping at the Gateway
The gateway is the ideal place to implement progressive delivery — routing a small percentage of traffic to a new service version before a full rollout. This decouples deployment (shipping the new binary) from release (sending traffic to it), and lets you validate the new version with real traffic while limiting blast radius.
There are two approaches: weight-based routing (route X% of all requests to v2 regardless of requester) and header-based routing (route requests with a specific header to v2, used for internal testing or beta user groups). Both can be combined.
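One refinement worth noting on top of a plain percentage split: hashing a stable identifier makes the assignment sticky, so a given user sees the same version on every request instead of flipping between v1 and v2. A sketch (the function name and salt are ours, for illustration):

```python
import hashlib

def canary_bucket(user_id: str, canary_percent: int,
                  salt: str = "user-api-rollout") -> str:
    """Deterministically assign ~canary_percent% of users to v2.

    Hashing (salt, user_id) spreads users uniformly over 100 buckets;
    changing the salt reshuffles assignments for a fresh rollout.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "v2" if bucket < canary_percent else "v1"
```

Ramping the rollout is then just raising `canary_percent`: a user whose bucket was below the old threshold stays in v2 as the threshold grows, so nobody bounces back to v1 mid-rollout.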
# Kubernetes Gateway API — HTTPRoute for canary deployment
# Routes 10% of traffic to the new service version
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
name: user-api-canary
namespace: production
spec:
parentRefs:
- name: main-gateway
sectionName: https
hostnames:
- api.example.com
rules:
# Header-based routing — internal/beta users always hit v2
- matches:
- headers:
- name: X-Feature-Flag
value: canary-v2
backendRefs:
- name: user-service-v2
port: 8080
weight: 100
# Weight-based split — 10% canary, 90% stable
- matches:
- path:
type: PathPrefix
value: /api/v1/users
backendRefs:
- name: user-service-v1
port: 8080
weight: 90
- name: user-service-v2
port: 8080
        weight: 10
# Istio VirtualService — fine-grained traffic shaping
# Combines header matching, weight-based splitting, and fault injection for testing
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: user-service
namespace: production
spec:
hosts:
- user-service
http:
# Internal testers: 100% to v2
- match:
- headers:
x-canary-user:
exact: "true"
route:
- destination:
host: user-service
subset: v2
weight: 100
# Fault injection for chaos testing — 5% of requests get 500ms delay
- match:
- headers:
x-chaos-test:
exact: "true"
fault:
delay:
percentage:
value: 5.0
fixedDelay: 500ms
route:
- destination:
host: user-service
subset: v1
# Default: 95% stable, 5% canary
- route:
- destination:
host: user-service
subset: v1
weight: 95
- destination:
host: user-service
subset: v2
weight: 5
timeout: 5s
retries:
attempts: 3
perTryTimeout: 2s
retryOn: gateway-error,connect-failure,retriable-4xx
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
name: user-service
namespace: production
spec:
host: user-service
subsets:
- name: v1
labels:
version: v1
- name: v2
labels:
      version: v2
Request and Response Transformation
Gateways can modify requests before they reach upstream services and responses before they reach clients. Common use cases: adding correlation IDs, stripping internal headers, normalizing paths across API versions, injecting upstream auth headers, and rewriting response bodies to mask internal service details.
# Kong request-transformer plugin — add/remove/replace headers and body fields
plugins:
- name: request-transformer
config:
add:
headers:
- "X-Request-ID:${uuid()}" # inject correlation ID
- "X-Forwarded-Tenant:${consumer.custom_id}"
- "X-Internal-Auth:Bearer ${env.INTERNAL_SERVICE_TOKEN}"
querystring: []
remove:
headers:
- Authorization # strip external auth before forwarding
- X-Forwarded-For # handled by gateway
replace:
headers:
- "Host:internal-user-service.svc.cluster.local"
- name: response-transformer
config:
remove:
headers:
- X-Powered-By # don't expose internal stack
- Server # don't expose server version
- X-Internal-Request-ID # strip internal headers from response
add:
headers:
- "X-Request-ID:${request_id}" # echo back for client correlation
      - "Cache-Control:no-store" # force no-cache on authenticated responses
Observability — What to Instrument at the Gateway Layer
The gateway sees every request, making it the ideal place to emit traces, metrics, and structured logs. The key metrics to track at the gateway level, per route and per consumer:
Request rate and error rate
Track requests_total labeled by route, method, and status code family (2xx, 4xx, 5xx). Error rate = 5xx / total. Alert when error rate exceeds 1% for 5 minutes. Track 429 rate separately — a spike indicates either a legitimate traffic surge or a client misbehaving.
Latency percentiles
Track p50, p95, p99 latency per route. The gap between p95 and p99 reveals tail latency. Alert on p99 latency exceeding your SLO. Track upstream latency separately from gateway overhead — if gateway overhead exceeds 5ms p99, investigate middleware or plugin ordering.
Rate limit hit rate
Track the ratio of rate-limited (429) to total requests per consumer. A consumer consistently hitting rate limits may need quota increases or may indicate a client bug (e.g., retry storms). Track this metric to proactively reach out before it becomes a support incident.
Upstream connection pool saturation
Track the number of pending requests waiting for a connection to the upstream pool. If this metric is consistently above zero, the connection pool is undersized for your traffic or the upstream is too slow. This is the leading indicator before circuit breakers trigger.
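These four signals translate directly into alerting and recording rules. A sketch in Prometheus rule syntax, assuming the metric names emitted by Kong's Prometheus plugin (thresholds here are illustrative; use your own SLOs):

```yaml
groups:
  - name: api-gateway
    rules:
      # Error rate = 5xx / total, per service, alerting per the 1%-for-5m guideline
      - alert: GatewayErrorRateHigh
        expr: |
          sum(rate(kong_http_requests_total{code=~"5.."}[5m])) by (service)
            /
          sum(rate(kong_http_requests_total[5m])) by (service) > 0.01
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "5xx error rate above 1% on {{ $labels.service }}"
      # p99 upstream latency, per service, precomputed from the latency histogram
      - record: service:upstream_latency_p99:5m
        expr: |
          histogram_quantile(0.99,
            sum(rate(kong_latency_bucket{type="upstream"}[5m])) by (service, le))
```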
# Kong Prometheus plugin — export gateway metrics to Prometheus
plugins:
- name: prometheus
config:
status_code_metrics: true # per-route status code counters
latency_metrics: true # gateway + upstream latency histograms
bandwidth_metrics: true # bytes in/out per route
upstream_health_metrics: true
# Resulting metric examples (scrape at /metrics on Kong's metrics port 8001):
#
# kong_http_requests_total{service="user-api",route="user-api-route",
# method="GET",code="200"} 15432
#
# kong_latency_bucket{service="user-api",type="kong",le="5"} 14000
# kong_latency_bucket{service="user-api",type="upstream",le="50"} 13800
#
# kong_upstream_target_health{upstream="user-service-upstream",
# target="user-service-1:8080",address="10.0.1.1:8080",
# state="healthchecks_off|healthy|unhealthy|dns_error"} 1
Production Hardening Checklist
Run the gateway in HA mode with distributed state
A single gateway instance is a single point of failure. Run at least two replicas behind a load balancer. For rate limiting and session state, use Redis (not in-process memory) so all replicas share the same counters. Test failover by killing a gateway pod and verifying traffic continues without disruption.
Set explicit timeouts on every upstream
Without explicit timeouts, a slow upstream holds connections indefinitely, exhausting the gateway's connection pool. Set three timeout values per upstream: connection timeout (how long to wait for a TCP connection — typically 1–2s), read timeout (how long to wait for the first byte of the response — typically 5–30s depending on the endpoint), and write timeout (how long to wait for the client to send the request body).
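In Kong, for example, these map onto three service-level fields, all in milliseconds (the values below are illustrative starting points, not universal recommendations):

```yaml
services:
  - name: user-api
    url: http://user-service:8080
    connect_timeout: 2000   # TCP connect: fail fast when the upstream is unreachable
    read_timeout: 10000     # time allowed waiting for response bytes
    write_timeout: 10000    # time allowed sending the request to the upstream
```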
Implement request size limits
Without body size limits, a malicious client can send a multi-gigabyte payload that buffers in gateway memory. Set a global maximum request body size (e.g., 10MB) with a lower limit on specific endpoints that handle only small JSON payloads. Kong's request-size-limiting plugin handles this; Nginx's client_max_body_size directive works similarly.
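With Kong's plugin this is a few lines of declarative config (the 10 MB figure mirrors the example above; tighten it per route where payloads are known to be small):

```yaml
plugins:
  - name: request-size-limiting
    config:
      allowed_payload_size: 10       # reject bodies larger than 10 size_units
      size_unit: megabytes
      require_content_length: false  # do not reject requests that omit Content-Length
```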
Validate and version your gateway configuration
Gateway configuration should live in version control and go through CI. For Kong, use deck validate to lint declarative configs before applying. For AWS API Gateway, use CDK or CloudFormation with staged deployment and rollback. Never apply gateway config changes directly in production consoles — config drift is a production reliability risk.
Separate admin and data plane traffic
Kong's admin API (port 8001 by default) must never be exposed to the public internet — it allows full configuration changes without authentication unless explicitly secured. Bind admin API listeners to a private network interface or loop it through an internal-only load balancer. Apply mutual TLS and RBAC if multiple teams need admin API access.
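In kong.conf terms, that separation looks roughly like this (a loopback-only admin listener; expose it internally through a private load balancer if remote access is needed):

```ini
# kong.conf (sketch): data plane on all interfaces, admin API on loopback only
proxy_listen = 0.0.0.0:8000, 0.0.0.0:8443 ssl
admin_listen = 127.0.0.1:8001, 127.0.0.1:8444 ssl
```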
Plan your plugin execution order carefully
Plugins execute in a defined order. In Kong, authentication plugins run before rate limiting — which means unauthenticated requests are rejected before consuming a rate limit slot (correct behavior). Verify that your plugin ordering matches your intended policy: auth first, then rate limiting by consumer, then request transformation, then routing. Incorrect ordering can create security gaps (rate limiting before auth allows unauthenticated requests to exhaust quotas).
Designing or hardening an API gateway for your microservices platform?
We design and implement production-grade API gateway architectures — from rate limiting strategies and edge authentication to circuit breaking, canary routing, and full observability. Let’s talk.
Get in Touch