Tags: Terraform, AWS, IaC, DevOps, Modules, Remote State, Multi-Account, Terragrunt

Terraform Advanced Patterns — Modules, Remote State, and Multi-Account AWS Infrastructure

Production-grade Terraform patterns for platform and DevOps teams: reusable modules with variable validation blocks and version pinning, remote state on S3 with DynamoDB locking and per-environment state isolation, workspaces vs directory-based environment separation, Terragrunt for DRY configurations across accounts, AWS multi-account infrastructure with IAM role assumption and account ID validation, drift detection pipelines with terraform plan -detailed-exitcode and import blocks, and CI/CD with Atlantis for PR-driven plan and apply workflows.

2026-05-25

Why Advanced Terraform Patterns Matter

Basic Terraform — a handful of .tf files, local state, and a single AWS account — gets teams off the ground quickly. But as infrastructure grows beyond a few services, the shortcuts compound into serious operational risk. Copy-paste module blocks propagate inconsistent naming, tagging, and security defaults across every environment. Local terraform.tfstate files become unsharable, unversioned, and catastrophic to lose. A single AWS account collapses prod, staging, and developer sandboxes into the same blast radius, where an accidental terraform destroy can wipe production resources.

Production-grade Terraform addresses each failure mode deliberately. Reusable modules with validated inputs enforce a contract between the module author and consumers. Remote state on S3 with DynamoDB locking gives every team member access to current state without conflicts. Directory-based environment separation combined with Terragrunt eliminates copy-paste across dev, staging, and prod without sacrificing per-environment configuration overrides. Multi-account AWS infrastructure with IAM role assumption contains the blast radius to a single account. Drift detection pipelines surface out-of-band changes before they become incidents. Atlantis closes the loop by turning pull requests into the canonical approval mechanism for infrastructure changes.

Note

This article targets Terraform 1.5+ and AWS. Module registry patterns apply equally to the public Terraform Registry and private registries in Terraform Cloud or Artifactory. Terragrunt examples use Terragrunt v0.55+. All provider examples use the hashicorp/aws provider 5.x.

Writing Reusable Modules with Variable Validation

A Terraform module is simply a directory of .tf files with a well-defined interface: inputs declared in variables.tf, resources in main.tf, outputs in outputs.tf, and provider version requirements in versions.tf. The canonical module structure keeps these concerns separated so that consumers can read the interface without parsing resource logic, and so that automated tools like terraform-docs can generate accurate documentation from the standard files.

Variable validation blocks, generally available since Terraform 0.13, let module authors encode invariants directly in the module interface rather than relying on downstream plan failures or runtime AWS errors. A validation block takes a condition expression and an error_message: when the condition is false, Terraform reports the message before making any API calls. Terraform 1.2 added precondition and postcondition blocks inside a resource's lifecycle block, which let module authors assert that an AWS resource reached the expected state after apply, not just that the inputs were valid. Terraform 1.5 added top-level check blocks, which evaluate assertions on every plan and apply and report failures as warnings instead of blocking the run.
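
A check block, available from Terraform 1.5, sits at the top level of a configuration rather than inside a resource. The sketch below assumes the hashicorp/http provider and a hypothetical health URL; the block and data source names are illustrative only. A failed assertion is reported as a warning and does not stop the run.

```hcl
# checks.tf (illustrative; the URL and block names are hypothetical)

terraform {
  required_providers {
    http = {
      source  = "hashicorp/http"
      version = "~> 3.0"
    }
  }
}

check "api_health" {
  # Scoped data source: only evaluated as part of this check.
  data "http" "health" {
    url = "https://api.example.com/healthz"
  }

  assert {
    condition     = data.http.health.status_code == 200
    error_message = "Health endpoint returned ${data.http.health.status_code}, expected 200."
  }
}
```

Because the check does not block apply, it suits ongoing health assertions; use a postcondition when a failed assertion should stop the run.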

ECS Service Module — Structure and Validated Variables

# modules/ecs-service/versions.tf
# Lock provider versions to prevent unexpected upgrades across teams.
# Use ~> for minor version tolerance; pin major versions exactly.

terraform {
  required_version = ">= 1.5.0, < 2.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}


# modules/ecs-service/variables.tf
# Variable validation encodes the module contract.
# Consumers get actionable error messages instead of cryptic AWS API errors.

variable "service_name" {
  type        = string
  description = "Name of the ECS service. Used as a prefix for all related resources."

  validation {
    condition     = can(regex("^[a-z][a-z0-9-]{2,31}$", var.service_name))
    error_message = "service_name must be 3-32 lowercase alphanumeric characters or hyphens, starting with a letter."
  }
}

variable "environment" {
  type        = string
  description = "Deployment environment. Controls resource sizing and deletion protection."

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "environment must be one of: dev, staging, prod."
  }
}

variable "container_port" {
  type        = number
  description = "Port exposed by the container. Must be in the valid range 1-65535."
  default     = 8080

  validation {
    condition     = var.container_port >= 1 && var.container_port <= 65535
    error_message = "container_port must be between 1 and 65535."
  }
}

variable "desired_count" {
  type        = number
  description = "Desired number of running task instances."
  default     = 2

  validation {
    condition     = var.desired_count >= 1 && var.desired_count <= 100
    error_message = "desired_count must be between 1 and 100."
  }
}

variable "cpu" {
  type        = number
  description = "CPU units for the task definition (256, 512, 1024, 2048, 4096)."
  default     = 512

  validation {
    condition     = contains([256, 512, 1024, 2048, 4096], var.cpu)
    error_message = "cpu must be one of the valid Fargate CPU values: 256, 512, 1024, 2048, 4096."
  }
}

variable "memory" {
  type        = number
  description = "Memory in MiB for the task definition. Must be compatible with the selected CPU."

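  # Referencing var.cpu from this block requires Terraform 1.9+. On earlier
  # versions, move this check into a precondition on the task definition.
  validation {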
    condition = (
      (var.cpu == 256  && contains([512, 1024, 2048], var.memory)) ||
      (var.cpu == 512  && var.memory >= 1024 && var.memory <= 4096  && var.memory % 1024 == 0) ||
      (var.cpu == 1024 && var.memory >= 2048 && var.memory <= 8192  && var.memory % 1024 == 0) ||
      (var.cpu == 2048 && var.memory >= 4096 && var.memory <= 16384 && var.memory % 1024 == 0) ||
      (var.cpu == 4096 && var.memory >= 8192 && var.memory <= 30720 && var.memory % 1024 == 0)
    )
    error_message = "memory value is incompatible with the selected cpu. See Fargate CPU/memory combinations."
  }
}

variable "tags" {
  type        = map(string)
  description = "Tags to merge onto all resources. Required keys: Team, CostCenter."
  # No default: an empty map would fail the validation below.

  validation {
    condition     = contains(keys(var.tags), "Team") && contains(keys(var.tags), "CostCenter")
    error_message = "tags must include at least 'Team' and 'CostCenter' keys for cost allocation."
  }
}


# modules/ecs-service/main.tf (excerpt)
# Postcondition (Terraform 1.2+): assert the service reached ACTIVE state.

resource "aws_ecs_service" "this" {
  name            = var.service_name
  cluster         = var.cluster_arn
  task_definition = aws_ecs_task_definition.this.arn
  desired_count   = var.desired_count
  launch_type     = "FARGATE"

  network_configuration {
    subnets          = var.subnet_ids
    security_groups  = [aws_security_group.service.id]
    assign_public_ip = var.environment == "dev"
  }

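  lifecycle {
    ignore_changes = [desired_count]  # allow autoscaling to manage count

    # Assumes the provider exposes a `status` attribute on aws_ecs_service.
    # If your provider version does not, drop this block and set
    # wait_for_steady_state = true on the resource instead.
    postcondition {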
      condition     = self.status == "ACTIVE"
      error_message = "ECS service ${self.name} did not reach ACTIVE status after apply."
    }
  }

  tags = merge(var.tags, {
    Name        = var.service_name
    Environment = var.environment
    ManagedBy   = "terraform"
  })
}


# modules/ecs-service/outputs.tf

output "service_name" {
  description = "Name of the created ECS service."
  value       = aws_ecs_service.this.name
}

output "service_arn" {
  description = "ARN of the created ECS service."
  value       = aws_ecs_service.this.id
}

output "security_group_id" {
  description = "ID of the service security group — use for cross-service ingress rules."
  value       = aws_security_group.service.id
}

Note

Version pinning in required_providers with ~> 5.0 allows minor and patch updates (for example 5.0.x → 5.1.x) but blocks major version upgrades that may introduce breaking changes. For modules published to a registry, always use semantic versioning on module releases and document the minimum compatible Terraform and provider versions in versions.tf. Consumers pin the module version with version = "~> 2.0" in their module block, not in their own required_providers.
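
On the consumer side, the pin looks roughly like the sketch below. The registry address is a hypothetical private-registry path, and only the inputs relevant to the example are shown.

```hcl
# environments/prod/app/main.tf (illustrative)
# The registry address is hypothetical; substitute your own namespace.

module "api_service" {
  source  = "app.terraform.io/my-org/ecs-service/aws"
  version = "~> 2.0"   # any 2.x release, never 3.0

  service_name = "api"
  environment  = "prod"
  memory       = 1024  # compatible with the module's default cpu of 512

  tags = {
    Team       = "platform"
    CostCenter = "cc-1234"
  }

  # Remaining inputs (cluster, networking) omitted for brevity.
}
```

The version argument is only honoured for registry sources; modules pulled from Git are pinned with ?ref= in the source URL instead.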

Remote State with S3 + DynamoDB Locking

Local state is dangerous in team environments for three reasons: it cannot be shared between team members without manual file passing; it provides no locking, so two concurrent terraform apply runs can corrupt state; and it is trivially lost when a developer machine fails or a CI runner's ephemeral disk is wiped. The S3 backend with DynamoDB state locking is the standard solution for AWS-based teams: state is stored in a versioned, encrypted S3 bucket, and a DynamoDB table provides distributed mutual exclusion so that only one apply or plan can hold the lock at a time.

State isolation per environment is critical. Rather than storing all environments in a single S3 key, each environment's state lives at a distinct key path or, better, in a distinct S3 bucket entirely. Cross-stack dependencies are resolved cleanly with the terraform_remote_state data source, which reads the root-level outputs of another state file without importing or duplicating resources. Use this to pass VPC IDs, subnet IDs, and security group IDs from a foundational networking stack into application-layer stacks.

# bootstrap/main.tf
# Create the S3 bucket and DynamoDB table for remote state.
# Run this once per AWS account before any other Terraform work.
# Note: this bootstrap stack itself uses local state — that is intentional.

provider "aws" {
  region = "eu-west-1"
}

resource "aws_s3_bucket" "tfstate" {
  bucket = "my-org-terraform-state-${data.aws_caller_identity.current.account_id}"

  lifecycle {
    prevent_destroy = true  # never allow accidental deletion of state bucket
  }

  tags = {
    Name      = "terraform-state"
    ManagedBy = "terraform"
  }
}

resource "aws_s3_bucket_versioning" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id
  versioning_configuration {
    status = "Enabled"  # enables state file history and rollback
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "tfstate" {
  bucket = aws_s3_bucket.tfstate.id
  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"  # encrypt at rest with KMS
    }
  }
}

resource "aws_s3_bucket_public_access_block" "tfstate" {
  bucket                  = aws_s3_bucket.tfstate.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# DynamoDB table for state locking.
# PAY_PER_REQUEST billing: no idle cost, scales automatically.
resource "aws_dynamodb_table" "tflock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  lifecycle {
    prevent_destroy = true
  }

  tags = {
    Name      = "terraform-state-lock"
    ManagedBy = "terraform"
  }
}

data "aws_caller_identity" "current" {}


# environments/prod/networking/backend.tf
# Each stack declares its own backend with a unique key path.
# Pattern: environments/<env>/<stack-name>/terraform.tfstate

terraform {
  backend "s3" {
    bucket         = "my-org-terraform-state-123456789012"
    key            = "environments/prod/networking/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"

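    # Role assumed by Terraform to access the state bucket.
    # Keeps state access credentials separate from resource credentials.
    # On Terraform 1.6+, prefer the nested form
    # assume_role = { role_arn = "..." }; top-level role_arn is deprecated there.
    role_arn = "arn:aws:iam::123456789012:role/TerraformStateAccess"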
  }
}
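
Newer Terraform releases can also lock state in S3 itself, which removes the DynamoDB dependency. The sketch below assumes Terraform 1.10 or later, where the S3 backend accepts use_lockfile; the bucket and key mirror the example above. Check the backend documentation for the release you run before adopting it.

```hcl
# environments/prod/networking/backend.tf (alternative, Terraform 1.10+ assumed)

terraform {
  backend "s3" {
    bucket       = "my-org-terraform-state-123456789012"
    key          = "environments/prod/networking/terraform.tfstate"
    region       = "eu-west-1"
    encrypt      = true
    use_lockfile = true   # writes a .tflock object next to the state file
  }
}
```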


# environments/prod/app/main.tf
# Read VPC outputs from the networking stack via remote_state.
# No hardcoded IDs — always sourced from state.

data "terraform_remote_state" "networking" {
  backend = "s3"

  config = {
    bucket = "my-org-terraform-state-123456789012"
    key    = "environments/prod/networking/terraform.tfstate"
    region = "eu-west-1"
  }
}

module "api_service" {
  source = "../../../modules/ecs-service"

  service_name   = "api"
  environment    = "prod"
  container_port = 8080
  desired_count  = 3
  cpu            = 1024
  memory         = 2048

  # Inject networking outputs — no hardcoding, no drift
  vpc_id     = data.terraform_remote_state.networking.outputs.vpc_id
  subnet_ids = data.terraform_remote_state.networking.outputs.private_subnet_ids

  tags = {
    Team       = "platform"
    CostCenter = "cc-1234"
    Service    = "api"
  }
}
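
The terraform_remote_state lookup only sees values that the networking stack exports as root-level outputs. A minimal sketch of that side, assuming hypothetical resource names aws_vpc.main and aws_subnet.private:

```hcl
# environments/prod/networking/outputs.tf (illustrative)
# aws_vpc.main and aws_subnet.private are hypothetical resource names.

output "vpc_id" {
  description = "ID of the environment VPC."
  value       = aws_vpc.main.id
}

output "private_subnet_ids" {
  description = "IDs of the private subnets used by application stacks."
  value       = [for s in aws_subnet.private : s.id]
}
```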

Note

When migrating from local state to S3, use terraform init -migrate-state after adding the backend block. Terraform will copy your local state to the S3 key. For moving resources between state files, use terraform state mv: it removes the resource from the source state and adds it to the destination, avoiding the re-creation that would happen if you simply removed the resource from one config and added it to another. Always take a manual backup of both state files before any state mv operation.
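
On Terraform 1.7 or later, a resource can also change stacks declaratively instead of through terraform state mv: a removed block drops it from the source stack's state without destroying it, and an import block adopts it in the destination stack. This is a different technique from state mv, sketched here with a hypothetical bucket named my-org-assets. The two fragments belong to different root modules.

```hcl
# --- Source stack: forget the bucket but leave it running in AWS. ---
# The matching resource block must be deleted from this stack's config.
removed {
  from = aws_s3_bucket.assets

  lifecycle {
    destroy = false
  }
}

# --- Destination stack: adopt the existing bucket into this state. ---
import {
  to = aws_s3_bucket.assets
  id = "my-org-assets"
}

resource "aws_s3_bucket" "assets" {
  bucket = "my-org-assets"
}
```

Apply the source stack first, then plan the destination and confirm the plan shows an import with no replacement.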

Workspaces vs Directory-Based Environment Separation

Terraform workspaces allow a single configuration to manage multiple independent state files by switching with terraform workspace select. They look attractive for environment separation, but the HashiCorp documentation explicitly cautions against using workspaces for multi-environment deployments in production systems. The core problem is that all workspaces share the same configuration code — the only differentiation comes from terraform.workspace conditionals scattered through the code. This makes it easy to accidentally deploy production infrastructure with dev-grade sizing, and hard to apply different module versions or resource counts per environment without ugly ternary chains.

Directory-based separation gives each environment an explicit root module with its own variable files, backend config, and module version pins. Changes to the prod directory require a dedicated PR, plan, and apply — there is no risk of accidentally applying a dev change to prod. Workspaces remain appropriate for a specific narrow use case: creating ephemeral, identical copies of infrastructure for feature branches or integration test runners, where all instances genuinely share the same configuration with no per-environment variation.
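
Two fragments illustrate the difference. The first is the workspace style, where every per-environment difference becomes another conditional branch. The second is the narrow case where workspaces fit: identical ephemeral copies that differ only by name. The queue resource is a hypothetical example.

```hcl
# Workspace-driven sizing: each new difference adds another branch.
locals {
  desired_count = (
    terraform.workspace == "prod" ? 3 :
    terraform.workspace == "staging" ? 2 : 1
  )
}

# Ephemeral copies: the workspace only disambiguates names.
resource "aws_sqs_queue" "jobs" {
  name = "jobs-${terraform.workspace}"
}
```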

Terragrunt for DRY Configurations

Terragrunt solves the directory-based approach's main weakness: repetition. Without Terragrunt, each environment directory must repeat the backend configuration block, the provider block, and the module source. With Terragrunt, a root terragrunt.hcl defines the shared backend and provider patterns once. Child terragrunt.hcl files in each environment directory inherit the root config and only override the values that differ — typically the environment name and a handful of sizing inputs.

# Root terragrunt.hcl — placed at the repository root.
# All child terragrunt.hcl files inherit from this via find_in_parent_folders().

locals {
  # Read environment name from the directory structure.
  # Convention: environments/<env>/<stack>/terragrunt.hcl
  env = basename(dirname(get_terragrunt_dir()))

  account_ids = {
    dev     = "111111111111"
    staging = "222222222222"
    prod    = "333333333333"
  }

  account_id = local.account_ids[local.env]
}

# Generate the backend configuration dynamically.
# Each stack gets a unique S3 key based on its path.
remote_state {
  backend = "s3"

  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }

  config = {
    bucket         = "my-org-terraform-state-${local.account_id}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
    role_arn       = "arn:aws:iam::${local.account_id}:role/TerraformStateAccess"
  }
}

# Generate the provider configuration dynamically.
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"

  contents = <<EOF
provider "aws" {
  region = "eu-west-1"

  assume_role {
    role_arn = "arn:aws:iam::${local.account_id}:role/TerraformDeployRole"
  }

  default_tags {
    tags = {
      Environment = "${local.env}"
      ManagedBy   = "terraform"
      Repository  = "my-org/infra"
    }
  }
}
EOF
}


# environments/prod/ecs-api/terragrunt.hcl
# Child config: inherits root, overrides only environment-specific values.

include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "git::https://github.com/my-org/infra.git//modules/ecs-service?ref=v2.3.1"
}

# Read outputs from the networking stack in the same environment.
dependency "networking" {
  config_path = "../networking"

  mock_outputs = {
    vpc_id             = "vpc-00000000"
    private_subnet_ids = ["subnet-00000000", "subnet-11111111"]
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}

inputs = {
  service_name   = "api"
  environment    = "prod"
  container_port = 8080
  desired_count  = 3
  cpu            = 1024
  memory         = 2048

  vpc_id     = dependency.networking.outputs.vpc_id
  subnet_ids = dependency.networking.outputs.private_subnet_ids

  tags = {
    Team       = "platform"
    CostCenter = "cc-1234"
  }
}
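
Values shared by every stack in one environment can live in a single file that child configurations load with read_terragrunt_config. A sketch, assuming a hypothetical env.hcl placed at environments/prod/ and hypothetical local names:

```hcl
# environments/prod/env.hcl (hypothetical file)
locals {
  environment   = "prod"
  desired_count = 3
  common_tags = {
    Team       = "platform"
    CostCenter = "cc-1234"
  }
}

# environments/prod/ecs-api/terragrunt.hcl (excerpt)
locals {
  env = read_terragrunt_config(find_in_parent_folders("env.hcl"))
}

inputs = {
  environment   = local.env.locals.environment
  desired_count = local.env.locals.desired_count
  tags          = local.env.locals.common_tags
}
```

This keeps per-stack files down to the values that are unique to the stack.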

Multi-Account AWS Infrastructure with IAM Role Assumption

AWS Organizations enables a multi-account structure where each environment (or domain) runs in a dedicated account. A typical organization layout separates concerns into: a management account for billing and SCPs; a network account for shared Transit Gateway and Direct Connect; a security account for centralized CloudTrail, GuardDuty, and Security Hub; a shared-services account for ECR, Artifactory, and internal tooling; and individual workload accounts for prod, staging, and dev. Each account boundary limits the blast radius of a compromised credential or a runaway Terraform destroy.

Terraform crosses account boundaries using provider aliases with the assume_role block. A single deployment role in the management or CI account assumes a cross-account role in each target account — the target account roles grant only the permissions required for that specific stack, not broad AdministratorAccess. The aws_caller_identity data source combined with a validation check ensures the deploy is targeting the expected account, preventing misrouted deploys.

# multi-account/main.tf
# Deploy resources into two different AWS accounts in a single Terraform config.
# Each provider alias assumes the appropriate cross-account role.

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# Aliased provider: shared-services account (where Terraform CI runs)
provider "aws" {
  region = "eu-west-1"
  alias  = "shared_services"
}

# Cross-account provider: prod account
provider "aws" {
  region = "eu-west-1"
  alias  = "prod"

  assume_role {
    role_arn     = "arn:aws:iam::333333333333:role/TerraformDeployRole"
    session_name = "TerraformDeploy-${formatdate("YYYYMMDD-hhmmss", timestamp())}"
    # Optional: pass tags to the assumed-role session for audit trail
    tags = {
      ManagedBy = "terraform"
      Workflow  = "deploy"
    }
  }
}

# Verify we are targeting the expected account before proceeding.
# This data source is evaluated at plan time — wrong account fails fast.
data "aws_caller_identity" "prod" {
  provider = aws.prod
}

locals {
  expected_prod_account = "333333333333"
}

resource "null_resource" "account_guard" {
  lifecycle {
    precondition {
      condition     = data.aws_caller_identity.prod.account_id == local.expected_prod_account
      error_message = "Provider 'prod' resolved to account ${data.aws_caller_identity.prod.account_id}, expected ${local.expected_prod_account}. Check role ARN."
    }
  }
}

# Deploy an ECR repository in shared-services, accessible from prod.
resource "aws_ecr_repository" "api" {
  provider             = aws.shared_services
  name                 = "my-org/api"
  image_tag_mutability = "IMMUTABLE"

  image_scanning_configuration {
    scan_on_push = true
  }
}

# Grant prod account pull access to the shared ECR repository.
resource "aws_ecr_repository_policy" "api_cross_account" {
  provider   = aws.shared_services
  repository = aws_ecr_repository.api.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid    = "AllowProdAccountPull"
        Effect = "Allow"
        Principal = {
          AWS = "arn:aws:iam::333333333333:root"
        }
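        # ecr:GetAuthorizationToken is account-level and cannot be granted by a
        # repository policy; grant it through an IAM policy in the prod account.
        Action = [
          "ecr:BatchGetImage",
          "ecr:GetDownloadUrlForLayer",
          "ecr:BatchCheckLayerAvailability",
        ]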
      }
    ]
  })
}

# Deploy ECS service in the prod account referencing the shared ECR image.
module "api_service" {
  source    = "../../modules/ecs-service"
  providers = { aws = aws.prod }  # pass the aliased provider into the module

  service_name   = "api"
  environment    = "prod"
  container_port = 8080
  desired_count  = 3
  cpu            = 1024
  memory         = 2048

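  # Reference the ECR URI from shared-services account.
  # With IMMUTABLE tags, :latest can be pushed only once; prefer a pinned
  # version tag or digest in real deployments.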
  container_image = "${aws_ecr_repository.api.repository_url}:latest"

  tags = {
    Team       = "platform"
    CostCenter = "cc-1234"
  }
}
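
The AWS provider also has a built-in guard for the same purpose: allowed_account_ids makes the provider refuse to run against any account that is not listed. A sketch that reuses the prod role ARN from the example above:

```hcl
provider "aws" {
  region = "eu-west-1"
  alias  = "prod"

  # Provider initialisation fails if the resolved account is not listed.
  allowed_account_ids = ["333333333333"]

  assume_role {
    role_arn = "arn:aws:iam::333333333333:role/TerraformDeployRole"
  }
}
```

The two guards can be combined; the provider-level one fails earlier, while the precondition gives a custom error message.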

Note

Cross-account role assumption requires the target account's TerraformDeployRole to have a trust policy allowing the CI account's deploy role (or EC2 instance profile) to assume it. Scope the permissions on the assumed role to the minimum required for the specific stack; AWS IAM Access Analyzer can generate least-privilege policies from CloudTrail activity. Avoid AdministratorAccess on the assumed role even in non-production accounts: it trains teams to expect broad permissions and makes eventual tightening disruptive.
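
A trust policy of that shape might look like the sketch below, applied in the target account. The CI role name TerraformCI and the account ID 444444444444 are hypothetical. sts:TagSession is included because the provider example passes session tags.

```hcl
# Applied in the target (prod) account.
# The CI role name "TerraformCI" and account 444444444444 are hypothetical.

data "aws_iam_policy_document" "deploy_trust" {
  statement {
    sid     = "AllowCIAssumeRole"
    effect  = "Allow"
    actions = ["sts:AssumeRole", "sts:TagSession"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::444444444444:role/TerraformCI"]
    }
  }
}

resource "aws_iam_role" "terraform_deploy" {
  name               = "TerraformDeployRole"
  assume_role_policy = data.aws_iam_policy_document.deploy_trust.json
}
```

Permissions for the stack itself are attached to this role separately and kept as narrow as the stack allows.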

Drift Detection and Automated Remediation

Drift occurs when the actual state of infrastructure diverges from the Terraform state file — a resource is modified or deleted outside of Terraform (manually in the console, by an auto-remediation script, or by an AWS service action). Undetected drift causes apply failures, unexpected plan diffs, and silent security regressions. A common scenario: a security engineer manually tightens a security group rule in response to an incident, then Terraform reverts it on the next deployment.

terraform plan with the -detailed-exitcode flag returns exit code 2 when the plan contains changes (drift, or configuration that has not been applied yet), 0 when there are no changes, and 1 on error. This makes it scriptable in CI. A periodic drift detection pipeline runs a plan against each stack and alerts on exit code 2 without applying. Terraform 1.5 introduced import blocks as a declarative alternative to the imperative terraform import command, making it straightforward to bring unmanaged resources discovered during drift detection under IaC control without resource recreation.

# .github/workflows/terraform-drift-detection.yml
# Runs on a schedule to detect drift across all production stacks.
# Opens a GitHub issue when drift is found.

name: Terraform Drift Detection

on:
  schedule:
    - cron: '0 6 * * 1-5'   # weekdays at 06:00 UTC, before business hours
  workflow_dispatch:          # allow manual trigger

jobs:
  detect-drift:
    name: Detect Drift — ${{ matrix.stack }}
    runs-on: ubuntu-latest
    timeout-minutes: 20

    strategy:
      fail-fast: false        # check all stacks even if one fails
      matrix:
        stack:
          - environments/prod/networking
          - environments/prod/ecs-api
          - environments/prod/rds

    permissions:
      id-token: write   # for OIDC token (no long-lived credentials)
      contents: read
      issues: write     # for opening/closing drift issues

    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials via OIDC
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/GitHubActionsDeployRole
          aws-region: eu-west-1

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: "~1.8"
          terraform_wrapper: false   # keep terraform's raw exit code for -detailed-exitcode

      - name: Terraform Init
        working-directory: ${{ matrix.stack }}
        run: terraform init -input=false

      - name: Detect Drift
        id: plan
        working-directory: ${{ matrix.stack }}
        # -detailed-exitcode: 0=no changes, 1=error, 2=changes present (drift)
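        run: |
          set +e
          terraform plan -detailed-exitcode -input=false -out=tfplan 2>&1 | tee plan_output.txt
          # $? would be tee's status; PIPESTATUS[0] is terraform's exit code.
          echo "exit_code=${PIPESTATUS[0]}" >> "$GITHUB_OUTPUT"
          set -e

      - name: Fail on Plan Error
        if: steps.plan.outputs.exit_code == '1'
        run: exit 1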

      - name: Open Issue on Drift Detected
        if: steps.plan.outputs.exit_code == '2'
        uses: actions/github-script@v7
        with:
          script: |
            const stack = '${{ matrix.stack }}';
            const body = [
              '## Terraform Drift Detected',
              '',
              `**Stack:** ${stack}`,
              `**Run:** ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}`,
              '',
              'Terraform detected that the actual infrastructure state has diverged',
              'from the state file. Review the plan output in the workflow run above',
              'and either apply the plan to revert drift, or import the manual changes',
              'into Terraform state.',
            ].join('\n');
            await github.rest.issues.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              title: `[Drift] ${stack}`,
              body,
              labels: ['terraform-drift', 'infrastructure'],
            });


# Import block (Terraform 1.5+) — bring an unmanaged S3 bucket under IaC.
# Add this block to main.tf, run terraform plan, then terraform apply.
# After successful import, remove the import block — it is a one-time operation.

import {
  to = aws_s3_bucket.legacy_data
  id = "my-org-legacy-data-bucket"
}

resource "aws_s3_bucket" "legacy_data" {
  bucket = "my-org-legacy-data-bucket"

  lifecycle {
    prevent_destroy = true
  }

  tags = {
    Name        = "legacy-data"
    Environment = "prod"
    ManagedBy   = "terraform"
    Team        = "data-platform"
    CostCenter  = "cc-5678"
  }
}
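
When drift detection turns up several unmanaged resources of the same type, Terraform 1.7 and later accept for_each on import blocks. A sketch, assuming two hypothetical bucket names:

```hcl
# Bulk import (Terraform 1.7+ assumed); bucket names are hypothetical.

locals {
  legacy_buckets = {
    logs    = "my-org-legacy-logs"
    exports = "my-org-legacy-exports"
  }
}

import {
  for_each = local.legacy_buckets
  to       = aws_s3_bucket.legacy[each.key]
  id       = each.value
}

resource "aws_s3_bucket" "legacy" {
  for_each = local.legacy_buckets
  bucket   = each.value

  lifecycle {
    prevent_destroy = true
  }
}
```

Review the plan before applying: each bucket should appear as an import, with no create or replace actions.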

CI/CD with Atlantis and Automated Plan/Apply

Atlantis is an open-source Terraform automation server that integrates with GitHub, GitLab, and Bitbucket pull requests. When a PR modifies Terraform files, Atlantis automatically runs terraform plan and posts the output as a PR comment. An approved reviewer then comments atlantis apply to apply the plan to the target environment — the merge only completes after a successful apply. This workflow makes pull requests the authoritative approval mechanism for all infrastructure changes, creating a natural audit trail of who approved what, when, and what the plan showed.

Atlantis locks each directory and workspace pair: once a plan has been run for a project, that project is locked until the pull request is merged or closed, or the lock is manually released. This prevents two PRs from generating conflicting plans against the same state. Per-project configuration in atlantis.yaml lets teams define custom workflow steps (e.g., running terragrunt instead of raw terraform), set apply requirements (such as an approval and a mergeable pull request), and scope which directories trigger plans for which projects.

# atlantis.yaml — place at repository root.
# Defines projects, workflows, and apply requirements for all stacks.

version: 3

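# Custom workflows and per-project apply_requirements only take effect if the
# server-side repo config allows them (allow_custom_workflows: true and
# allowed_overrides: [workflow, apply_requirements]).
# apply_requirements is set per project below; it is not a top-level key here.

# Project-name prefixes that may be targeted with regex commands
# (only relevant when the server runs with --enable-regexp-cmd).
allowed_regexp_prefixes:
  - prod-
  - staging-
  - dev-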

projects:
  - name: prod-networking
    dir: environments/prod/networking
    workspace: default
    terraform_version: v1.8.0
    apply_requirements: [approved, mergeable]
    workflow: terragrunt

  - name: prod-ecs-api
    dir: environments/prod/ecs-api
    workspace: default
    terraform_version: v1.8.0
    # `approved` needs at least one approval. A two-reviewer rule for prod is
    # enforced through branch protection, which `mergeable` then respects.
    apply_requirements: [approved, mergeable]
    workflow: terragrunt

  - name: staging-ecs-api
    dir: environments/staging/ecs-api
    workspace: default
    terraform_version: v1.8.0
    apply_requirements: [mergeable]   # staging: no approval gate
    workflow: terragrunt

  - name: dev-ecs-api
    dir: environments/dev/ecs-api
    workspace: default
    terraform_version: v1.8.0
    apply_requirements: []            # dev: apply immediately after plan
    workflow: terragrunt
    autoplan:
      when_modified:
        - "*.hcl"
        - "*.tf"
        - "../../../modules/ecs-service/**/*.tf"
      enabled: true

workflows:
  terragrunt:
    plan:
      steps:
        - env:
            name: TERRAGRUNT_LOG_LEVEL
            value: warn
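        - run: terragrunt init -input=false
        - run: terragrunt plan -input=false -out=$PLANFILE
        # Export the plan as JSON so policy checks (if enabled) can read it.
        # Re-running plan in a policy_check stage would overwrite the reviewed plan.
        - run: terragrunt show -json $PLANFILE > $SHOWFILE
    apply:
      steps:
        - run: terragrunt apply -input=false $PLANFILE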

  # Default workflow used when 'workflow:' is not specified
  default:
    plan:
      steps:
        - init:
            extra_args: ["-input=false", "-upgrade"]
        - plan:
            extra_args: ["-input=false"]
    apply:
      steps:
        - apply:
            extra_args: ["-input=false"]

Note

Atlantis requires a long-lived server process (typically deployed on ECS, Kubernetes, or EC2) whose webhook endpoint is reachable from your Git provider. The Atlantis server holds the IAM credentials for deploying to all target accounts, so use an IAM role attached to the host (EC2 instance profile or ECS task role) rather than static access keys. Configure webhook secrets to authenticate incoming PR events. For Terragrunt repos, install Terragrunt on the Atlantis server image or use the runatlantis/atlantis Docker image with Terragrunt added via a custom Dockerfile.

Further Reading

  • Terraform Documentation — complete reference for language syntax, backend configuration, provider development, the module registry protocol, and the Terraform Cloud API
  • Developing Modules — HashiCorp's official guide to module structure, input validation, output design, and publishing modules to the public or private Terraform Registry
  • Terragrunt Documentation — full reference for root and child terragrunt.hcl configuration, dependency blocks, mock outputs, run-all commands, hooks, and the built-in helper functions
  • Atlantis Documentation — server installation, GitHub/GitLab/Bitbucket webhook setup, atlantis.yaml workflow reference, access control, policy checking integration with OPA and Conftest, and Slack/PagerDuty notifications
  • terraform-aws-modules — community-maintained AWS module library covering VPC, ECS, RDS, Lambda, S3, and IAM with extensive documentation and usage examples
