Infrastructure as Code: My CloudFormation Journey

There's a moment in every cloud engineer's journey when infrastructure stops being a collection of resources you click together and becomes code you version, test, and deploy. For me, that moment came when I had to recreate a production environment after a catastrophic failure—and realized I had no idea what all the settings were.

That was my wake-up call. Since then, I've become a zealot for Infrastructure as Code (IaC), particularly AWS CloudFormation.

Why Infrastructure as Code?

Let me be clear: if you're still provisioning infrastructure through web consoles, you're doing yourself a disservice. Here's why:

1. Reproducibility

Can you spin up an identical copy of your production environment in 15 minutes? With IaC, you can. Just run the same template in a different region or account.

2. Version Control

Your infrastructure changes should be tracked just like your code. Git history shows you exactly who changed what and when.

3. Documentation

Your CloudFormation templates ARE your documentation. As long as every change goes through the template rather than the console, they stay up to date, because they're the source of truth.

4. Testing

You can validate templates before applying them, for example with aws cloudformation validate-template or cfn-lint, and catch costly mistakes before they reach production.

5. Disaster Recovery

When (not if) something goes wrong, you can recreate your entire infrastructure quickly and reliably.

CloudFormation vs. Terraform: The Debate

People always ask: "Why CloudFormation instead of Terraform?" Here's my take:

Choose CloudFormation when:

  • You're all-in on AWS
  • You want native integration with AWS services
  • You need drift detection built-in
  • You don't want to manage additional state

Choose Terraform when:

  • You're multi-cloud
  • You need more flexible programming capabilities
  • You want a larger ecosystem of providers

I use both, but for pure AWS projects, CloudFormation is my go-to.

Getting Started: Your First Template

Let's build something practical—a simple web application stack:

AWSTemplateFormatVersion: '2010-09-09'
Description: Simple web application infrastructure

Parameters:
  EnvironmentName:
    Type: String
    Default: dev
    AllowedValues:
      - dev
      - staging
      - prod
    Description: Environment name

Resources:
  # VPC
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsHostnames: true
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-vpc

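  # Internet gateway (an internet-facing ALB needs one attached to the VPC)
  InternetGateway:
    Type: AWS::EC2::InternetGateway

  GatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway

  # Public subnets (an ALB requires subnets in at least two Availability Zones)
  PublicSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-public-subnet

  PublicSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.2.0/24
      AvailabilityZone: !Select [1, !GetAZs '']
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-public-subnet-2

  # Route public traffic through the internet gateway
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC

  PublicRoute:
    Type: AWS::EC2::Route
    DependsOn: GatewayAttachment
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway

  PublicSubnetRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet
      RouteTableId: !Ref PublicRouteTable

  PublicSubnet2RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet2
      RouteTableId: !Ref PublicRouteTable

  # Application Load Balancer
  LoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    DependsOn: GatewayAttachment
    Properties:
      Name: !Sub ${EnvironmentName}-alb
      Subnets:
        - !Ref PublicSubnet
        - !Ref PublicSubnet2
      SecurityGroups:
        - !Ref LoadBalancerSecurityGroup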
  # Security Group
  LoadBalancerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for load balancer
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0

Outputs:
  LoadBalancerDNS:
    Description: DNS name of the load balancer
    Value: !GetAtt LoadBalancer.DNSName
    Export:
      Name: !Sub ${EnvironmentName}-alb-dns

Best Practices I've Learned

1. Use Parameters Wisely

Make your templates reusable across environments, but don't over-parameterize. Too many parameters make templates hard to use.
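
One way to keep the parameter count down is to derive per-environment settings from a single parameter. The sketch below is illustrative only: the mapping name EnvironmentSettings, the retention values, and the log group and topic resources are assumptions for the example, not part of the stack shown earlier.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Sketch of deriving settings from one parameter instead of adding many

Parameters:
  EnvironmentName:
    Type: String
    Default: dev
    AllowedValues:
      - dev
      - staging
      - prod

# Hypothetical per-environment values, looked up by EnvironmentName
Mappings:
  EnvironmentSettings:
    dev:
      LogRetentionDays: 7
    staging:
      LogRetentionDays: 30
    prod:
      LogRetentionDays: 365

Conditions:
  IsProd: !Equals [!Ref EnvironmentName, prod]

Resources:
  AppLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub /app/${EnvironmentName}
      RetentionInDays: !FindInMap [EnvironmentSettings, !Ref EnvironmentName, LogRetentionDays]

  # Created only when EnvironmentName is prod
  AlertTopic:
    Type: AWS::SNS::Topic
    Condition: IsProd
    Properties:
      TopicName: !Sub ${EnvironmentName}-alerts
```

The caller still passes only EnvironmentName; the mapping and the condition carry the rest.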

2. Leverage Outputs and Exports

Use stack outputs to share values between stacks:

Outputs:
  VPCId:
    Value: !Ref VPC
    Export:
      Name: !Sub ${EnvironmentName}-vpc-id

# In another stack:
VpcId: !ImportValue dev-vpc-id
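
If the importing stack should follow the environment rather than hard-code dev, the export name can be built with Fn::Sub. This is a minimal sketch; the AppSecurityGroup resource is a hypothetical consumer. Note the long form Fn::ImportValue: CloudFormation does not accept the short form !ImportValue wrapped around the short form !Sub.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Sketch of importing an exported VPC ID by environment name

Parameters:
  EnvironmentName:
    Type: String
    Default: dev

Resources:
  # Hypothetical resource that needs the VPC created by the exporting stack
  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Application security group
      VpcId:
        Fn::ImportValue: !Sub ${EnvironmentName}-vpc-id
```

Keep in mind that an export cannot be changed or removed while another stack still imports it.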

3. Organize with Nested Stacks

Break large templates into logical components:

  • network-stack.yaml (VPC, subnets, routing)
  • security-stack.yaml (security groups, IAM roles)
  • application-stack.yaml (EC2, ECS, Lambda)
  • database-stack.yaml (RDS, DynamoDB)
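
A parent template ties those files together with AWS::CloudFormation::Stack resources. The following is a sketch under assumptions: the TemplateBucket parameter is hypothetical, the child templates are assumed to be uploaded to that bucket, and network-stack.yaml is assumed to declare an output named VPCId.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Sketch of a parent stack composing nested stacks

Parameters:
  EnvironmentName:
    Type: String
    Default: dev
  # Hypothetical S3 bucket holding the child templates
  TemplateBucket:
    Type: String

Resources:
  NetworkStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: !Sub https://${TemplateBucket}.s3.amazonaws.com/network-stack.yaml
      Parameters:
        EnvironmentName: !Ref EnvironmentName

  SecurityStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: !Sub https://${TemplateBucket}.s3.amazonaws.com/security-stack.yaml
      Parameters:
        EnvironmentName: !Ref EnvironmentName
        # Assumes network-stack.yaml declares an output named VPCId
        VpcId: !GetAtt NetworkStack.Outputs.VPCId
```

The !GetAtt reference also gives CloudFormation the ordering: the security stack waits for the network stack.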

4. Use Stack Sets for Multi-Account

Managing the same infrastructure across multiple AWS accounts? Stack Sets are your friend.
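
A stack set can itself be declared in a template. The sketch below assumes AWS Organizations with trusted access enabled for CloudFormation and is meant to be deployed from the management or a delegated administrator account. The stack set name, template URL, and organizational unit ID are placeholders, not real values.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Sketch of a service-managed stack set rolled out to an organizational unit

Resources:
  SecurityBaseline:
    Type: AWS::CloudFormation::StackSet
    Properties:
      StackSetName: security-baseline
      PermissionModel: SERVICE_MANAGED
      # Deploy automatically to accounts that join the target OU later
      AutoDeployment:
        Enabled: true
        RetainStacksOnAccountRemoval: false
      Capabilities:
        - CAPABILITY_NAMED_IAM
      # Placeholder URL; point this at your own uploaded template
      TemplateURL: https://example-bucket.s3.amazonaws.com/security-stack.yaml
      StackInstancesGroup:
        - DeploymentTargets:
            OrganizationalUnitIds:
              - ou-xxxx-xxxxxxxx  # placeholder OU ID
          Regions:
            - us-east-1
            - eu-west-1
```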

5. Implement Change Sets

Always preview changes before applying them:

aws cloudformation create-change-set \
  --stack-name my-stack \
  --change-set-name my-change-set \
  --template-body file://template.yaml

aws cloudformation describe-change-set \
  --change-set-name my-change-set \
  --stack-name my-stack

Advanced Patterns

Custom Resources

Sometimes you need to do things CloudFormation doesn't support natively. Custom resources backed by Lambda functions are the answer:

CustomResource:
  Type: Custom::MyCustomResource
  Properties:
    ServiceToken: !GetAtt CustomResourceFunction.Arn
    SomeProperty: SomeValue
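
For completeness, here is a rough sketch of what the backing function might look like. The role name CustomResourceRole, the python3.12 runtime, and the echo logic are assumptions for illustration. It relies on the cfnresponse helper module that Lambda makes available when the code is supplied inline through ZipFile.

```yaml
Resources:
  # Hypothetical execution role with basic logging permissions
  CustomResourceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - !Sub arn:${AWS::Partition}:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

  CustomResourceFunction:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: python3.12
      Handler: index.handler
      Role: !GetAtt CustomResourceRole.Arn
      Timeout: 30
      Code:
        ZipFile: |
          import cfnresponse

          def handler(event, context):
              try:
                  # event['RequestType'] is Create, Update, or Delete
                  props = event['ResourceProperties']
                  data = {'Echo': props.get('SomeProperty', '')}
                  cfnresponse.send(event, context, cfnresponse.SUCCESS, data)
              except Exception:
                  # Always answer, even on failure
                  cfnresponse.send(event, context, cfnresponse.FAILED, {})
```

The important habit is to send a response on every path, including Delete and errors. If the function never responds, the stack operation sits waiting until it times out.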

Macros and Transforms

Use AWS SAM transform for serverless applications:

Transform: AWS::Serverless-2016-10-31

Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/  # hypothetical path to the function code
      Handler: index.handler
      Runtime: nodejs20.x
      Events:
        ApiEvent:
          Type: Api
          Properties:
            Path: /hello
            Method: get

Common Pitfalls

1. Circular Dependencies

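CloudFormation can't resolve circular references. If you hit this, move the cross-reference into a resource of its own (for example, a standalone AWS::EC2::SecurityGroupIngress rule) or, failing that, split the resources into separate stacks.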
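
A frequent instance is two security groups that each allow traffic from the other. The sketch below, with hypothetical group names and a PostgreSQL port chosen purely as an example, shows one way to break that cycle inside a single template: declare the groups without inline rules, then add the rule as a standalone resource that references both.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Sketch of breaking a security group cycle with a standalone rule

Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id

Resources:
  # Neither group references the other, so there is no cycle
  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Application tier
      VpcId: !Ref VpcId

  DbSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Database tier
      VpcId: !Ref VpcId

  # The rule depends on both groups; the groups depend on nothing
  AppToDbIngress:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !Ref DbSecurityGroup
      IpProtocol: tcp
      FromPort: 5432
      ToPort: 5432
      SourceSecurityGroupId: !Ref AppSecurityGroup
```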

2. Resource Limits

Templates have a 500-resource limit. Use nested stacks to work around this.

3. Replacement vs. Update

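Some property changes cause replacement: CloudFormation creates a new resource and deletes the old one, which can destroy data. Check the "Update requires" entry for each property in the resource documentation, and look at the Replacement field in your change set before you execute it.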
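
For stateful resources, the DeletionPolicy and UpdateReplacePolicy attributes offer a safety net. A minimal sketch, using a hypothetical DynamoDB table as the example:

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Sketch of protecting a stateful resource from deletion and replacement

Resources:
  # Hypothetical table; changing KeySchema would trigger a replacement
  OrdersTable:
    Type: AWS::DynamoDB::Table
    DeletionPolicy: Retain        # keep the table if it is removed from the stack
    UpdateReplacePolicy: Retain   # keep the old table if an update replaces it
    Properties:
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: orderId
          AttributeType: S
      KeySchema:
        - AttributeName: orderId
          KeyType: HASH
```

With Retain, the old table survives a replacement, but it is no longer managed by the stack, so cleaning it up becomes a manual job.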

4. Rollback Failures

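When an update gets stuck in UPDATE_ROLLBACK_FAILED, you often need to fix the offending resources manually before CloudFormation can continue the rollback. A stack that fails during its initial creation and lands in ROLLBACK_FAILED can only be deleted and recreated.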

Real-World Example: Production Infrastructure

Here's how I structured a recent production deployment:

  1. foundation-stack: VPC, subnets, NAT gateways, VPN
  2. security-stack: IAM roles, security groups, KMS keys
  3. database-stack: RDS cluster, backup configuration
  4. application-stack: ECS cluster, task definitions, ALB
  5. monitoring-stack: CloudWatch dashboards, alarms, SNS topics
  6. pipeline-stack: CodePipeline, CodeBuild for CI/CD

Each stack is independently deployable and can be updated without affecting others (mostly).

Automation and CI/CD

Your IaC should be in your CI/CD pipeline:

# .github/workflows/deploy-infrastructure.yml
name: Deploy Infrastructure
on:
  push:
    branches: [main]
    paths:
      - 'infrastructure/**'

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Deploy CloudFormation stack
        run: |
          aws cloudformation deploy \
            --template-file infrastructure/template.yaml \
            --stack-name my-infrastructure \
            --capabilities CAPABILITY_IAM \
            --parameter-overrides EnvironmentName=prod

Testing Your Infrastructure

Yes, you should test your IaC:

  • cfn-lint: Static analysis of templates
  • TaskCat: Multi-region testing
  • InSpec: Compliance testing
  • Integration tests: Deploy to test account, verify resources
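
As a starting point for the first item, cfn-lint can read its settings from a config file in the repository root. This is a small sketch, assuming the templates live under infrastructure/ as in the workflow above; the ignored file name is hypothetical.

```yaml
# .cfnlintrc.yaml (paths are assumptions; adjust to your repository layout)
templates:
  - infrastructure/**/*.yaml
ignore_templates:
  - infrastructure/scratch.yaml  # hypothetical file excluded from linting
regions:
  - us-east-1
```

With that in place, running cfn-lint without arguments lints every matching template, which makes it easy to wire into a pre-commit hook or a pipeline step.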

Key Takeaways

  • Treat infrastructure like code—version it, test it, review it
  • Start small, then expand
  • Use parameters for reusability, but don't overdo it
  • Break large templates into logical stacks
  • Always use change sets to preview changes
  • Automate deployment through CI/CD
  • Document your templates (they're self-documenting, but comments help)

Infrastructure as Code isn't just a best practice—it's the only sustainable way to manage cloud infrastructure at scale. The initial investment in learning pays dividends in reliability, speed, and peace of mind.

What's your IaC journey been like? Share your experiences with me!