There's a moment in every cloud engineer's journey when infrastructure stops being a collection of resources you click together and becomes code you version, test, and deploy. For me, that moment came when I had to recreate a production environment after a catastrophic failure—and realized I had no idea what all the settings were.
That was my wake-up call. Since then, I've become a zealot for Infrastructure as Code (IaC), particularly AWS CloudFormation.
Why Infrastructure as Code?
Let me be clear: if you're still provisioning infrastructure through web consoles, you're doing yourself a disservice. Here's why:
1. Reproducibility
Can you spin up an identical copy of your production environment in 15 minutes? With IaC, you can. Just run the same template in a different region or account.
2. Version Control
Your infrastructure changes should be tracked just like your code. Git history shows you exactly who changed what and when.
3. Documentation
Your CloudFormation templates ARE your documentation. They stay up to date as long as every change goes through the template, because the template is the source of truth.
4. Testing
You can validate templates before applying them, preventing costly mistakes in production.
5. Disaster Recovery
When (not if) something goes wrong, you can recreate your entire infrastructure quickly and reliably.
CloudFormation vs. Terraform: The Debate
People always ask: "Why CloudFormation instead of Terraform?" Here's my take:
Choose CloudFormation when:
- You're all-in on AWS
- You want native integration with AWS services
- You need drift detection built-in
- You don't want to manage a separate state file (CloudFormation tracks stack state for you)
Choose Terraform when:
- You're multi-cloud
- You need more flexible programming capabilities
- You want a larger ecosystem of providers
I use both, but for pure AWS projects, CloudFormation is my go-to.
Getting Started: Your First Template
Let's build something practical—a simple web application stack:
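AWSTemplateFormatVersion: '2010-09-09'
Description: Simple web application infrastructure

Parameters:
  EnvironmentName:
    Type: String
    Default: dev
    AllowedValues:
      - dev
      - staging
      - prod
    Description: Environment name

Resources:
  # VPC
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsHostnames: true
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-vpc
  # Internet gateway (an internet-facing load balancer needs one attached to the VPC)
  InternetGateway:
    Type: AWS::EC2::InternetGateway
  GatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway
  # Public subnets (an ALB needs subnets in at least two Availability Zones)
  PublicSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-public-subnet
  PublicSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.2.0/24
      AvailabilityZone: !Select [1, !GetAZs '']
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentName}-public-subnet-2
  # Route internet-bound traffic from both subnets through the gateway
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
  PublicRoute:
    Type: AWS::EC2::Route
    DependsOn: GatewayAttachment
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway
  PublicSubnetRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet
      RouteTableId: !Ref PublicRouteTable
  PublicSubnet2RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnet2
      RouteTableId: !Ref PublicRouteTable
  # Application Load Balancer
  LoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    DependsOn: GatewayAttachment
    Properties:
      Name: !Sub ${EnvironmentName}-alb
      Subnets:
        - !Ref PublicSubnet
        - !Ref PublicSubnet2
      SecurityGroups:
        - !Ref LoadBalancerSecurityGroup
  # Security Group
  LoadBalancerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for load balancer
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0

Outputs:
  LoadBalancerDNS:
    Description: DNS name of the load balancer
    Value: !GetAtt LoadBalancer.DNSName
    Export:
      Name: !Sub ${EnvironmentName}-alb-dns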
Best Practices I've Learned
1. Use Parameters Wisely
Make your templates reusable across environments, but don't over-parameterize. Too many parameters make templates hard to use.
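One way to keep the parameter count down is to derive settings from the single EnvironmentName parameter instead of adding a parameter per setting. Here's a minimal sketch of that idea. The EnvironmentConfig mapping, the log group, and the SNS topic are all illustrative names I made up for this example, not part of the template above:

```yaml
Parameters:
  EnvironmentName:
    Type: String
    Default: dev
    AllowedValues: [dev, staging, prod]

Mappings:
  # Hypothetical per-environment settings, keyed by EnvironmentName
  EnvironmentConfig:
    dev:
      LogRetentionDays: '7'
    staging:
      LogRetentionDays: '30'
    prod:
      LogRetentionDays: '365'

Conditions:
  IsProd: !Equals [!Ref EnvironmentName, prod]

Resources:
  AppLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub /${EnvironmentName}/app
      # Looked up from the mapping, so no extra parameter is needed
      RetentionInDays: !FindInMap [EnvironmentConfig, !Ref EnvironmentName, LogRetentionDays]

  # Only created when EnvironmentName is prod
  AlertTopic:
    Type: AWS::SNS::Topic
    Condition: IsProd
    Properties:
      TopicName: !Sub ${EnvironmentName}-alerts
```

Whoever deploys the stack still answers one question (which environment?), and the template decides the rest.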
2. Leverage Outputs and Exports
Use stack outputs to share values between stacks:
Outputs:
  VPCId:
    Value: !Ref VPC
    Export:
      Name: !Sub ${EnvironmentName}-vpc-id

# In another stack:
VpcId: !ImportValue dev-vpc-id
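Hard-coding dev-vpc-id ties the consuming stack to one environment. Here's a sketch of the same import built from a parameter instead, assuming the consuming stack declares its own EnvironmentName parameter (AppSecurityGroup is just an illustrative resource). It uses the long form Fn::ImportValue because the short form !ImportValue can't wrap a short-form !Sub:

```yaml
Parameters:
  EnvironmentName:
    Type: String
    Default: dev

Resources:
  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for the application tier
      # Resolves to the value exported as <EnvironmentName>-vpc-id
      VpcId:
        Fn::ImportValue: !Sub ${EnvironmentName}-vpc-id
```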
3. Organize with Nested Stacks
Break large templates into logical components:
- network-stack.yaml (VPC, subnets, routing)
- security-stack.yaml (security groups, IAM roles)
- application-stack.yaml (EC2, ECS, Lambda)
- database-stack.yaml (RDS, DynamoDB)
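With nested stacks, a parent template ties these files together as AWS::CloudFormation::Stack resources. A minimal sketch, assuming the child templates have been uploaded to a hypothetical S3 bucket (my-templates-bucket), that both children accept an EnvironmentName parameter, and that network-stack.yaml declares an output named VPCId:

```yaml
Parameters:
  EnvironmentName:
    Type: String
    Default: dev

Resources:
  NetworkStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      # Hypothetical bucket; child templates must be reachable via an S3 URL
      TemplateURL: https://my-templates-bucket.s3.amazonaws.com/network-stack.yaml
      Parameters:
        EnvironmentName: !Ref EnvironmentName

  SecurityStack:
    Type: AWS::CloudFormation::Stack
    Properties:
      TemplateURL: https://my-templates-bucket.s3.amazonaws.com/security-stack.yaml
      Parameters:
        EnvironmentName: !Ref EnvironmentName
        # Child outputs are exposed on the parent as Outputs.<OutputName>
        VpcId: !GetAtt NetworkStack.Outputs.VPCId
```

The !GetAtt reference also tells CloudFormation to create the network stack before the security stack, so no explicit DependsOn is needed.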
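4. Use StackSets for Multi-Account
Managing the same infrastructure across multiple AWS accounts or regions? StackSets are your friend: you define the template once and CloudFormation deploys it to every target account and region.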
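A stack set can itself be declared in a template. The sketch below is one possible shape, not a drop-in config: it assumes AWS Organizations with trusted access enabled for CloudFormation (required for the SERVICE_MANAGED permission model), and the bucket name, stack set name, and organizational unit ID are placeholders:

```yaml
Resources:
  BaselineStackSet:
    Type: AWS::CloudFormation::StackSet
    Properties:
      StackSetName: security-baseline          # placeholder name
      Description: Baseline resources deployed to every account in an OU
      PermissionModel: SERVICE_MANAGED
      AutoDeployment:
        # Deploy automatically to accounts that join the OU later
        Enabled: true
        RetainStacksOnAccountRemoval: false
      Capabilities:
        - CAPABILITY_NAMED_IAM
      # Placeholder bucket holding the template to roll out
      TemplateURL: https://my-templates-bucket.s3.amazonaws.com/security-stack.yaml
      StackInstancesGroup:
        - DeploymentTargets:
            OrganizationalUnitIds:
              - ou-abcd-12345678                # placeholder OU ID
          Regions:
            - us-east-1
            - eu-west-1
```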
5. Implement Change Sets
Always preview changes before applying them:
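# Create the change set, then review what it would change
aws cloudformation create-change-set \
  --stack-name my-stack \
  --change-set-name my-change-set \
  --template-body file://template.yaml
aws cloudformation describe-change-set \
  --change-set-name my-change-set \
  --stack-name my-stack

# Apply it once the changes look right
aws cloudformation execute-change-set \
  --change-set-name my-change-set \
  --stack-name my-stack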
Advanced Patterns
Custom Resources
Sometimes you need to do things CloudFormation doesn't support natively. Custom resources backed by Lambda functions are the answer:
CustomResource:
  Type: Custom::MyCustomResource
  Properties:
    ServiceToken: !GetAtt CustomResourceFunction.Arn
    SomeProperty: SomeValue
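For context, here's a rough sketch of what the CustomResourceFunction behind that ServiceToken might look like. It assumes an IAM role named CustomResourceRole is defined elsewhere in the template (hypothetical), and it uses inline code so the cfnresponse helper module is available. The handler body is a placeholder that just echoes a property back:

```yaml
CustomResourceFunction:
  Type: AWS::Lambda::Function
  Properties:
    Runtime: python3.12
    Handler: index.handler
    Timeout: 30
    # Hypothetical role, assumed to be defined elsewhere in this template
    Role: !GetAtt CustomResourceRole.Arn
    Code:
      ZipFile: |
        import cfnresponse

        def handler(event, context):
            # Called for Create, Update, and Delete; always send a response,
            # otherwise the stack waits until the operation times out.
            try:
                props = event["ResourceProperties"]
                data = {"Echo": props.get("SomeProperty", "")}
                cfnresponse.send(event, context, cfnresponse.SUCCESS, data)
            except Exception:
                cfnresponse.send(event, context, cfnresponse.FAILED, {})
```

Values returned in the data dictionary can be read elsewhere in the template with !GetAtt CustomResource.Echo.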
Macros and Transforms
Use AWS SAM transform for serverless applications:
Transform: AWS::Serverless-2016-10-31

Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/   # assumed location of the handler code (index.js)
      Handler: index.handler
      Runtime: nodejs22.x
      Events:
        ApiEvent:
          Type: Api
          Properties:
            Path: /hello
            Method: get
Common Pitfalls
1. Circular Dependencies
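CloudFormation can't resolve circular references. If you hit this, break the cycle: move one side of the reference into its own resource (a standalone security group rule is the classic case), or split the resources into separate stacks.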
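The most common cycle I see is two security groups that reference each other in their inline rules. Here's a sketch of breaking it with standalone rule resources. The group names and the PostgreSQL port are illustrative, and VPC is assumed to be defined in the same template:

```yaml
Resources:
  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Application tier
      VpcId: !Ref VPC

  DatabaseSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Database tier
      VpcId: !Ref VPC

  # The rules live in their own resources, so neither group
  # has to reference the other in its definition.
  DatabaseIngressFromApp:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      GroupId: !GetAtt DatabaseSecurityGroup.GroupId
      IpProtocol: tcp
      FromPort: 5432
      ToPort: 5432
      SourceSecurityGroupId: !GetAtt AppSecurityGroup.GroupId

  AppEgressToDatabase:
    Type: AWS::EC2::SecurityGroupEgress
    Properties:
      GroupId: !GetAtt AppSecurityGroup.GroupId
      IpProtocol: tcp
      FromPort: 5432
      ToPort: 5432
      DestinationSecurityGroupId: !GetAtt DatabaseSecurityGroup.GroupId
```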
2. Resource Limits
A single template can declare at most 500 resources. Split larger deployments into nested stacks to stay under the limit.
3. Replacement vs. Update
Some resource changes cause replacement (deletion + recreation). This can be destructive. Always check the documentation.
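For stateful resources, you can ask CloudFormation to keep the old physical resource around instead of deleting it. A minimal sketch, using a hypothetical DynamoDB table and assuming the template declares the EnvironmentName parameter:

```yaml
Resources:
  OrdersTable:
    Type: AWS::DynamoDB::Table
    # Keep the table if it is removed from the template or the stack is deleted
    DeletionPolicy: Retain
    # Keep the old table if an update replaces it with a new one
    UpdateReplacePolicy: Retain
    Properties:
      # Changing TableName is an update that requires replacement
      TableName: !Sub ${EnvironmentName}-orders
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: orderId
          AttributeType: S
      KeySchema:
        - AttributeName: orderId
          KeyType: HASH
```

Retained resources stay in your account outside the stack, so you have to clean them up (or migrate their data) yourself.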
4. Rollback Failures
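When a failed update leaves a stack in UPDATE_ROLLBACK_FAILED, you often need to fix the offending resources by hand and then run aws cloudformation continue-update-rollback before CloudFormation can finish rolling back. A stack that lands in ROLLBACK_FAILED during creation usually has to be deleted and recreated.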
Real-World Example: Production Infrastructure
Here's how I structured a recent production deployment:
- foundation-stack: VPC, subnets, NAT gateways, VPN
- security-stack: IAM roles, security groups, KMS keys
- database-stack: RDS cluster, backup configuration
- application-stack: ECS cluster, task definitions, ALB
- monitoring-stack: CloudWatch dashboards, alarms, SNS topics
- pipeline-stack: CodePipeline, CodeBuild for CI/CD
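Each stack is independently deployable and can usually be updated without touching the others. The main exception is exported values: CloudFormation won't let you change or remove an export while another stack still imports it.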
Automation and CI/CD
Your IaC should be in your CI/CD pipeline:
# .github/workflows/deploy-infrastructure.yml
name: Deploy Infrastructure
on:
  push:
    branches: [main]
    paths:
      - 'infrastructure/**'
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - name: Deploy CloudFormation stack
        # The override key must match the template's parameter name (EnvironmentName)
        run: |
          aws cloudformation deploy \
            --template-file infrastructure/template.yaml \
            --stack-name my-infrastructure \
            --capabilities CAPABILITY_IAM \
            --parameter-overrides EnvironmentName=prod
Testing Your Infrastructure
Yes, you should test your IaC:
- cfn-lint: Static analysis of templates
- TaskCat: Multi-region testing
- InSpec: Compliance testing
- Integration tests: Deploy to test account, verify resources
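The first item is cheap to wire into CI. Here's a sketch of a validation workflow that runs on pull requests. The file name is hypothetical, and it assumes the same repository layout and secrets as the deploy workflow above:

```yaml
# .github/workflows/validate-infrastructure.yml (hypothetical file name)
name: Validate Infrastructure
on:
  pull_request:
    paths:
      - 'infrastructure/**'
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Lint template with cfn-lint
        run: |
          pip install cfn-lint
          cfn-lint infrastructure/template.yaml
      # validate-template calls the CloudFormation API, so it needs credentials
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - name: Validate template with CloudFormation
        run: |
          aws cloudformation validate-template \
            --template-body file://infrastructure/template.yaml
```

cfn-lint catches most property and type mistakes offline. validate-template only checks that the template is syntactically valid, so treat it as a sanity check rather than a guarantee the deploy will succeed.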
Key Takeaways
- Treat infrastructure like code—version it, test it, review it
- Start small, then expand
- Use parameters for reusability, but don't overdo it
- Break large templates into logical stacks
- Always use change sets to preview changes
- Automate deployment through CI/CD
- Document your templates (they're self-documenting, but comments help)
Infrastructure as Code isn't just a best practice—it's the only sustainable way to manage cloud infrastructure at scale. The initial investment in learning pays dividends in reliability, speed, and peace of mind.
What's your IaC journey been like? Share your experiences with me!