How a multi-account AWS setup helps prevent accidental damage when using Terragrunt
Confession time. For years, I've been ignoring the advice to run multiple AWS accounts through AWS Organizations.
It seems I'm not the only one avoiding the leap:
"In the last few years...one thing we still find from time to time is customers running a single AWS account containing all their development and staging environments alongside their production workload." - John Topper of Scale Factory
But is having multiple environments in one account really all that bad?
"It may surprise you to hear that it’s entirely possible to run a perfectly secure, mature, and complex set of AWS resources out of a single AWS account. Yes, it’s true! IAM policies are infinitely customizable, and access can be controlled at an insanely granular level. Tags can be enforced, and single-account service limits can (usually) be increased." - David Blocher of A Cloud Guru
So why have I made a U-turn on adopting multiple AWS accounts?
David Blocher's article explains the inherent security risks and the complexity of managing IAM credentials within a single account. This becomes especially true as the development team grows.
For me, however, his point about limiting the blast radius is what I'll briefly touch on today; not from a security perspective per se, but from the perspective of preventing accidental damage when managing infrastructure internally.
Why is accidental damage an issue with a single AWS account?
With a growing tech team and more people needing to test infrastructure, it's not very safe to allow automated tools like Terragrunt to run riot in an environment where production also exists. Having somewhere completely isolated to test first is beneficial to prevent accidents from occurring.
"As your developers are building things, it’s pretty likely they’ll be building and destroying resources multiple times a day as they figure out what the solution will ultimately look like." - John Topper of Scale Factory
The clash of the regions
One way to test infrastructure in a single account is by using an unused region.
It works well until it doesn't.
I was recently testing a Terraform VPC module with Terragrunt. I hadn't appreciated that, because I had not declared a provider block, it would create the root VPC in my default region. As a result, my new VPC was wrongly launched in the eu-west-1 region, while the subnets were launched in the intended us-east-1 region, which was set through a Terraform variable.
Thankfully, this was done in a separate account. But it goes to show that even a simple infrastructure test can have consequences for key environments, despite the best intentions.
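A small pre-flight check is one way to catch this kind of mismatch before anything is applied. The sketch below is only an illustration: the function names are hypothetical, and it assumes the fallback region comes from the AWS_REGION or AWS_DEFAULT_REGION environment variables (checked in that order). It does not look at the shared AWS config file, which can also set a default region.

```python
from typing import Mapping, Optional


def effective_region(env: Mapping[str, str]) -> Optional[str]:
    """Return the fallback region found in the environment, or None if unset.

    Assumption: AWS_REGION is consulted before AWS_DEFAULT_REGION.
    """
    return env.get("AWS_REGION") or env.get("AWS_DEFAULT_REGION")


def check_region(intended: str, env: Mapping[str, str]) -> None:
    """Raise RuntimeError if the fallback region differs from the intended one."""
    actual = effective_region(env)
    if actual != intended:
        raise RuntimeError(
            f"Intended region is {intended!r} but the environment "
            f"defaults to {actual!r}; refusing to continue."
        )
```

Calling check_region("us-east-1", os.environ) at the top of a wrapper script, before Terragrunt runs, would likely have stopped my test while the default was still eu-west-1.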
The clash of the VPCs
Another way to manage environments within a single account is to use separate VPCs. By default, VPCs cannot communicate with each other. The networks are isolated, and they have their own sets of security groups for access.
However, you have to watch out for overlapping CIDRs. AWS lets you create multiple VPCs with the same CIDR block, but you can never peer VPCs whose CIDRs overlap. Chances are, if you're using them for separate environments, you may never want to peer them anyway, but you never know when a requirement to connect VPCs will come up, even temporarily.
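Checking for overlaps up front is cheap. Here is a minimal sketch using Python's standard ipaddress module; the environment names and CIDR blocks are made up for illustration, and overlapping_vpcs is a hypothetical helper rather than part of any AWS tooling.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def overlapping_vpcs(cidrs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return pairs of VPC names whose CIDR blocks overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(networks, 2)
        if networks[a].overlaps(networks[b])
    ]


# Hypothetical environments and CIDR blocks, for illustration only.
vpcs = {
    "production": "10.0.0.0/16",
    "staging": "10.1.0.0/16",
    "sandbox": "10.0.128.0/20",
}
print(overlapping_vpcs(vpcs))  # [('production', 'sandbox')]
```

Here the sandbox range sits inside the production range, so those two VPCs could never be peered, while staging could be peered with either of them.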
Managing least privilege
There are a few recommended ways of managing privileges in AWS. One is to write a least-privilege policy by hand, granting permissions as and when you need them during development. However, doing this manually is time-consuming. You can spend ages working out which custom policies to create.
Alternatively, you can create a fully privileged IAM user, do all the work you need and then use IAM Access Analyzer to work out the permissions for you. Of course, doing this in a production account is risky at best! But with a sandboxed developer account, you can test this all day long and then narrow down the permissions to what you need without manually configuring policies from scratch.
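To make the idea of narrowing down concrete, here is a rough sketch that turns a list of observed API actions into a policy document. It is not how IAM Access Analyzer works internally; policy_from_observed_actions is a hypothetical helper, the list of actions is invented, and leaving Resource as "*" is a simplification that a real policy should tighten.

```python
import json
from typing import Dict, Iterable


def policy_from_observed_actions(actions: Iterable[str]) -> Dict:
    """Build an IAM policy document that allows only the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # De-duplicate and sort so the policy is stable between runs.
                "Action": sorted(set(actions)),
                # Assumption: resources are left wide open here; a real
                # policy should be scoped to specific ARNs.
                "Resource": "*",
            }
        ],
    }


# Hypothetical actions recorded while testing in a sandbox account.
observed = ["ec2:CreateVpc", "ec2:CreateSubnet", "ec2:DescribeVpcs", "ec2:CreateVpc"]
print(json.dumps(policy_from_observed_actions(observed), indent=2))
```

The point is the direction of travel: start broad in the sandbox, record what was actually used, and only carry the narrowed result towards production.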
Conclusion
With a small team, or a single Ops engineer managing things, multiple accounts can seem like an unnecessary headache. However, if your team is growing and you're wondering how to test automation tasks safely, I highly recommend setting up at least one separate sandbox or test account and trying changes there before moving anything to production. It makes for a more peaceful life.