Cloud Security Lab Series Part 3: Infrastructure as Code (IaC), Terraform Edition

How I transformed my manually built AWS cloud security lab into a reproducible, secure, and scalable Infrastructure as Code project.

Introduction

In my previous Cloud Security Lab articles, I manually built an AWS environment to explore cloud attacks, including IAM exploitation, metadata abuse, privilege escalation, and CloudTrail analysis. While building the lab manually helped me understand how each AWS service worked, I quickly realized that recreating the environment from scratch would be time-consuming and prone to mistakes.

Modern cloud environments aren't managed manually; they're managed as code.

To make my lab reproducible, scalable, and easier to maintain, I rebuilt the entire infrastructure using Terraform. Instead of configuring resources through the AWS Management Console, every component of the lab is now defined in code and can be recreated with just a few Terraform commands.

In this article, I'll walk through how I designed the infrastructure, the challenges I encountered, and the security principles that guided the architecture.

What This Lab Builds

This Terraform project automatically deploys:

Custom AWS VPC
Public and Private Subnets
Internet Gateway
Route Tables and Associations
Security Groups
Bastion Host
Private EC2 Instance
Dynamic Ubuntu AMI Selection
SSH Key Pair Integration
Terraform Outputs
SSH Agent Forwarding

The final environment recreates the same cloud infrastructure used throughout my Cloud Security Lab Series, but this time using Infrastructure as Code.

Why Infrastructure as Code?

Terraform isn't just about automation — it's about consistency.

When infrastructure is created manually, small configuration mistakes can introduce security risks or make environments difficult to reproduce. Infrastructure as Code solves this by defining the entire environment in code, making deployments repeatable, version-controlled, and easier to audit.

For a cloud security lab, this means I can quickly rebuild the environment whenever I need to perform new attack simulations or defensive exercises without repeating every manual configuration step.

Architecture Overview

Internet
│
Internet Gateway
│
Public Route Table
│
Public Subnet
│
Bastion Host
│
SSH Agent Forwarding
│
Private Subnet
│
Private EC2

The Bastion Host is the only system exposed to the Internet. The private EC2 instance has no public IP address and can only be accessed through the Bastion Host using SSH Agent Forwarding, reducing the attack surface while maintaining secure administrative access.

Skills Demonstrated

Terraform
Infrastructure as Code (IaC)
AWS VPC Networking
EC2 Deployment
Security Groups
Bastion Host Architecture
SSH Agent Forwarding
AWS CLI
Linux
Cloud Security Design
Infrastructure Troubleshooting

Designing the Architecture

Before writing a single line of Terraform code, I spent time designing the architecture I wanted to build.

Since this lab serves as the foundation for future cloud attack simulations and SOC investigations, my objective wasn't simply to deploy AWS resources; it was to build an environment that followed realistic security principles while remaining simple enough to understand and extend.

Instead of placing every resource in a single network, I divided the environment into two logical layers:

A public subnet that exposes only the Bastion Host to the Internet.
A private subnet that contains internal systems that should never be directly accessible from outside the VPC.

This separation reduces the attack surface while closely resembling how many production cloud environments are designed.

Virtual Private Cloud (VPC)

The VPC acts as the logical boundary for the entire lab.

Every resource, including subnets, route tables, security groups, and EC2 instances, exists inside this isolated network.

For this project, I created a dedicated VPC using the CIDR block:

10.10.0.0/16

This address space provides enough room for future expansion while keeping the network easy to understand.

VPC Design

VPC (10.10.0.0/16)
├── Public Subnet
│   └── Bastion Host
└── Private Subnet
    └── Private EC2
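As a rough illustration, a VPC with this CIDR block could be declared as follows. This is a minimal sketch rather than the exact code from the repository: the resource name `lab`, the tag value, and the DNS settings are assumptions made for the example.

```hcl
# networking.tf (sketch): the resource name "lab" and the tag are illustrative.
resource "aws_vpc" "lab" {
  cidr_block = "10.10.0.0/16" # CIDR block used for the lab

  # Assumption: DNS support and hostnames are enabled so instances
  # receive resolvable internal names.
  enable_dns_support   = true
  enable_dns_hostnames = true

  tags = {
    Name = "cloud-security-lab-vpc"
  }
}
```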

Public and Private Subnets

The VPC was divided into two separate subnets.

Public Subnet

The public subnet hosts the Bastion Host.

This subnet is associated with a route table that sends Internet-bound traffic through an Internet Gateway, allowing administrators to securely connect to the Bastion Host using SSH.

No other systems are publicly accessible.

Private Subnet

The private subnet hosts the internal EC2 instance.

Unlike the Bastion Host, this server has no public IP address and cannot be accessed directly from the Internet.

Instead, administrative access is only possible after successfully authenticating to the Bastion Host.

This significantly reduces the attack surface while maintaining secure management access.

Public vs Private Subnet

Laptop

↓

Internet

↓

Internet Gateway

↓

Bastion Host

↓

SSH Agent Forwarding

↓

Private EC2
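The two subnets described in this section might be expressed along these lines. The /24 ranges and resource names are assumptions chosen for illustration (the article only specifies the VPC range), and the sketch assumes a VPC declared elsewhere as `aws_vpc.lab`.

```hcl
# Sketch only: subnet CIDRs and names are hypothetical.
# Assumes a VPC resource named aws_vpc.lab exists in the same configuration.

resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.lab.id
  cidr_block              = "10.10.1.0/24" # assumed range inside 10.10.0.0/16
  map_public_ip_on_launch = true           # instances here may receive public IPs

  tags = {
    Name = "lab-public-subnet"
  }
}

resource "aws_subnet" "private" {
  vpc_id                  = aws_vpc.lab.id
  cidr_block              = "10.10.2.0/24" # assumed range inside 10.10.0.0/16
  map_public_ip_on_launch = false          # never assign public IPs in this subnet

  tags = {
    Name = "lab-private-subnet"
  }
}
```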

Security Groups

Rather than exposing every server to the Internet, I used Security Groups to enforce the principle of least privilege.

The Bastion Host only allows SSH access from my own public IP address.

The private EC2 instance does not accept SSH traffic from the Internet. Instead, it only allows SSH connections originating from the Bastion Host's Security Group.

This creates a layered access model where every connection must pass through a controlled entry point before reaching internal resources.
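The layered access model above can be sketched with two Security Groups. The names, descriptions, and the `my_ip_cidr` variable are hypothetical, and the sketch assumes a VPC declared as `aws_vpc.lab`. The key idea is that the private group references the Bastion Host's group as its source instead of an IP range.

```hcl
# Sketch only: names are illustrative. Assumes aws_vpc.lab and a string
# variable var.my_ip_cidr (for example "203.0.113.10/32") declared elsewhere.

resource "aws_security_group" "bastion" {
  name        = "lab-bastion-sg"
  description = "SSH to the Bastion Host from a single trusted address"
  vpc_id      = aws_vpc.lab.id

  ingress {
    description = "SSH from my public IP only"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = [var.my_ip_cidr]
  }

  egress {
    description = "Allow all outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "private" {
  name        = "lab-private-sg"
  description = "SSH to private instances from the Bastion Host only"
  vpc_id      = aws_vpc.lab.id

  ingress {
    description     = "SSH from members of the bastion security group"
    from_port       = 22
    to_port         = 22
    protocol        = "tcp"
    security_groups = [aws_security_group.bastion.id]
  }

  egress {
    description = "Allow all outbound traffic"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

Referencing the group rather than the Bastion Host's IP address means the rule keeps working if the Bastion Host is ever replaced and receives a new address.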

Administrative Access

One design decision I particularly wanted to implement was secure administrative access.

Instead of copying SSH private keys onto the Bastion Host, I configured SSH Agent Forwarding.

This allowed me to authenticate to the private EC2 instance while keeping my private key stored securely on my local machine.

Although modern production environments often use services such as AWS Systems Manager Session Manager, implementing SSH Agent Forwarding helped me better understand secure Linux administration and Bastion Host workflows.

Why This Design?

Every component in this architecture serves a specific security purpose.

The VPC provides network isolation.
Public and private subnets separate Internet-facing and internal resources.
Security Groups restrict unnecessary access.
The Bastion Host acts as the only administrative entry point.
SSH Agent Forwarding eliminates the need to store private keys on intermediary systems.

Rather than simply deploying AWS resources, the objective was to build an environment that could support future attack simulations, detection engineering, and SOC investigations while following security best practices from the beginning.

Organizing the Terraform Project

One lesson I learned early while working with Terraform is that infrastructure can become difficult to manage if everything is placed inside a single main.tf file.

Although that approach may work for small demonstrations, it quickly becomes difficult to navigate as the environment grows. Since my goal was to build a cloud security lab that could continue evolving with future articles, I decided to organize the project into separate Terraform files based on their responsibilities.

The final project structure looked like this:

terraform-cloud-security-lab/
│
├── providers.tf
├── variables.tf
├── terraform.tfvars
├── networking.tf
├── security.tf
├── compute.tf
├── outputs.tf
├── .gitignore
└── terraform.tfstate

Terraform Project Structure

Separating the configuration into multiple files made the project easier to understand and maintain. Instead of searching through hundreds of lines of code, each file focused on a single part of the infrastructure.

providers.tf

The project begins by configuring the AWS provider.

This file tells Terraform which cloud provider to communicate with and which AWS region should be used when creating resources.

Rather than hardcoding values throughout the project, I later replaced static configuration with variables, making the deployment easier to customize for future environments.
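A provider configuration of this kind typically looks something like the following. The version constraints and the variable name `aws_region` are assumptions for the sketch, not values taken from the repository.

```hcl
# providers.tf (sketch): version constraints are assumptions.
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  # The region comes from a variable instead of a hardcoded string.
  region = var.aws_region
}
```

Pinning the provider version keeps `terraform init` from pulling a newer major release that could change resource behavior between deployments.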

variables.tf

Instead of embedding configuration values directly into the Terraform code, I declared reusable variables for items such as the AWS region and EC2 instance type.

This allowed me to separate the infrastructure logic from the deployment configuration.

As the project grows, this approach makes it much easier to reuse the same Terraform code across different environments.
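The declarations could look roughly like this. Only the region and instance type are mentioned in the article; `my_ip_cidr` and `public_key_path` are hypothetical additions that the other sketches in this write-up rely on, and the defaults are placeholders.

```hcl
# variables.tf (sketch): defaults and the last two variables are assumptions.

variable "aws_region" {
  description = "AWS region to deploy the lab into"
  type        = string
  default     = "us-east-1"
}

variable "instance_type" {
  description = "EC2 instance type for the Bastion Host and private instance"
  type        = string
  default     = "t3.micro"
}

variable "my_ip_cidr" {
  description = "Public IP of the administrator in CIDR form, allowed to SSH to the Bastion Host"
  type        = string

  validation {
    # cidrhost() fails on malformed input, so can() turns that into false.
    condition     = can(cidrhost(var.my_ip_cidr, 0))
    error_message = "The my_ip_cidr value must be a valid CIDR block, for example 203.0.113.10/32."
  }
}

variable "public_key_path" {
  description = "Path to the local SSH public key registered as an EC2 key pair"
  type        = string
}
```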

terraform.tfvars

The actual values for the declared variables are stored inside terraform.tfvars.

Keeping configuration separate from the infrastructure code means that changing the deployment no longer requires editing multiple Terraform files. Instead, updating a few variable values is often enough to provision a different environment.
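For illustration, a `terraform.tfvars` file for such a setup might contain values like these. All of them are placeholders: the IP address comes from a documentation-only range, and the key path is hypothetical.

```hcl
# terraform.tfvars (sketch): every value here is a placeholder.
aws_region      = "us-east-1"
instance_type   = "t3.micro"
my_ip_cidr      = "203.0.113.10/32" # documentation range; replace with your own IP
public_key_path = "~/.ssh/lab_key.pub"
```

Because this file can reveal details such as your home IP address, it is often listed in `.gitignore` alongside the state file.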

networking.tf

This file contains the networking foundation of the lab.

It provisions:

VPC
Public subnet
Private subnet
Internet Gateway
Route Tables
Route Table Associations

Building the networking layer first ensured that every resource created afterward had a secure and well-defined environment to operate within.
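The routing portion of this file could be sketched as below. Resource names are illustrative, and the sketch assumes the VPC and subnets are declared elsewhere as `aws_vpc.lab`, `aws_subnet.public`, and `aws_subnet.private`.

```hcl
# networking.tf routing (sketch): names are hypothetical.

resource "aws_internet_gateway" "lab" {
  vpc_id = aws_vpc.lab.id
}

# Public route table: send all Internet-bound traffic to the Internet Gateway.
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.lab.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.lab.id
  }
}

resource "aws_route_table_association" "public" {
  subnet_id      = aws_subnet.public.id
  route_table_id = aws_route_table.public.id
}

# Private route table: no default route, so only the implicit local VPC
# route applies and the subnet has no path to the Internet.
resource "aws_route_table" "private" {
  vpc_id = aws_vpc.lab.id
}

resource "aws_route_table_association" "private" {
  subnet_id      = aws_subnet.private.id
  route_table_id = aws_route_table.private.id
}
```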

security.tf

Security-related resources are grouped together inside a dedicated file.

This includes:

Security Groups
SSH Key Pair

Separating security controls from compute resources makes the overall project easier to audit and modify as additional defensive controls are introduced.
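The SSH key pair piece might look like the following sketch. The key name and the `public_key_path` variable are assumptions. Only the public half of the key is uploaded to AWS; the private key never leaves the local machine.

```hcl
# security.tf key pair (sketch): key_name and var.public_key_path are hypothetical.
resource "aws_key_pair" "lab" {
  key_name = "cloud-security-lab-key"

  # pathexpand() resolves a leading "~" to the home directory before
  # file() reads the public key from disk.
  public_key = file(pathexpand(var.public_key_path))
}
```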

compute.tf

Once the networking and security layers were complete, I deployed the compute resources.

This file provisions:

Bastion Host
Private EC2 Instance

Instead of manually selecting an AMI, Terraform dynamically retrieves the latest supported Ubuntu image during deployment. This keeps the lab current without requiring manual updates whenever a new Ubuntu image is released.
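A dynamic AMI lookup and the two instances could be sketched as follows. The article does not say which Ubuntu release is used, so the 22.04 name filter is an assumption; the owner ID is Canonical's AWS account. The subnet, security group, and key pair references assume resources with those hypothetical names exist elsewhere in the configuration.

```hcl
# compute.tf (sketch): resource names and the Ubuntu release are assumptions.

data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"] # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}

resource "aws_instance" "bastion" {
  ami                         = data.aws_ami.ubuntu.id
  instance_type               = var.instance_type
  subnet_id                   = aws_subnet.public.id
  vpc_security_group_ids      = [aws_security_group.bastion.id]
  key_name                    = aws_key_pair.lab.key_name
  associate_public_ip_address = true # only the Bastion Host is Internet-facing

  tags = {
    Name = "lab-bastion"
  }
}

resource "aws_instance" "private" {
  ami                         = data.aws_ami.ubuntu.id
  instance_type               = var.instance_type
  subnet_id                   = aws_subnet.private.id
  vpc_security_group_ids      = [aws_security_group.private.id]
  key_name                    = aws_key_pair.lab.key_name
  associate_public_ip_address = false # reachable only through the Bastion Host

  tags = {
    Name = "lab-private"
  }
}
```

One side effect worth knowing: with `most_recent = true`, a newly published image changes the resolved AMI ID, and a later plan may propose replacing the instances.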

outputs.tf

Finally, I used Terraform Outputs to display important deployment information after the infrastructure was created.

Rather than manually searching through the AWS Console, Terraform immediately displays useful values such as:

Bastion Host Public IP
Private EC2 Private IP

These outputs made it significantly easier to connect to the environment after each deployment.
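Outputs like these might be declared as shown below. The output names and the instance resource names are hypothetical. The third output is an optional convenience I am assuming for the sketch: it prints a ready-to-use SSH command with agent forwarding (`-A`), using `ubuntu`, the default user on Ubuntu AMIs.

```hcl
# outputs.tf (sketch): assumes instances named aws_instance.bastion and
# aws_instance.private exist elsewhere in the configuration.

output "bastion_public_ip" {
  description = "Public IP address of the Bastion Host"
  value       = aws_instance.bastion.public_ip
}

output "private_instance_private_ip" {
  description = "Private IP address of the internal EC2 instance"
  value       = aws_instance.private.private_ip
}

output "ssh_bastion_command" {
  description = "SSH command for the Bastion Host with agent forwarding enabled"
  value       = "ssh -A ubuntu@${aws_instance.bastion.public_ip}"
}
```

After `terraform apply`, running `terraform output` prints these values again without touching the infrastructure.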

Why Organize the Project This Way?

As I continued building the lab, I realized that Infrastructure as Code is not just about automation; it's also about maintainability.

Breaking the project into logical components makes the configuration easier to read, troubleshoot, review, and expand over time.

More importantly, it reflects how larger Terraform projects are typically organized, making the codebase easier for other engineers to understand and contribute to.

Building the Infrastructure with Terraform

With the architecture finalized, I began translating the entire environment into Terraform configuration files.

Rather than provisioning resources through the AWS Management Console, every component of the infrastructure — from networking to compute — was described declaratively in code. This meant Terraform became the single source of truth for the environment, allowing the entire lab to be recreated consistently whenever needed.

The deployment followed a layered approach, where each layer built on resources created by the layers before it.

Terraform Resource Flow

Provider
↓
VPC
↓
Subnets
↓
Internet Gateway
↓
Route Tables
↓
Security Groups
↓
EC2 Instances
↓
Outputs

This dependency chain is one of Terraform's biggest strengths. Instead of manually deciding the deployment order, Terraform automatically builds a dependency graph and provisions resources in the correct sequence.

Deploying the Infrastructure

Once the configuration was complete, deploying the environment required only a few Terraform commands.

terraform init
terraform validate
terraform plan
terraform apply

Each command served a specific purpose:

terraform init initialized the working directory and downloaded the AWS provider.
terraform validate checked that the configuration files were syntactically valid and internally consistent, without contacting AWS.
terraform plan generated an execution plan showing the changes Terraform intended to make.
terraform apply provisioned the infrastructure in AWS.

Successful Terraform Apply

One of the concepts I appreciated most while learning Terraform was the execution plan. Before creating or modifying resources, Terraform clearly displays every planned action, allowing infrastructure changes to be reviewed before they are applied.

Verifying the Deployment

After the deployment completed successfully, Terraform automatically displayed the outputs that I had configured.

These included:

Bastion Host Public IP
Private EC2 Private IP

Rather than manually searching the AWS Console, these outputs provided the information needed to immediately begin testing connectivity.

The deployment was then verified by:

Connecting to the Bastion Host via SSH.
Using SSH Agent Forwarding.
Accessing the private EC2 instance without exposing it directly to the Internet.

Successful SSH Connection

Local Machine → Bastion Host → Private EC2

Successfully connecting to the private instance confirmed that the networking configuration, routing, security groups, and SSH authentication were all functioning as expected.

Engineering Challenges

Like most real-world projects, the deployment wasn't successful on the first attempt.

Throughout the process, I encountered several issues that required investigation and troubleshooting, including:

AWS authentication and credential configuration
Terraform state synchronization
EC2 replacement caused by ForceNew attributes
Public IP assignment behavior
SSH key configuration
Bastion Host connectivity
SSH Agent Forwarding setup

Each problem provided a better understanding of how Terraform interacts with AWS and reinforced the importance of reviewing execution plans before applying infrastructure changes.

Rather than treating these issues as setbacks, they became valuable learning opportunities that improved both my Terraform knowledge and my understanding of AWS infrastructure.

Troubleshooting and Lessons Learned

One of the biggest takeaways from this project wasn't learning Terraform syntax; it was learning how Terraform behaves when infrastructure doesn't match expectations.

The issues I ran into forced me to slow down, investigate the problem, and understand what Terraform was actually doing behind the scenes.

Those troubleshooting sessions ended up teaching me more than simply writing the configuration.

Terraform Plan is Your Best Friend

One of the first habits I developed was never running terraform apply without first reviewing the execution plan.

Terraform clearly displays every action it intends to perform before making any changes.

Whether it plans to create, modify, or destroy resources, the execution plan provides an opportunity to verify that the infrastructure matches your expectations.

On several occasions, reviewing the plan prevented unintended infrastructure changes.

Terraform Plan Output

Understanding Infrastructure Drift

While working on the lab, I also encountered infrastructure drift.

Terraform detected that parts of the infrastructure no longer matched the current state stored in the state file.

Initially, this was confusing.

After investigating further, I learned that each time it builds a plan, Terraform compares three things:

The infrastructure defined in code.
The Terraform state file.
The actual infrastructure running in AWS.

Whenever those three no longer match, Terraform reports the differences before making changes.

Understanding this concept completely changed the way I looked at Infrastructure as Code.

When Terraform Wanted to Replace My EC2

One of the most interesting issues occurred when Terraform unexpectedly wanted to destroy and recreate my Bastion Host.

At first, I assumed something was wrong with the configuration.

After carefully reviewing the execution plan, I discovered that changing certain EC2 attributes, such as public IP assignment, requires Terraform to replace the entire instance rather than modifying it in place.

This introduced me to the concept of ForceNew attributes, which Terraform marks in the plan output with the comment "# forces replacement".

Instead of immediately applying the changes, I investigated why Terraform wanted to recreate the resource.

That experience reinforced one of the most important Terraform lessons:

NOTE: Always understand the execution plan before applying infrastructure changes.
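One optional safeguard, which I am adding here as an illustration rather than something the lab necessarily uses, is a `lifecycle` block on resources that should not be recreated by accident. The sketch below uses a hypothetical Bastion Host resource; the variables and referenced resources are assumptions.

```hcl
# Sketch only: a hypothetical Bastion Host with a replacement guard.
# Assumes var.bastion_ami_id, var.instance_type, and aws_subnet.public
# are declared elsewhere in the configuration.
resource "aws_instance" "bastion" {
  ami           = var.bastion_ami_id
  instance_type = var.instance_type
  subnet_id     = aws_subnet.public.id

  # Changing this attribute on an existing instance forces replacement.
  associate_public_ip_address = true

  lifecycle {
    # Terraform returns an error for any plan that would destroy this
    # resource, including destroy-and-recreate replacements.
    prevent_destroy = true
  }
}
```

With this in place, a change that forces replacement fails at plan time instead of quietly scheduling a destroy, which gives you a chance to decide whether the replacement is intended. The guard has to be removed before the lab can be torn down with `terraform destroy`.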

State Refresh

Another concept that became much clearer during this project was Terraform's state refresh process.

Before generating an execution plan, Terraform queries AWS to determine the current state of the infrastructure.

It then compares that information against the local state file and the Terraform configuration.

This refresh process helps detect infrastructure drift and ensures that future changes are based on the environment's current state.

Understanding this workflow made Terraform feel much less like a "black box" and much more like an engineering tool.

Debugging Instead of Guessing

One lesson I tried to follow throughout the project was to investigate problems instead of immediately searching for quick fixes.

Whenever Terraform reported an unexpected change, I tried to understand:

What changed?
Why did Terraform detect it?
Is the infrastructure actually different?
Is the state file outdated?
Is the planned change expected?

That mindset not only helped solve individual issues but also improved my understanding of how Terraform interacts with AWS resources.

Key Takeaways

By the end of this project, I realized that Terraform is much more than an automation tool.

It provides a predictable and repeatable way to manage infrastructure while also encouraging engineers to carefully review, validate, and understand every change before it reaches production.

For me, the troubleshooting process became just as valuable as the final deployment.

Final Thoughts

Rebuilding my Cloud Security Lab using Terraform changed the way I think about cloud infrastructure.

Before this project, I viewed Terraform primarily as an automation tool. By the end of the lab, I realized that Infrastructure as Code is really about consistency, repeatability, and engineering discipline. Instead of manually recreating environments, I can now deploy the same infrastructure confidently while knowing exactly how every component is configured.

More importantly, this project reinforced several cloud security principles that extend beyond Terraform itself:

Design networks with security in mind.
Minimize the attack surface using private subnets.
Apply the principle of least privilege through Security Groups.
Review infrastructure changes before applying them.
Treat infrastructure as code that can be version controlled, reviewed, and improved over time.

Although this article focused on building the infrastructure, the infrastructure itself is not the end goal.

It is the foundation.

The environment created in this project will now become the platform for the next phase of my Cloud Security Lab Series, where the focus shifts from building cloud infrastructure to detecting, investigating, and responding to attacks.

Future articles will cover topics including:

Windows Active Directory
Windows Event Logging
Sysmon
Splunk
Attack Simulation
Detection Engineering
Incident Investigation
MITRE ATT&CK Mapping

My objective is no longer just to build cloud infrastructure.

It is to build an end-to-end security lab that demonstrates the complete lifecycle of modern cybersecurity — from infrastructure deployment to attack simulation, monitoring, detection, and response.

Thank you for reading, and I hope this walkthrough provides useful insights for anyone interested in Infrastructure as Code, cloud security, or building practical cybersecurity labs.

If you have suggestions or feedback, feel free to connect with me on LinkedIn or GitHub. I'm always open to learning from the community and discussing new ideas.

See you in the next Cloud Security Lab article.

Repository: https://github.com/AaradhyaDesai/

LinkedIn: https://linkedin.com/in/aaradhya-desai-77236519b