Now that we have our foundation solid, it's time to deploy the actual Kubernetes cluster. In this part, we'll be provisioning the Amazon EKS control plane and our first set of Managed Node Groups. If the VPC we built in Part 1 is the "land", the EKS Cluster is the "factory" we are building on top of it.

Why EKS Managed Node Groups are the Gold Standard:

In the early days of Kubernetes, you had to manage your own EC2 instances, handle patching, and manually join them to the cluster. It was a headache. Today, Managed Node Groups do the heavy lifting for us:

  1. No more "Snowflake" Nodes: AWS ensures every node is provisioned from a hardened, EKS-optimized AMI.
  2. Graceful Updates: When it's time to upgrade your Kubernetes version, AWS handles the rolling-update logic — draining your pods and replacing nodes one by one so your users never experience a blip.
  3. Infrastructure as Code friendly: They integrate perfectly with Terraform, allowing us to treat our compute power as a truly disposable, scalable resource.
  4. Security by Default: EKS manages the automated rotation of certificates for your worker nodes, ensuring that communication with the control plane remains encrypted and valid without manual intervention.

The Cost of Doing Business

Transparency is key when building for clients. An EKS cluster has a flat rate of $0.10 per hour (~$72/month). Beyond that, your costs are driven by the EC2 instances you choose and the data flowing through your NAT Gateways. By using Terraform to manage this, we can easily spin down environments that aren't in use, saving significant budget.

The "Senior Architect" Checklist: Designing for Production

Before we touch the code, we need to talk about what separates a "lab" cluster from a "production" cluster. Here is what I account for in every client deployment:

  • High Availability is Non-Negotiable: We don't just deploy to one spot. Our cluster is spread across multiple Availability Zones. If an entire AWS data center goes dark, your app stays up.
  • The "Least Privilege" Principle: We use IAM Roles for Service Accounts (IRSA). Your pods shouldn't have full admin access to your AWS account; they should have exactly what they need to do their job, and nothing more.
  • Endpoint Security: While you can restrict the EKS API to stay entirely within your private network, many production teams prefer a 'Public Endpoint with CIDR Whitelisting'. This allows your team to access the cluster via a VPN while ensuring the rest of the world is completely blocked at the network level.
  • Modern Identity: We are moving exclusively to EKS Access Entries (API Mode). No more fragile aws-auth ConfigMaps—just pure, clean AWS API-based access.
  • Native Backups: In 2026, we don't leave recovery to chance. We enable native EKS backups (via AWS Backup) to protect cluster metadata and persistent volumes.
  • Upgrade Paths: Kubernetes moves fast. We design our Terraform modules to be "version-aware", making it simple to bump from 1.30 to 1.31 without rebuilding the entire world.
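The checklist above leans on a handful of inputs — allowed management IPs, region, naming prefix — that the later snippets consume. As a reference sketch (variable names taken from the code further down; descriptions and the default region are my assumptions), a variables.tf might look like this:

```hcl
variable "region" {
  description = "AWS region to deploy into"
  type        = string
  default     = "us-east-1" # assumption — pick your own region
}

variable "prefix" {
  description = "Short naming prefix for resources and tags"
  type        = string
}

variable "environment" {
  description = "Environment label (e.g. dev, staging, prod)"
  type        = string
}

variable "allowed_mgmt_ips" {
  description = "CIDR blocks allowed to reach the public EKS API endpoint (e.g. your VPN egress IPs)"
  type        = list(string)
}
```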

Now, let's examine the Terraform configuration for creating our EKS cluster.

Deep Dive: Breaking Down the EKS Configuration

With our networking ready, we move to the eks.tf file. This is the brain of our operation. I have structured this using the industry-standard terraform-aws-modules/eks/aws module (v20+), which natively supports EKS Access Entries.

1. The Pre-requisites & Provider Setup

Before we build, we need to know "who" is building. We use the aws_caller_identity data source to grab our current account details. This is essential for ensuring that whoever runs the Terraform code is automatically granted administrative access to the cluster via Access Entries.

data "aws_caller_identity" "current" {}

# These are vital for Part 3's Helm/K8s provider auth
data "aws_eks_cluster" "current" {
  name       = module.eks.cluster_name
  depends_on = [module.eks]
}

data "aws_eks_cluster_auth" "current" {
  name       = module.eks.cluster_name
  depends_on = [module.eks]
}

provider "aws" {
  region = var.region
  
  # Global tagging: Every resource created will carry these tags.
  # This is a lifesaver for cost-tracking and organizational audits.
  default_tags {
    tags = {
      Environment = "${var.prefix}-${var.environment}"
      ManagedBy   = "Terraform"
      Project     = "EKS-Masterclass"
    }
  }
}
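As a preview of why those two data sources matter, here is one common pattern for wiring them into the Kubernetes and Helm providers in Part 3 (a sketch, not the only option — an exec-plugin configuration is the usual alternative):

```hcl
provider "kubernetes" {
  host                   = data.aws_eks_cluster.current.endpoint
  cluster_ca_certificate = base64decode(data.aws_eks_cluster.current.certificate_authority[0].data)
  token                  = data.aws_eks_cluster_auth.current.token # short-lived auth token
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.current.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.current.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.current.token
  }
}
```

Note that the token is short-lived, so very long applies may be better served by the exec-plugin approach, which refreshes credentials on demand.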

2. Defining the EKS Cluster Control Plane

This is where the magic happens. We're using a modern version of the EKS module (~> 20.8). Note a few production-grade decisions here:

  • Access Entries: We've moved away from the fragile aws-auth ConfigMap. Access is now managed via the AWS API, making it much harder to accidentally lock yourself out of your cluster.
  • Observability: We've enabled all log types (API, Audit, Authenticator). In a production environment, if something goes wrong, you need these logs in CloudWatch to find the "smoking gun."
  • Managed Add-ons: Core cluster add-ons are configured, including the VPC CNI, CoreDNS, and the EBS CSI driver.

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.8.5"

  cluster_name    = local.name
  cluster_version = "1.30" # Staying current with stable versions

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  # Network Security: Public for ease of access (with IP whitelisting), 
  # Private for secure node-to-control-plane communication.
  cluster_endpoint_public_access  = true
  cluster_endpoint_private_access = true
  cluster_endpoint_public_access_cidrs = var.allowed_mgmt_ips

  # KMS Secret Encryption
  create_kms_key = true
  cluster_encryption_config = {
    resources = ["secrets"]
  }

  # Modern Access Management
  authentication_mode = "API" 

  # Managed Add-ons (Pinned Versions for Stability)
  cluster_addons = {
    kube-proxy = { addon_version = "v1.30.0-eksbuild.3" }
    coredns    = { addon_version = "v1.11.1-eksbuild.9" }
    vpc-cni = {
      addon_version            = "v1.18.1-eksbuild.1"
      service_account_role_arn = module.vpc_cni_irsa_role.iam_role_arn
    }
    aws-ebs-csi-driver = {
      addon_version            = "v1.30.0-eksbuild.1"
      service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
    }
  }
  
  # Enable Control Plane Logging
  cluster_enabled_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]
}
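With authentication_mode set to "API", teammates can be granted access declaratively through the same module. A hedged sketch (the role ARN is a placeholder; the module also supports enable_cluster_creator_admin_permissions = true to auto-admit whoever runs the apply):

```hcl
# Inside the module "eks" block:
access_entries = {
  platform_team = {
    principal_arn = "arn:aws:iam::123456789012:role/platform-team" # placeholder ARN

    policy_associations = {
      admin = {
        policy_arn = "arn:aws:eks::aws:cluster-access-policy/AmazonEKSClusterAdminPolicy"
        access_scope = {
          type = "cluster"
        }
      }
    }
  }
}
```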

3. Managed Node Groups: The Heavy Lifters

Our compute strategy is simple: High availability + Right-sizing. Instead of one giant pool of instances, we define managed node groups that AWS will automatically patch and update for us. By using variables for instance_types and capacity_type (On-Demand vs. Spot), we keep the configuration flexible for different client budgets.

# Inside the module "eks" block defined above:
eks_managed_node_groups = {
  main = {
    name           = "app-node-group"
    instance_types = ["t3.medium"] # Balanced for general workloads
    capacity_type  = "ON_DEMAND"   # Reliability for core services
    
    min_size     = 1
    max_size     = 2
    desired_size = 1

    # Enable monitoring for granular node-level visibility
    enable_monitoring = true
  }
}
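To illustrate the On-Demand vs. Spot flexibility mentioned above, a second node group for interruption-tolerant workloads (batch jobs, CI runners) might look like this — a sketch, where var.spot_instance_types is an assumed variable not defined earlier:

```hcl
# Sits alongside "main" in the eks_managed_node_groups map.
spot = {
  name           = "batch-node-group"
  instance_types = var.spot_instance_types # e.g. ["t3.medium", "t3a.medium"] — diversify to reduce interruptions
  capacity_type  = "SPOT"                  # significantly cheaper, but reclaimable on short notice

  min_size     = 0
  max_size     = 3
  desired_size = 1
}
```

Keep stateful or latency-sensitive services on the On-Demand group; Spot is for workloads that can tolerate a node disappearing.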

4. Security Architecture: The Zero-Trust Identity Layer (IRSA)

In a production environment, "Standard" permissions aren't enough. We follow the Principle of Least Privilege. Instead of giving our worker nodes broad access to AWS, we use IAM Roles for Service Accounts (IRSA).

Why IRSA is Non-Negotiable:

  • Blast Radius Limitation: If a specific pod (like the EBS driver) is compromised, the attacker only gets the permissions of that role, not the entire node.
  • Auditability: Every action taken by a pod is logged in CloudTrail under its specific IAM role, making compliance audits (SOC2/ISO27001) much smoother.

The Implementation: Dedicated IRSA for Managed Add-ons

We are configuring our core cluster components as Managed Add-ons. This means AWS handles the patching, but we still control the security via these Terraform modules:

################################################################################
# IRSA: Identity for the VPC CNI (Networking)
################################################################################
module "vpc_cni_irsa_role" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.30"

  role_name_prefix    = "${local.name}-vpc-cni-"
  attach_vpc_cni_policy = true
  vpc_cni_enable_ipv4   = true

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:aws-node"]
    }
  }
}

################################################################################
# IRSA: Identity for the EBS CSI Driver (Storage)
################################################################################
module "ebs_csi_driver_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.30"

  role_name_prefix      = "${local.name}-ebs-csi-"
  attach_ebs_csi_policy = true

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
    }
  }
}
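The same pattern extends to your own workloads. As a hypothetical example (the role name, attached policy, namespace, and service account are all illustrative), an application pod that only needs to read from S3 would get its own narrowly scoped role:

```hcl
module "app_s3_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "~> 5.30"

  role_name_prefix = "${local.name}-app-s3-"
  role_policy_arns = {
    s3_read = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
  }

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["default:my-app"] # namespace:serviceaccount allowed to assume this role
    }
  }
}
```

The pod's ServiceAccount then carries the annotation eks.amazonaws.com/role-arn pointing at module.app_s3_irsa.iam_role_arn, and the AWS SDKs inside the pod pick up the credentials automatically.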

Envelope Encryption: Protecting Secrets with KMS

Compliance frameworks like SOC2 or HIPAA require that sensitive data (Kubernetes Secrets) be encrypted at rest with a key that you control, rather than a generic AWS-managed key.

Rather than wiring up a standalone KMS module, we let the EKS module create a customer-managed key (CMK) for us via create_kms_key. This key is then used by the EKS control plane to perform "Envelope Encryption" on every secret object stored in etcd.

################################################################################
# KMS: Customer Managed Key for Cluster Encryption
################################################################################

# KMS Secret Encryption
  create_kms_key = true
  cluster_encryption_config = {
    resources = ["secrets"]
  }

Resilience: Native EKS Backup

New in our architecture is native EKS backup (via AWS Backup). This provides a managed way to back up your cluster's state and data without running third-party plugins. It's a "set and forget" safety net for your production workloads.

Execution: Bringing the Cluster to Life

When you're ready to deploy, the workflow remains the same, but the stakes are higher. This process will take about 10–15 minutes as AWS provisions the redundant control plane across multiple zones.

  1. Initialize: terraform init (Syncs your providers and HCP Terraform state).
  2. Verify: terraform plan -var-file=vars.tfvars (Double-check your node sizes and regions).
  3. Deploy: terraform apply -var-file=vars.tfvars (Grab a coffee while the "brain" is built).
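A few outputs make the cluster easier to consume once the apply finishes. A small sketch (cluster_name and cluster_endpoint are standard outputs of the EKS module):

```hcl
output "cluster_name" {
  value = module.eks.cluster_name
}

output "cluster_endpoint" {
  value = module.eks.cluster_endpoint
}

output "configure_kubectl" {
  description = "Run this to point kubectl at the new cluster"
  value       = "aws eks update-kubeconfig --region ${var.region} --name ${module.eks.cluster_name}"
}
```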

Summary & Moving Forward

We now have a fully functional, highly secure Kubernetes "factory." By following this phase, we've moved beyond a basic setup and implemented:

  • Modern Identity Management: Leveraging EKS Access Entries for secure, scalable access.
  • Granular Security: Using IRSA and KMS envelope encryption to protect data at rest.
  • Production Observability: Enabling full control plane logging for auditability and troubleshooting.

But a cluster is just an empty shell without the right ingress and networking controllers. In Part 3, we'll dive into the world of Helm, where we'll install the vital controllers that handle Load Balancing and traffic management.

A Final Thought: Building a cluster is relatively straightforward; securing it to an enterprise standard is where the real work begins. If you're navigating the complexities of EKS compliance, secret management, or migration, I'm here to help you get it right the first time. Let's build something stable together.