Project Overview
Hello! It has been a while since I last posted. I have been learning how to architect a production-grade CI/CD pipeline, and I am writing it up as a two-part series I call "The AWS Defense in Depth Architecture: From Pipeline to Runtime". As cloud architectures grow in complexity, the traditional lines between engineering, operations, and security have blurred. You cannot effectively design an enterprise AWS environment without understanding how an adversary might exploit it.
In this two-part series, we are bridging the gap between offensive security tactics and defensive cloud architecture. We aren't just going to look at diagrams; we are going to build a workload, deliberately break into it to understand the blast radius, and then engineer robust solutions to keep attackers out.
Part 1 (This Article): Focuses on the Shift-Left paradigm. We will build an EKS pipeline, simulate a container compromise that escalates to the node's credentials, and harden the architecture using IaC scanning, CI/CD gates, and strict IAM least privilege.
Part 2: Focuses on the Assume Breach mindset. We will architect a zero-cost, event-driven runtime defense engine using Falco, Amazon EventBridge, and AWS Lambda to detect and automatically remediate active threats inside our cluster.
In the modern era, a Solutions Architect's role extends far beyond simply connecting cloud services; they are the gatekeepers of an organization's blast radius. Modern cloud architecture demands that security is not an afterthought handled by a separate silo, but a foundational element baked into the Infrastructure as Code (IaC) and deployment pipelines.
I have not yet secured a role, but during a short stint (less than a month) at an incubator startup here in Nairobi, I frequently saw the same anti-patterns:
The "Just Works" Anti-Pattern: Rushing to production prioritizing functionality over security, resulting in overly permissive roles and unpatched containers.
Node-Level Over-Permissioning: Making the beginner mistake of attaching massive blast-radius policies (like AmazonS3FullAccess) to the node's environment rather than restricting permissions at the application pod level.
Bloated Container Images: Using generic, heavy Docker images that contain hundreds of unnecessary OS packages, drastically increasing the attack surface.
Lack of Automated Gates: Relying on basic "Build -> Push -> Deploy" pipelines without automated checks, which requires manual review overhead and allows misconfigurations to incur operational risk or downtime.
In this article, we will build a realistic, production-grade AWS Elastic Kubernetes Service (EKS) workload, deliberately leave it vulnerable, act as the red team to exploit it, and finally, harden it using industry-standard DevSecOps practices.
Blast radius: a term used in software development for the potential impact of a failure within a system, whether it is caused by a security breach, a software bug, or a misconfiguration.
**Disclaimer:** *The vulnerable configurations presented here are for educational lab environments only. Never deploy them to production.*
Architecture Overview
In enterprise environments, a single overly permissive IAM role or a vulnerable base image can lead to a catastrophic breach. When reviewing pipelines and architectures, Solutions Architects should use the following framework:
Platform and Orchestration Trade-offs: For this workload, we choose Amazon EKS over ECS. While ECS is simpler and tightly integrated, EKS is the enterprise standard and enables powerful native controls like OIDC-backed IAM Roles for Service Accounts (IRSA).
IaC and Pipeline Tooling: Standardize on tools like Terraform for reliable state management and GitHub Actions to enforce automated security gates while minimizing external dependencies.
Shift-Left Automation: Continuous scanning must be placed at the Pull Request (PR) stage. Block developers from merging misconfigured code by enforcing failure states (e.g., exit-code: '1') using tools like TruffleHog, Checkov, and Trivy.
Defense in Depth: Verify that security controls are layered across the IaC layer, the pipeline layer, the container layer, and the AWS API layer.
Prerequisites & Lab Setup
To follow along in your own AWS account, ensure you have:
- An active AWS Account (Note: EKS incurs a ~$73/month control plane fee. Remember to destroy resources when finished).
- AWS CLI installed and configured with Administrator access.
- Terraform (v1.5+), kubectl, and Docker installed locally.
Let's get your local workspace ready. Open your terminal and run:
# Step 1: Create your project workspace
mkdir my-eks-devsecops-lab
cd my-eks-devsecops-lab
# Step 2: Create the required folder structure
mkdir infrastructure src k8s .github
mkdir -p .github/workflows
# Step 3: Verify AWS authentication
aws sts get-caller-identity
Phase 1: Building the Vulnerable Baseline (Common Anti-Patterns)
We begin with a scenario I see frequently: code that "just works." We will deploy an EKS cluster and a Python app that reads data from S3. However, we are making a classic beginner mistake: giving the underlying EC2 node full access to S3, rather than restricting permissions to the specific application pod.
1. Provision the Insecure Infrastructure.
Create a file inside the infrastructure folder named main.tf. This sets up a basic EKS cluster but attaches a massive blast-radius policy (AmazonS3FullAccess) to the node group.
terraform {
required_version = ">= 1.5.0"
required_providers {
aws = {
source = "hashicorp/aws"
# LOCK PROVIDER: Pin to an older v5 release from before legacy EKS properties were removed
version = "5.31.0"
}
}
}
provider "aws" {
region = "us-east-1"
}
# Vulnerable IAM Role for EKS Nodes
resource "aws_iam_role" "insecure_node_role" {
name = "insecure-eks-node-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = { Service = "ec2.amazonaws.com" }
}]
})
}
# The Anti-Pattern: Giving the whole node full S3 access
resource "aws_iam_role_policy_attachment" "s3_full_access" {
role = aws_iam_role.insecure_node_role.name
policy_arn = "arn:aws:iam::aws:policy/AmazonS3FullAccess"
}
# EKS cluster (module pinned to v19)
module "eks" {
source = "terraform-aws-modules/eks/aws"
version = "19.21.0"
cluster_name = "devsecops-lab-cluster"
cluster_version = "1.30"
vpc_id = "vpc-12345678" # Replace with your Default VPC ID
subnet_ids = ["subnet-123", "subnet-456"] # Replace with your subnets
# Enable both public and private access to the cluster API endpoint
cluster_endpoint_public_access = true
cluster_endpoint_private_access = true
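eks_managed_node_groups = {
vulnerable_nodes = {
min_size = 1
max_size = 2
desired_size = 1
# NOTE: in v19 of this module, iam_role_arn is only honored when create_iam_role = false.
# By default the module creates its own node role with a generated name,
# so the anti-pattern policy is attached to that generated role here.
iam_role_additional_policies = {
s3_full_access = "arn:aws:iam::aws:policy/AmazonS3FullAccess"
}
}
}
}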
}
This Terraform configuration provisions the EKS control plane and a managed node group inside an existing VPC. The critical security flaw here is the AmazonS3FullAccess attachment. By attaching AmazonS3FullAccess to the node's IAM role, we grant every pod running on this EC2 instance full control over all S3 buckets in the AWS account.
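To make the anti-pattern easy to spot in a review, here is a minimal sketch that flags broad AWS-managed policies on a role. The helper name broad_policies and the suffix list are assumptions for illustration, not part of any AWS tooling; a real review would inspect the policy documents themselves.

```python
# Hypothetical helper: flag managed policies whose names suggest account-wide access.
BROAD_SUFFIXES = ("FullAccess", "AdministratorAccess", "PowerUserAccess")


def broad_policies(policy_arns):
    """Return the ARNs whose policy name ends with one of BROAD_SUFFIXES."""
    flagged = []
    for arn in policy_arns:
        name = arn.rsplit("/", 1)[-1]  # policy name is the text after the last '/'
        if name.endswith(BROAD_SUFFIXES):
            flagged.append(arn)
    return flagged


# Example list: two policies a worker node typically needs, plus the anti-pattern.
node_role_policies = [
    "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
]

print(broad_policies(node_role_policies))
```

A name-based check like this is only a first filter: a custom policy with a harmless name can still grant s3:* on every resource.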
Deploy the infrastructure (this takes ~15 minutes):
cd infrastructure
terraform init
terraform apply -auto-approve
cd ..
2. Build the Vulnerable Application
Next, we write a simple script that grabs data from S3, assuming permissions are handled by the node's environment. Create src/app.py:
# src/app.py
import boto3
from flask import Flask
app = Flask(__name__)
@app.route('/')
def read_s3():
    # ANTI-PATTERN: Relies entirely on the node's underlying IAM role
    s3 = boto3.client('s3', region_name='us-east-1')
    try:
        response = s3.list_buckets()
        buckets = [bucket['Name'] for bucket in response['Buckets']]
        return f"I can see all these buckets: {', '.join(buckets)}"
    except Exception as e:
        return str(e)
if __name__ == '__main__':
    # ANTI-PATTERN: Running as root on all interfaces
    app.run(host='0.0.0.0', port=8080)
Notice that the boto3.client call does not pass any explicit AWS credentials. It blindly trusts the overarching environment (the EC2 node) to provide access. Furthermore, the app binds to 0.0.0.0, exposing it to the entire network.
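When no credentials are passed, boto3 walks a chain of credential sources and uses the first one that works. The sketch below is a deliberately simplified model of that chain, assuming only three sources matter in this lab; the function name likely_credential_source is hypothetical, and the real chain has more steps (shared config files, container credentials, and so on).

```python
import os


def likely_credential_source(env):
    """Guess the credential source from environment variables alone (simplified)."""
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        return "environment variables"
    # IRSA injects these two variables into pods that use an annotated ServiceAccount.
    if env.get("AWS_ROLE_ARN") and env.get("AWS_WEB_IDENTITY_TOKEN_FILE"):
        return "web identity token (IRSA)"
    # Nothing else configured: the SDK falls back to the instance metadata service.
    return "instance metadata (node role)"


if __name__ == "__main__":
    print(likely_credential_source(dict(os.environ)))
```

In the vulnerable pod neither pair of variables is set, so the app ends up using whatever the node's role allows.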
Create src/requirements.txt:
flask==2.2.2
boto3==1.26.0
werkzeug==2.2.2
Create the insecure Dockerfile in the src folder:
# src/Dockerfile
# ANTI-PATTERN: Bloated image, running as root
FROM python:3.9
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["python", "app.py"]
3. Deploy to EKS.
Build, tag, and push the image to AWS ECR, and deploy it to Kubernetes.
Build the Docker image
# Run this from the project root (my-eks-devsecops-lab)
docker build -t insecure-app:latest ./src
Update your local kubeconfig so that kubectl can authenticate to the cluster's public API endpoint. In short, we are connecting to EKS:
aws eks update-kubeconfig --region us-east-1 --name devsecops-lab-cluster
Create a private registry repository in Amazon ECR, authenticate the local Docker daemon to it, and push the tagged image:
# Create the private cloud registry repository
aws ecr create-repository --repository-name insecure-app --region us-east-1
# Authenticate local Docker to the remote AWS ECR endpoint
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $(aws sts get-caller-identity --query "Account" --output text).dkr.ecr.us-east-1.amazonaws.com
# Tag and ship the local binary layers to the cloud
export ECR_URI="$(aws sts get-caller-identity --query "Account" --output text).dkr.ecr.us-east-1.amazonaws.com/insecure-app"
docker tag insecure-app:latest $ECR_URI:latest
docker push $ECR_URI:latest
Create k8s/deployment.yaml in your editor and paste this deployment configuration to spin up your vulnerable Python app:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: insecure-app-deployment
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: insecure-app
  template:
    metadata:
      labels:
        app: insecure-app
    spec:
      containers:
        - name: web-app
          # Replace <YOUR_ACCOUNT_ID> with your AWS account ID (the image URI must match the one you pushed)
          image: <YOUR_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/insecure-app:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
Apply this manifest using kubectl:
kubectl apply -f k8s/deployment.yaml
Architectural Insight:
The python:3.9 image contains hundreds of unnecessary OS packages, increasing the attack surface. Furthermore, by not specifying a USER, the container runs as root.
Phase 2: Threat Modeling & The Red Team Practices.
As a Solutions Architect, part of threat modeling is understanding how an attacker navigates your environment. Let's simulate the aftermath of a Remote Code Execution (RCE) vulnerability that gives an attacker a shell inside our pod. A Server-Side Request Forgery (SSRF) flaw targets the same metadata endpoint, just without an interactive shell.
1. Simulate the Breach.
Open the standard terminal on your laptop (the same one you used to run terraform apply or docker build earlier). Ensure you are connected to your AWS environment.
Type this to see a list of your running applications:
kubectl get pods
You are still in your local terminal. You are now going to tell Kubernetes to open a live connection from your laptop directly into the running container inside AWS. Type this (it targets the deployment by name, so there is no pod name to copy):
kubectl exec -it deployment/insecure-app-deployment -- /bin/bash
The terminal prompt will change immediately. It will go from looking like your local computer (e.g., chimera1@fedora:~$) to something like root@pod-1234:/app#.
You are no longer typing commands on your laptop. You are now physically "inside" the compromised container running in the AWS cloud. Every command you type now executes as if you were sitting at the keyboard of that specific pod.
2. Check Privileges
root@pod-1234:/app# whoami
root
Impact: Because we are root, we can install new packages (like curl or nmap) to map the internal network.
3. Abuse the Metadata Service (IMDS)
Because we did not stop pods from reaching the EC2 metadata service, we can steal the node's IAM credentials directly from inside the container. The IP address 169.254.169.254 is a link-local address that only reaches the metadata service from inside an AWS instance.
During infrastructure verification, we noted that the cluster nodes enforce IMDSv2 (Instance Metadata Service Version 2). Unauthenticated IMDSv1 GET requests to the link-local address (http://169.254.169.254) are rejected, which prevents the most basic extraction.
IMDSv2 alone does not stop an attacker who already has a shell, though. We only had to mimic legitimate node behavior by requesting a temporary session token via a PUT request:
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
Attempting to pull credentials from a static role name returned a 404 Not Found because the EKS Terraform module creates the node group's IAM role with a generated name that ends in a random suffix.
To overcome this, we queried the base security directory while passing our valid IMDSv2 session token in the request header to discover the exact, real-world identity assigned to the EC2 host:
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/iam/security-credentials/
By appending the dynamically discovered role name to our metadata request string, the endpoint immediately dumped the underlying security credentials payload:
root@pod-1234:/app# curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
http://169.254.169.254/latest/meta-data/iam/security-credentials/vulnerable_nodes-eks-node-group-20260615172216169600000004
4. The Blast Radius Expansion
The output reveals the AccessKeyId, SecretAccessKey, and Token. An attacker can configure these on their own machine. Because the node role has AmazonS3FullAccess, the attacker bypasses the Kubernetes boundary entirely and can read, overwrite, or delete data in every S3 bucket in your AWS account.
To leave the compromised container and return to your local laptop, simply type:
exit
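The stolen credentials are temporary, which matters for incident response: they stay valid until their expiry time even after the pod is gone. The sketch below parses a payload in the shape the metadata service returns and computes the remaining lifetime. Every value in SAMPLE_PAYLOAD is a fake placeholder, and seconds_until_expiry is a hypothetical helper.

```python
import json
from datetime import datetime, timezone

# Placeholder payload; the field names follow the metadata service response, the values are fake.
SAMPLE_PAYLOAD = """{
  "Code": "Success",
  "Type": "AWS-HMAC",
  "AccessKeyId": "ASIAEXAMPLEEXAMPLE",
  "SecretAccessKey": "fake-secret",
  "Token": "fake-session-token",
  "Expiration": "2030-01-01T06:00:00Z"
}"""


def seconds_until_expiry(payload, now):
    """Return how many seconds the credentials in `payload` remain valid at time `now`."""
    creds = json.loads(payload)
    expires = datetime.strptime(creds["Expiration"], "%Y-%m-%dT%H:%M:%SZ")
    expires = expires.replace(tzinfo=timezone.utc)
    return int((expires - now).total_seconds())


# A fixed "now" keeps the example deterministic: six hours before the expiry above.
now = datetime(2030, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
print(seconds_until_expiry(SAMPLE_PAYLOAD, now))  # 21600
```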
Phase 3: The Hardening (Secure Design Patterns)
We will now implement the "Shift-Left" paradigm. We will harden the container, implement IAM least privilege using IRSA, and build a CI/CD pipeline that blocks vulnerable code.
1. Secure the Dockerfile
Open src/Dockerfile and replace it with this hardened, minimal version:
# src/Dockerfile (HARDENED)
# SECURE: Minimal base image reduces attack surface
FROM python:3.9-slim
# SECURE: Create a non-root user and group
RUN groupadd -r appgroup && useradd -r -g appgroup appuser
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# SECURE: Hand over ownership and switch to non-root user
RUN chown -R appuser:appgroup /app
USER appuser
EXPOSE 8080
CMD ["python", "app.py"]
Using python:3.9-slim drastically cuts down the number of pre-installed OS utilities, reducing the attack surface. By explicitly creating appuser and ending the file with USER appuser, we make the container run as an unprivileged user by default. If an attacker gains a shell now, they will receive "Permission Denied" errors when trying to install network mapping tools.
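As an extra guard, the application itself can refuse to start when it finds itself running as root, for example if someone later removes the USER line. This is a minimal sketch under that assumption; startup_check is a hypothetical helper and is not part of the app shown earlier.

```python
import os
import sys


def startup_check(euid=None):
    """Return 0 if the process is unprivileged, 1 if it is running as root (UID 0)."""
    if euid is None:
        euid = os.geteuid()  # Unix only, which covers Linux containers
    if euid == 0:
        print("Refusing to start as root; check the USER line in the Dockerfile.",
              file=sys.stderr)
        return 1
    return 0


# Explicit UIDs keep the demonstration independent of who runs it.
print(startup_check(euid=0))     # 1
print(startup_check(euid=1000))  # 0
```

In app.py this could be wired in as sys.exit(startup_check()) before app.run, so a root container fails fast instead of serving traffic.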
2. Implement IAM Roles for Service Accounts (IRSA)
Instead of giving the EC2 node permissions, we give permissions only to the specific Kubernetes ServiceAccount used by our app via OIDC. Add this to your infrastructure/main.tf:
# infrastructure/main.tf (Additions)
# SECURE: Create a strict, least-privilege policy
resource "aws_iam_policy" "app_s3_policy" {
name = "StrictAppS3Policy"
description = "Allows reading only from the specific app bucket"
policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Action = ["s3:GetObject"]
Resource = ["arn:aws:s3:::my-specific-app-bucket/*"] # Restricted to ONE bucket
}]
})
}
# SECURE: Map the IAM Role to the Kubernetes Service Account via OIDC
module "iam_eks_role" {
source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
version = "~> 5.0"
role_name = "SecureAppRole"
oidc_providers = {
main = {
provider_arn = module.eks.oidc_provider_arn
namespace_service_accounts = ["default:secure-app-sa"]
}
}
role_policy_arns = {
policy = aws_iam_policy.app_s3_policy.arn
}
}
This block decouples application permissions from infrastructure permissions. It creates an IAM policy scoped down to a single S3 bucket, then uses the cluster's OIDC (OpenID Connect) provider to map that policy strictly to the secure-app-sa Kubernetes ServiceAccount. Once the AmazonS3FullAccess attachment from Phase 1 is removed, the underlying EKS node no longer holds any S3 permissions.
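Under the hood, the mapping is an IAM trust policy that only accepts tokens issued for one ServiceAccount. The sketch below builds a trust policy of that general shape. It is an illustration, not the module's exact output: irsa_trust_policy is a hypothetical helper, and the account ID and OIDC host are placeholder values.

```python
import json


def irsa_trust_policy(account_id, oidc_host, namespace, service_account):
    """Build a trust policy that lets one Kubernetes ServiceAccount assume a role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_host}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    f"{oidc_host}:aud": "sts.amazonaws.com",
                    # The subject claim pins the role to one namespace and ServiceAccount.
                    f"{oidc_host}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                }
            },
        }],
    }


# Placeholder identifiers, assumed for illustration only.
OIDC_HOST = "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
policy = irsa_trust_policy("111122223333", OIDC_HOST, "default", "secure-app-sa")
print(json.dumps(policy, indent=2))
```

A pod in another namespace, or one using a different ServiceAccount, presents a different subject claim and the role assumption is refused.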
You also need to create the Kubernetes ServiceAccount and annotate it with the IAM role ARN. Create k8s/serviceaccount.yaml:
# k8s/serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: secure-app-sa
  namespace: default
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<YOUR_ACCOUNT_ID>:role/SecureAppRole
Replace <YOUR_ACCOUNT_ID> with your actual AWS account ID. You can also use Terraform outputs to dynamically inject this value.
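If you prefer to generate the manifest rather than edit it by hand, a small script can fill in the account ID and catch a leftover placeholder. This is a sketch only; service_account_yaml is a hypothetical helper, and 111122223333 is a placeholder account ID.

```python
def service_account_yaml(account_id, role_name="SecureAppRole",
                         name="secure-app-sa", namespace="default"):
    """Render the ServiceAccount manifest with the IAM role annotation filled in."""
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError("expected a 12-digit AWS account ID")
    role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    return (
        "apiVersion: v1\n"
        "kind: ServiceAccount\n"
        "metadata:\n"
        f"  name: {name}\n"
        f"  namespace: {namespace}\n"
        "  annotations:\n"
        f"    eks.amazonaws.com/role-arn: {role_arn}\n"
    )


print(service_account_yaml("111122223333"))
```

Passing the literal placeholder string raises a ValueError, so an unedited manifest never reaches the cluster.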
Then update your deployment to use this ServiceAccount:
# k8s/secure-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: secure-app-deployment
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: secure-app
  template:
    metadata:
      labels:
        app: secure-app
    spec:
      serviceAccountName: secure-app-sa # <-- CRITICAL: Use the IRSA ServiceAccount
      containers:
        - name: web-app
          image: <YOUR_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/secure-app:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
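IRSA only works when the two manifests line up: the ServiceAccount carries the role annotation and the Deployment references that ServiceAccount in the same namespace. The sketch below checks that wiring on manifests that have already been loaded into Python dictionaries. The function irsa_problems is a hypothetical helper, and the account ID is a placeholder.

```python
ROLE_ANNOTATION = "eks.amazonaws.com/role-arn"


def irsa_problems(service_account, deployment):
    """Return a list of wiring problems between a ServiceAccount and a Deployment."""
    problems = []
    sa_meta = service_account.get("metadata", {})
    role_arn = sa_meta.get("annotations", {}).get(ROLE_ANNOTATION, "")
    if not role_arn.startswith("arn:aws:iam::"):
        problems.append("ServiceAccount has no IAM role annotation")
    pod_spec = deployment["spec"]["template"]["spec"]
    if pod_spec.get("serviceAccountName") != sa_meta.get("name"):
        problems.append("Deployment does not reference the annotated ServiceAccount")
    dep_namespace = deployment.get("metadata", {}).get("namespace", "default")
    if dep_namespace != sa_meta.get("namespace", "default"):
        problems.append("ServiceAccount and Deployment are in different namespaces")
    return problems


service_account = {
    "metadata": {
        "name": "secure-app-sa",
        "namespace": "default",
        "annotations": {ROLE_ANNOTATION: "arn:aws:iam::111122223333:role/SecureAppRole"},
    }
}
deployment = {
    "metadata": {"name": "secure-app-deployment", "namespace": "default"},
    "spec": {"template": {"spec": {"serviceAccountName": "secure-app-sa"}}},
}

print(irsa_problems(service_account, deployment))  # []
```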
3. Build the Automated Security Gates
We cannot trust developers to always remember these rules; we must enforce them programmatically. Create .github/workflows/main.yml:
# .github/workflows/main.yml
name: DevSecOps Pipeline
on:
  push:
    branches: ["main"]
  pull_request:
    branches: ["main"]
jobs:
  security-scanning:
    runs-on: ubuntu-latest
    permissions:
      contents: read # Required for checkout
      security-events: write # Only needed if you later upload SARIF results
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Full history so TruffleHog can compare base and head
      # GATE 1: Secret Scanning
      - name: TruffleHog Secret Scan
        uses: trufflesecurity/trufflehog@main
        with:
          path: ./
          base: ${{ github.event.repository.default_branch }}
          head: HEAD
          extra_args: --debug --only-verified
      # GATE 2: Scan Terraform for misconfigurations before deployment
      - name: Checkov IaC Scan
        uses: bridgecrewio/checkov-action@master
        with:
          directory: ./infrastructure
          framework: terraform
          output_format: cli
          soft_fail: false # Fails the build when any Checkov check fails
      # GATE 3: Build the Docker image
      - name: Build Image
        run: docker build -t my-app:latest ./src
      # GATE 4: Scan the Docker image for Vulnerabilities
      - name: Trivy Vulnerability Scanner
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: 'my-app:latest'
          format: 'table'
          exit-code: '1' # Fails the build on HIGH/CRITICAL CVEs
          ignore-unfixed: true
          severity: 'CRITICAL,HIGH'
          vuln-type: 'os,library'
Architectural Insight:
By placing tools like Checkov and Trivy in the CI/CD pipeline, we enforce "Shift-Left" automation. If these tools detect a misconfiguration (like an overly permissive IAM role) or a vulnerable package, they exit with a non-zero code and fail the job. With branch protection requiring this check to pass, developers cannot merge vulnerable code into the main branch.
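The gating behavior itself is simple: steps run in order, and the first non-zero exit code stops everything after it. Here is a minimal sketch of that logic. The step names, the run_pipeline helper, and the deploy step are hypothetical and only model the idea; they are not how GitHub Actions is implemented.

```python
def run_pipeline(steps):
    """Run steps in order and stop at the first non-zero exit code.

    steps: list of (name, callable) pairs; each callable returns an exit code.
    Returns (exit_code, names_of_steps_that_ran).
    """
    ran = []
    for name, step in steps:
        ran.append(name)
        code = step()
        if code != 0:
            return code, ran
    return 0, ran


# Hypothetical outcome: every gate passes except the image scan.
steps = [
    ("secret-scan", lambda: 0),
    ("iac-scan", lambda: 0),
    ("build-image", lambda: 0),
    ("image-scan", lambda: 1),
    ("deploy", lambda: 0),
]

exit_code, ran = run_pipeline(steps)
print(exit_code, ran)
```

The deploy step never appears in the list of steps that ran, which is the whole point of placing the scanners before it.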
Phase 4: Validating the Security Gates (Planned Pipeline Failure)
To prove that our Shift-Left automation handles threats before they reach production, we pushed our code to GitHub to trigger the DevSecOps workflow. The pipeline started, cleared the TruffleHog secret check, logged infrastructure misconfigurations from the IaC scan (Checkov in the workflow above) at a warning threshold, and built our container image. Note that with soft_fail: false, as shown in the workflow, IaC findings would stop the run at that gate instead of only being logged.
However, upon reaching GATE 4 (Trivy Vulnerability Scanner), the pipeline failed as intended with exit code 1.
Looking at the logs, Trivy reported 25 OS vulnerabilities (including Critical heap buffer overflows in openssl) and 10 Python application-layer vulnerabilities (such as a session cookie disclosure flaw in Flask 2.2.2). Because our workflow fails on CRITICAL and HIGH findings, the pipeline stopped automatically, preventing an unhardened workload from ever being deployed to our AWS environment. This is how automated preventative controls protect enterprise cloud platforms.
Real CVE Context: Flask 2.2.2 and Werkzeug 2.2.2 are known to have multiple vulnerabilities, including CVE-2023-30861 (session cookie disclosure), CVE-2023-23934 (cookie parsing issue), CVE-2023-25577 (DoS via multipart parsing), and CVE-2024-34069 (RCE via debugger). Upgrading to Flask 2.3.2+ and Werkzeug 3.0.3+ resolves these.
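A quick way to see why the scan failed is to compare the pinned versions in requirements.txt against the releases that contain the fixes. The sketch below does that with plain tuples. The MIN_FIXED floors are assumptions for illustration (2.3.2 is my reading of the Flask release that fixes CVE-2023-30861), and the parser only understands simple name==x.y.z pins.

```python
# Assumed minimum fixed versions; verify against the advisories before relying on them.
MIN_FIXED = {"flask": (2, 3, 2), "werkzeug": (3, 0, 3)}


def parse_pins(text):
    """Parse 'name==x.y.z' lines into {name: (x, y, z)}; other lines are skipped."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = tuple(int(part) for part in version.strip().split("."))
    return pins


def outdated(pins, minimums=MIN_FIXED):
    """Return the sorted names of packages pinned below their minimum fixed version."""
    return sorted(name for name, version in pins.items()
                  if name in minimums and version < minimums[name])


REQUIREMENTS = "flask==2.2.2\nboto3==1.26.0\nwerkzeug==2.2.2\n"
print(outdated(parse_pins(REQUIREMENTS)))  # ['flask', 'werkzeug']
```

This is no substitute for Trivy, which consults a vulnerability database; it only makes the specific finding from this run easy to reproduce locally.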
Validation, Monitoring & Teardown
In Phase 2 we successfully stole the EC2 node's credentials using curl. We fix this at the infrastructure level using Terraform: by requiring IMDSv2 and setting the "hop limit" to 1, we ensure that an ordinary pod running on the node cannot reach the metadata service. Security is an ongoing operational task, so this section also covers validation, runtime monitoring, and teardown.
Enforce IMDSv2 and block pod access:
Apply the settings through the node group's launch template. Require session tokens and set HttpPutResponseHopLimit to 1 to prevent pods from accessing node metadata.
1. Open your code editor: Open the infrastructure/main.tf file you created in Phase 1.
2. Modify the Node Group: Scroll down to your eks_managed_node_groups block. You are going to add a metadata_options section to enforce IMDSv2. Update the block to look exactly like this:
eks_managed_node_groups = {
vulnerable_nodes = { # Or secure_nodes if you renamed it
min_size = 1
max_size = 2
desired_size = 1
iam_role_arn = aws_iam_role.insecure_node_role.arn
# ADD THIS BLOCK: Enforce IMDSv2 and block pod access
metadata_options = {
http_endpoint = "enabled"
http_tokens = "required" # This forces IMDSv2
http_put_response_hop_limit = 1 # Prevents pods from reading it
}
}
}
While requiring http_tokens enforces IMDSv2, http_put_response_hop_limit = 1 is the setting that actually closes the path from the pod. It ensures that metadata responses can only travel 1 network "hop." Because Kubernetes pods run on a separate virtual network interface inside the EC2 node, a response to a pod needs at least 2 hops. Setting the limit to 1 drops the response before it can reach the pod, which blocks this credential-theft path for pods that do not use host networking.
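The hop limit is easiest to reason about as a small counting exercise. The sketch below models it, assuming a process on the node itself is 1 hop from the metadata service and a pod is 2 hops away; both constants and the response_delivered helper are illustrative assumptions rather than measured values.

```python
def response_delivered(hop_limit, hops_to_caller):
    """A response arrives only if the caller is within the allowed number of hops."""
    return hops_to_caller <= hop_limit


HOST_PROCESS_HOPS = 1  # assumption: process in the node's own network namespace
POD_HOPS = 2           # assumption: one extra hop through the pod's virtual interface

for limit in (1, 2):
    print(limit,
          response_delivered(limit, HOST_PROCESS_HOPS),
          response_delivered(limit, POD_HOPS))
```

With a limit of 1 the node's own agents keep working while the pod gets nothing back; with a limit of 2 both succeed, which is the situation exploited in Phase 2.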
3. Apply the changes in your local terminal: Open your terminal on your laptop, ensure you are in the infrastructure folder, and apply the secure configuration to AWS:
cd infrastructure
terraform apply -auto-approve
Architectural Insight:
If you try to run that same malicious curl command inside the container now, it will return a 401 Unauthorized or timeout error. The vulnerability is successfully patched.
cd ..
# Verify you are in the right place (it should show 'src' and 'infrastructure')
ls
# Build the hardened Docker image
docker build -t secure-app:latest ./src
# Update kubeconfig
aws eks update-kubeconfig --region us-east-1 --name devsecops-lab-cluster
# Authenticate to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin $(aws sts get-caller-identity --query "Account" --output text).dkr.ecr.us-east-1.amazonaws.com
# Create the ECR repository for the hardened image (with scan-on-push enabled)
aws ecr create-repository \
--repository-name secure-app \
--region us-east-1 \
--image-scanning-configuration scanOnPush=true
# Tag and push the hardened image
export ECR_URI="$(aws sts get-caller-identity --query "Account" --output text).dkr.ecr.us-east-1.amazonaws.com/secure-app"
docker tag secure-app:latest $ECR_URI:latest
docker push $ECR_URI:latest
# Apply the ServiceAccount and secure deployment (run from the project root)
kubectl apply -f k8s/serviceaccount.yaml
kubectl apply -f k8s/secure-deployment.yaml
# Verify pods are running
kubectl get pods
# Test the hardening: exec into the pod and try to reach the metadata service
kubectl exec -it deployment/secure-app-deployment -- /bin/bash
# Inside the container, repeat the first step of the old attack (requesting an IMDSv2 token).
# curl is not included in python:3.9-slim, so use Python's standard library instead:
python -c "import urllib.request as u; r = u.Request('http://169.254.169.254/latest/api/token', method='PUT', headers={'X-aws-ec2-metadata-token-ttl-seconds': '21600'}); print(u.urlopen(r, timeout=5).read().decode())"
exit
With the hop limit set to 1, the token request should time out instead of printing a token, which shows that the metadata path from the pod is closed.
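For a repeatable check, the same probe can be written as a short script that uses only the Python standard library, which is present even in slim images. This is a sketch under the assumption that a timeout or error means the metadata service is unreachable; the function names are hypothetical, and the endpoint paths and header names are the ones used in the curl commands earlier.

```python
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254"


def token_request(ttl_seconds=21600):
    """Build the IMDSv2 session-token request (HTTP PUT)."""
    return urllib.request.Request(
        f"{IMDS}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )


def role_list_request(token):
    """Build the request that lists the role names attached to the instance."""
    return urllib.request.Request(
        f"{IMDS}/latest/meta-data/iam/security-credentials/",
        headers={"X-aws-ec2-metadata-token": token},
    )


def imds_reachable(timeout=3):
    """Return True only if both the token and the role listing can be fetched."""
    try:
        with urllib.request.urlopen(token_request(), timeout=timeout) as resp:
            token = resp.read().decode()
        with urllib.request.urlopen(role_list_request(token), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Timeouts, refused connections, and HTTP errors all count as unreachable.
        return False
```

Calling imds_reachable() inside the hardened pod should return False; inside the Phase 1 pod it would return True.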
Deploying Runtime Security:
Deploy tools like Falco or Amazon GuardDuty for EKS to monitor for anomalous behavior. Part 2 is dedicated to building out the Falco architecture, but if you want to see how a Solutions Architect deploys it into this cluster right now, it is done using Helm (a package manager for Kubernetes) from your local terminal.
# Add the Falco Helm repository
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
# Install Falco with the eBPF probe driver for kernel tracing
helm install falco falcosecurity/falco --namespace falco --create-namespace --set driver.kind=ebpf --set tty=true
Crucial Step: Lab Teardown
As a responsible cloud practitioner, always clean up your environment to avoid unexpected AWS billing. Open your terminal:
cd infrastructure
terraform destroy -auto-approve
Key Architectural Lessons
Mastering these concepts separates you from the pack. By walking through this architecture, you demonstrate:
Blast Radius Reduction: Decoupling application permissions from infrastructure permissions.
Shift-Left Automation: Integrating SCA, IaC scanning, and vulnerability checks into CI/CD to reduce manual review overhead.
Defense in Depth: Applying security at the IaC layer, pipeline layer, container layer, and AWS API layer.
Metadata Service Protection: Using IMDSv2 with hop limit 1 to prevent container credential theft.
Minimal Container Images: Reducing attack surface by using slim base images and non-root users.
Conclusion & Extensions
A Secure-by-Default architecture requires acknowledging that developers will make mistakes, and building systems that catch those mistakes before they become breaches. To extend this architecture, consider implementing AWS Organizations with Service Control Policies (SCPs) to establish hard boundaries across multiple accounts, or implementing Cosign to digitally sign your container images before pushing them to Amazon ECR.
Bonus: DevSecOps Decision Matrix
Keep this matrix handy for your next architecture review or certification exam.
| Security Domain | Vulnerable Configuration (Anti-Pattern) | Secure Configuration (Best Practice) | SA Justification & Impact |
|---|---|---|---|
| IAM Strategy | EKS Node Group roles with broad permissions (AmazonS3FullAccess). | IAM Roles for Service Accounts (IRSA) with OIDC. | Limits blast radius. A compromised pod only yields permissions specific to its app. |
| Container Build | Running as root, using bloated base images (e.g., python:3.9). | USER nonroot, minimal/distroless base images. | Reduces attack surface. Attackers cannot easily install external tools if a breakout occurs. |
| Metadata Security | IMDSv1 enabled, accessible from pods. | Require IMDSv2, Hop Limit = 1. | Prevents Server-Side Request Forgery (SSRF) attacks from extracting EC2 credentials. |
| Pipeline Security | Build -> Push -> Deploy (No automated checks). | TruffleHog -> Checkov -> Trivy -> Deploy. | Catches vulnerabilities at the PR stage before they incur operational risk or downtime. |






















