DOP-C02
AWS Certified DevOps Engineer - Professional
The AWS Certified DevOps Engineer - Professional (DOP-C02) validates technical expertise in provisioning, operating, and managing distributed application systems on the AWS platform. This certification targets experienced DevOps engineers with two or more years of experience and is one of the most advanced AWS certifications.
The exam covers six domains: SDLC Automation (22%), Configuration Management and IaC (17%), Resilient Cloud Solutions (15%), Monitoring and Logging (15%), Incident and Event Response (14%), and Security and Compliance (17%). Candidates must demonstrate deep knowledge of AWS CodePipeline, CodeBuild, CodeDeploy, CodeCommit, CloudFormation, CDK, OpsWorks, Systems Manager, CloudWatch, X-Ray, EventBridge, and container services (ECS, EKS, Fargate).
Key skills tested include implementing and managing CI/CD pipelines at scale, applying blue/green, canary, and rolling deployment strategies, implementing infrastructure as code, automating operational processes, designing and implementing logging and monitoring solutions, responding to incidents with automated remediation, and implementing security controls and compliance validation in automated pipelines. The DOP-C02 version was released in March 2023.
DOP-C02 Practice Exam 1
Comprehensive practice exam covering core AWS DevOps Engineer Professional topics including CI/CD pipeline automation, infrastructure as code, resilient cloud solutions, monitoring and logging, incident response, and security compliance across 75 professional-level questions.
DOP-C02 Practice Exam 2
Practice exam focusing on infrastructure automation and configuration management at scale, covering CI/CD pipelines, IaC patterns, resilience, monitoring, incident response, and security compliance.
DOP-C02 Practice Exam 3
Practice exam emphasizing resilience, disaster recovery, and high availability patterns including CI/CD pipelines, infrastructure as code, monitoring, incident response, and security compliance for DevOps professionals.
DOP-C02 Practice Exam 4
Practice exam centered on observability, monitoring, and logging architectures covering CI/CD pipelines, infrastructure as code, resilient solutions, incident response, and security compliance for AWS DevOps professionals.
DOP-C02 Practice Exam 5
Practice exam covering event-driven automation and incident management with CI/CD pipelines, infrastructure as code, resilience patterns, monitoring, and security compliance.
DOP-C02 Practice Exam 6
Practice exam covering security pipelines and compliance automation with CI/CD, infrastructure as code, resilience, monitoring, and incident response.
DOP-C02 Cheat Sheet
Quick reference guide - 6 sections
AWS Certified DevOps Engineer - Professional (DOP-C02)
The DOP-C02 exam validates your technical expertise in provisioning, operating, and managing distributed application systems on the AWS platform. This Professional-level certification is designed for individuals who perform a DevOps engineer role and have two or more years of experience with AWS. The exam tests your ability to implement and manage continuous delivery systems and methodologies, implement and automate security controls, governance processes, and compliance validation, define and deploy monitoring, metrics, and logging systems, and implement systems that are highly available, scalable, and self-healing on the AWS platform.
Exam Details
| Item | Details |
|---|---|
| Exam Code | DOP-C02 |
| Duration | 180 minutes |
| Number of Questions | 75 questions (65 scored + 10 unscored) |
| Passing Score | 750 / 1000 |
| Cost | $300 USD |
| Validity | 3 years |
| Question Types | Multiple choice (single & multiple select), scenario-based |
| Testing Options | Pearson VUE testing center or online proctored |
| Recommended Experience | 2+ years provisioning, operating, and managing AWS environments |
| Certification Level | Professional (highest tier) |
Domain Weights
| Domain | Weight |
|---|---|
| Domain 1: SDLC Automation | 22% |
| Domain 2: Configuration Management & Infrastructure as Code | 17% |
| Domain 3: Resilient Cloud Solutions | 15% |
| Domain 4: Monitoring & Logging | 15% |
| Domain 5: Incident & Event Response | 14% |
| Domain 6: Security & Compliance | 17% |
Study Tips
- Domain 1 (SDLC Automation) carries the most weight at 22%, so master the entire AWS Developer Tools suite including CodePipeline, CodeBuild, CodeDeploy, and CodeCommit inside and out
- CloudFormation and CDK are heavily tested in Domain 2; understand nested stacks, StackSets, drift detection, change sets, and custom resources thoroughly
- Know all deployment strategies (in-place, rolling, blue/green, canary, linear) and when to use each across EC2, Lambda, and ECS for Domain 1
- Master CloudWatch deeply for Domains 4 and 5: metrics, alarms, Logs Insights queries, dashboards, anomaly detection, and composite alarms
- Understand event-driven automation patterns using EventBridge, Lambda, and Systems Manager for automated remediation scenarios in Domain 5
- Security and compliance automation is critical for Domain 6: know Config rules, conformance packs, Security Hub, and pipeline security scanning
- Systems Manager is tested across multiple domains; know Parameter Store, Automation runbooks, Run Command, Patch Manager, State Manager, and Session Manager
- Practice with multi-account CI/CD patterns, cross-account deployments, and artifact sharing across accounts
Question Strategy Tips
- Questions are long and scenario-based; read the last sentence first to understand what is being asked, then read the full scenario
- Look for keywords like "least operational overhead", "fully automated", "self-healing", "minimize downtime", or "maximum visibility"
- The DevOps exam strongly favors automation over manual processes; if an answer involves manual steps, it is likely wrong
- AWS managed services and native integrations are almost always preferred over third-party or custom solutions
- When two answers seem equally valid, pick the one that provides the most automation and least human intervention
- Pay attention to whether the question asks about EC2, Lambda, or ECS deployments because deployment strategies differ significantly across these compute types
- Flag complex questions and return later; the average budget is 2.4 minutes per question (180 minutes / 75 questions), so avoid exceeding that on the first pass
- Use the full 180 minutes; this exam rewards careful reading and deliberate elimination of wrong answers
Key Differences from SysOps Administrator & Solutions Architect Professional
- DevOps Professional focuses heavily on CI/CD pipelines, deployment automation, and infrastructure as code whereas SysOps focuses more on operational monitoring and management
- Unlike Solutions Architect Professional, DevOps Professional goes much deeper into deployment strategies, build automation, and testing methodologies
- SysOps tests manual troubleshooting skills while DevOps tests your ability to automate those same tasks using runbooks and event-driven responses
- DevOps Professional requires deep knowledge of CodePipeline, CodeBuild, CodeDeploy, and CodeCommit which are lightly covered in other exams
- Solutions Architect Professional focuses on architecture design and migration while DevOps Professional focuses on building and operating those architectures
- The DevOps exam expects you to know how to implement security controls in CI/CD pipelines including automated scanning, secrets management, and compliance as code
Recommended Preparation Path
- Step 1 - Foundation: Ensure you have a solid understanding of AWS developer tools and operations by completing either the Developer Associate (DVA-C02) or SysOps Administrator Associate (SOA-C02) before attempting the Professional exam
- Step 2 - Deep Dive: Study each domain in depth, focusing on CI/CD pipeline design, infrastructure as code with CloudFormation and CDK, and automated monitoring and remediation patterns
- Step 3 - Hands-On Labs: Build complete CI/CD pipelines with CodePipeline, create CloudFormation StackSets for multi-account deployments, set up automated remediation with EventBridge and Lambda, and implement monitoring dashboards
- Step 4 - Practice Exams: Take multiple full-length practice exams under timed conditions. Review every wrong answer thoroughly and understand why the correct answer is preferred
- Step 5 - Weak Areas: Identify your weakest domains from practice exams and dedicate additional study time to those areas before scheduling the real exam
Exam Day Checklist
- Arrive 15 minutes early for testing center or start your online proctored check-in 30 minutes before the scheduled time
- Bring two forms of valid identification (one with photo) for testing center; clear your workspace for online proctoring
- You have 180 minutes for 75 questions, which gives you approximately 2 minutes and 24 seconds per question
- Use the "Flag for Review" feature liberally on questions you are unsure about; you can return to them later
- Read every word in the scenario carefully as questions often contain critical constraints in the middle of the text
- Your score is reported on a scaled range of 100-1000 and you need 750 to pass; because the score is scaled, it does not convert exactly to a percentage, but roughly 72-75% correct is a common rule of thumb
- Results are typically available within 1-5 business days through your AWS Certification account
- If you do not pass, you can retake the exam after 14 days; there is no limit on the number of attempts
- Request accommodations in advance if English is not your first language (extra 30 minutes available for non-native speakers)
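The pacing figures in this checklist follow from simple division. As a rough sketch (the function name is illustrative, and it assumes the 30-minute accommodation simply extends the total to 210 minutes):

```python
def time_per_question(total_minutes: int, questions: int) -> tuple[int, int]:
    """Return the average time budget per question as (minutes, seconds)."""
    total_seconds = total_minutes * 60
    per_question = total_seconds // questions
    return divmod(per_question, 60)


# Standard sitting: 180 minutes for 75 questions.
standard = time_per_question(180, 75)

# With the 30-minute accommodation (assumed here to give 210 minutes in total).
extended = time_per_question(210, 75)

print(standard)  # (2, 24)
print(extended)  # (2, 48)
```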
Recommended AWS Whitepapers & Resources
- Practicing Continuous Integration and Continuous Delivery on AWS: Covers CI/CD best practices, pipeline design patterns, and deployment strategies; directly relevant to Domain 1
- Infrastructure as Code: Best practices for CloudFormation, CDK, and managing infrastructure through version-controlled templates; essential for Domain 2
- Reliability Pillar - AWS Well-Architected Framework: Deep dive into fault tolerance, disaster recovery, and self-healing architectures; critical for Domain 3
- Logging Best Practices: Centralized logging architectures, CloudWatch Logs, and operational visibility patterns; important for Domain 4
- AWS Security Best Practices: Identity management, data protection, detective controls, and incident response automation; essential for Domain 6
Domain 1: SDLC Automation (22%)
This domain focuses on implementing and managing continuous integration and continuous delivery (CI/CD) pipelines on AWS. You must understand the complete AWS Developer Tools suite including CodePipeline, CodeBuild, CodeDeploy, and CodeCommit along with deployment strategies, testing automation, and artifact management. This is the highest-weighted domain on the exam, so thorough preparation here is essential for passing.
AWS CodePipeline
CodePipeline is a fully managed continuous delivery service that automates your release pipelines for application and infrastructure updates. It orchestrates the build, test, and deploy phases of your release process every time there is a code change.
| Component | Description | Key Details |
|---|---|---|
| Pipeline | Workflow that describes software release process | Defined as stages with actions; V2 supports triggers and variables |
| Stages | Logical groupings of actions | Source, Build, Test, Deploy, Approval; sequential execution |
| Actions | Tasks within a stage | Source, Build, Test, Deploy, Approval, Invoke; parallel or sequential within stage |
| Artifacts | Files passed between stages via S3 | Input and output artifacts; encrypted with an AWS managed key or a customer managed KMS key |
| Transitions | Links between stages | Can be disabled to block pipeline execution between stages |
| Triggers | Events that start pipeline execution (V2) | Git tags, branch filtering, file path filtering for monorepos |
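To make the components in the table concrete, here is a minimal sketch of a pipeline laid out as stages and actions, plus a structural check. The dictionary keys follow the general shape of a CodePipeline pipeline declaration, but the pipeline name, action names, and the `action` and `validate_pipeline` helpers are hypothetical. The check encodes three rules: a pipeline needs at least two stages, the first stage holds only source actions, and source actions appear nowhere else.

```python
def action(name: str, category: str, provider: str, run_order: int = 1) -> dict:
    """Build one action entry; owner and version are fixed here for brevity."""
    return {
        "name": name,
        "actionTypeId": {
            "category": category,
            "owner": "AWS",
            "provider": provider,
            "version": "1",
        },
        "runOrder": run_order,
    }


def validate_pipeline(pipeline: dict) -> list[str]:
    """Return a list of structural problems; an empty list means none were found."""
    problems = []
    stages = pipeline.get("stages", [])
    if len(stages) < 2:
        problems.append("a pipeline needs at least two stages")
    for index, stage in enumerate(stages):
        categories = {a["actionTypeId"]["category"] for a in stage["actions"]}
        if index == 0 and categories != {"Source"}:
            problems.append("the first stage must contain only source actions")
        if index > 0 and "Source" in categories:
            problems.append(f"stage {stage['name']!r} has a source action")
    return problems


pipeline = {
    "name": "demo-pipeline",  # hypothetical name
    "stages": [
        {"name": "Source", "actions": [action("Checkout", "Source", "CodeCommit")]},
        {"name": "Build", "actions": [action("Compile", "Build", "CodeBuild")]},
        {
            "name": "Release",
            "actions": [
                # Lower runOrder values run first; equal values run in parallel.
                action("Approve", "Approval", "Manual", run_order=1),
                action("Deploy", "Deploy", "CodeDeploy", run_order=2),
            ],
        },
    ],
}

print(validate_pipeline(pipeline))  # []
```

A real declaration also needs a service role and an artifact store, which are omitted here to keep the focus on stage and action layout.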
Cross-Account Pipeline Patterns
- Pipeline Account Model: Dedicated tooling account owns the pipeline; deploys to dev, staging, and production accounts using cross-account IAM roles
- Cross-Account Roles: Target accounts create IAM roles with trust policies allowing the pipeline account to assume them for deployment actions
- KMS Key Sharing: Artifact encryption key in the pipeline account must have key policy granting decrypt access to roles in target accounts
- S3 Bucket Policy: Artifact bucket must allow cross-account access for target account roles to read deployment artifacts
- EventBridge Cross-Account: Use EventBridge rules to trigger pipelines from events in other accounts via event bus forwarding
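The role and key-sharing pieces of this pattern come down to two policy documents. The sketch below builds them as Python dictionaries; the account IDs, role name, and function names are placeholders, and trusting the account root principal is only one option (a real setup would often name the pipeline's specific role instead).

```python
import json


def deploy_role_trust_policy(pipeline_account_id: str) -> dict:
    """Trust policy for a target-account role that the tooling account may assume."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{pipeline_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def artifact_key_statement(target_role_arns: list[str]) -> dict:
    """KMS key policy statement that lets target-account roles decrypt artifacts."""
    return {
        "Sid": "AllowTargetAccountsToDecryptArtifacts",
        "Effect": "Allow",
        "Principal": {"AWS": target_role_arns},
        "Action": ["kms:Decrypt", "kms:DescribeKey"],
        "Resource": "*",
    }


# Placeholder account IDs: 111111111111 = tooling account, 222222222222 = production.
trust = deploy_role_trust_policy("111111111111")
key_statement = artifact_key_statement(
    ["arn:aws:iam::222222222222:role/demo-deploy-role"]
)

print(json.dumps(trust, indent=2))
print(json.dumps(key_statement, indent=2))
```

The artifact bucket policy follows the same idea: it names the target-account roles as principals and grants read access to the artifact objects.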
AWS CodeBuild
CodeBuild is a fully managed build service that compiles source code, runs tests, and produces software packages that are ready to deploy. It scales automatically and processes multiple builds concurrently.
| Buildspec Phase | Purpose | Key Details |
|---|---|---|
| install | Install dependencies and runtimes | runtime-versions block; install packages, tools, SDKs |
| pre_build | Commands before the build | Login to ECR, fetch secrets, run pre-checks |
| build | Core build commands | Compile code, run tests, build Docker images |
| post_build | Commands after the build | Push images to ECR, create reports, notify |
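The four phases always run in the order listed in the table. The sketch below mirrors the layout of a `buildspec.yml` file as a Python dictionary and flattens it into execution order; the commands and the runtime version are placeholders rather than a recommended build, and `commands_in_order` is a hypothetical helper.

```python
# A Python dict mirroring the layout of a buildspec.yml file (version 0.2).
buildspec = {
    "version": 0.2,
    "phases": {
        "install": {
            "runtime-versions": {"python": "3.12"},  # placeholder runtime
            "commands": ["pip install -r requirements.txt"],
        },
        "pre_build": {"commands": ["echo Logging in to Amazon ECR"]},
        "build": {"commands": ["pytest", "docker build -t demo-app ."]},
        "post_build": {"commands": ["echo Pushing image to Amazon ECR"]},
    },
}

# CodeBuild runs the phases in this fixed order.
PHASE_ORDER = ("install", "pre_build", "build", "post_build")


def commands_in_order(spec: dict) -> list[tuple[str, str]]:
    """Flatten the buildspec into (phase, command) pairs in execution order."""
    ordered = []
    for phase in PHASE_ORDER:
        for command in spec["phases"].get(phase, {}).get("commands", []):
            ordered.append((phase, command))
    return ordered


for phase, command in commands_in_order(buildspec):
    print(f"{phase:>10}: {command}")
```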
CodeBuild Key Features
- Caching: S3 caching for dependencies, or local caching (source, Docker layer, and custom cache modes) on the build host, to speed up subsequent builds
- Environment Variables: Plaintext variables, Parameter Store references, and Secrets Manager references; keep sensitive values out of plaintext variables and reference them from Secrets Manager or SecureString parameters instead
- VPC Access: CodeBuild can run inside a VPC to access private resources like RDS databases or internal APIs during build and test phases
- Build Badges: Dynamically generated images showing build status; embeddable in README files for visibility
- Report Groups: Test reports (JUnit, NUnit, Cucumber) and code coverage reports; viewable in the CodeBuild console
- Batch Builds: Run multiple builds in parallel using build-list, build-graph, or build-matrix configurations
AWS CodeDeploy Deployment Strategies
CodeDeploy automates application deployments to EC2 instances, on-premises servers, Lambda functions, and ECS services. Choosing the right deployment strategy is critical for minimizing risk and downtime.
| Strategy | How It Works | Rollback | Best For |
|---|---|---|---|
| In-Place (EC2) | Stops app, deploys new version, restarts on same instances | Redeploy previous revision | Dev/test, cost-sensitive workloads |
| Blue/Green (EC2) | Provisions new instances, shifts traffic via ELB, terminates old | Reroute traffic back to original (blue) instances | Production, zero-downtime deployments |
| Canary (Lambda/ECS) | Shifts a percentage of traffic first, then all traffic after interval | Automatic rollback on CloudWatch alarm trigger | Validating new versions with small traffic percentage |
| Linear (Lambda/ECS) | Shifts equal increments of traffic at regular intervals | Automatic rollback on CloudWatch alarm trigger | Gradual traffic migration with steady monitoring |
| All-at-Once (Lambda/ECS) | Shifts all traffic to new version immediately | Redeploy previous version | Non-critical workloads, fast deployments |
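The difference between canary and linear shifting is easiest to see as a timeline. This is a simplified model, not CodeDeploy's implementation: the two functions are hypothetical, and the example arguments are chosen to resemble the predefined configurations `CodeDeployDefault.LambdaCanary10Percent5Minutes` and `CodeDeployDefault.LambdaLinear10PercentEvery1Minute`.

```python
def canary_schedule(first_percent: int, wait_minutes: int) -> list[tuple[int, int]]:
    """Return (minute, percent_on_new_version) steps for a canary shift."""
    return [(0, first_percent), (wait_minutes, 100)]


def linear_schedule(step_percent: int, every_minutes: int) -> list[tuple[int, int]]:
    """Return (minute, percent_on_new_version) steps for a linear shift."""
    steps = []
    percent, minute = 0, 0
    while percent < 100:
        percent = min(100, percent + step_percent)
        steps.append((minute, percent))
        minute += every_minutes
    return steps


# Canary: 10% first, then everything else after 5 minutes.
print(canary_schedule(10, 5))      # [(0, 10), (5, 100)]

# Linear: 10% more every minute until all traffic has moved.
print(linear_schedule(10, 1)[:3])  # [(0, 10), (1, 20), (2, 30)]
```

In both cases a CloudWatch alarm firing between steps would trigger the automatic rollback described in the table; the model above only covers the happy path.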
Deployment Strategy by Compute Platform
| Platform | Available Strategies | Key Consideration |
|---|---|---|
| EC2 / On-Premises | In-Place, Blue/Green | Blue/Green requires ELB; uses AppSpec hooks (BeforeInstall, AfterInstall, ApplicationStart, ValidateService) |
| AWS Lambda | Canary, Linear, All-at-Once | Uses alias traffic shifting; hooks are BeforeAllowTraffic and AfterAllowTraffic for validation |
| Amazon ECS | Canary, Linear, All-at-Once, Blue/Green | Blue/Green uses two target groups with ALB; test listener for validation before traffic shift |
Testing Automation in CI/CD
- Unit Tests: Run in CodeBuild during the build phase; fastest feedback loop, should catch most logic errors before deployment
- Integration Tests: Run in CodeBuild with VPC access to test against real dependencies like databases and APIs
- Approval Gates: Manual approval actions in CodePipeline between stages; SNS notifications to approvers with review URL
- Canary Testing: Use CodeDeploy canary deployments to validate with real production traffic before full rollout
- Synthetic Monitoring: CloudWatch Synthetics canaries run automated scripts to test endpoints after deployment
- Test Reports: CodeBuild integrates with JUnit, NUnit, Cucumber, and TestNG test report formats for visibility
Artifact Management
- CodeArtifact: Managed artifact repository for Maven, npm, pip, NuGet, and Swift packages; upstream repository chaining from public registries
- ECR: Managed Docker container registry with image scanning, lifecycle policies, cross-Region and cross-account replication
- S3 Artifacts: CodePipeline stores artifacts in S3 with encryption; versioning recommended for auditability
- Artifact Encryption: Pipeline artifacts encrypted with AWS managed key by default; use customer managed KMS key for cross-account access
Domain 2: Configuration Management & Infrastructure as Code (17%)
This domain covers defining and deploying infrastructure using code, managing configuration across fleets of resources, and maintaining consistency and compliance through automation. You must understand CloudFormation, CDK, Systems Manager, and configuration management tools to effectively provision, update, and maintain AWS environments at scale.
AWS CloudFormation Features
CloudFormation is the foundational IaC service on AWS. It allows you to model your entire infrastructure in a template file and provision it in a safe, repeatable manner.
| Feature | Description | Key Use Case |
|---|---|---|
| Nested Stacks | Stacks created as resources within other stacks | Reusable components (VPC, ALB); isolate lifecycle of common patterns |
| StackSets | Deploy stacks across multiple accounts and Regions | Organization-wide guardrails, baseline configurations, compliance resources |
| Drift Detection | Detect out-of-band changes to stack resources | Compliance auditing, identify manual console changes |
| Change Sets | Preview changes before executing stack updates | Review and approve infrastructure changes before deployment |
| Custom Resources | Lambda-backed logic for non-native resources | Third-party resources, complex provisioning logic, API calls during stack operations |
| Stack Policies | Protect specific resources from stack updates | Prevent accidental deletion of production databases or critical resources |
| Rollback Triggers | CloudWatch alarms that trigger automatic rollback | Monitor deployment health; rollback if error rate spikes |
| Import Resources | Bring existing resources under CloudFormation management | Adopt manually created resources into IaC without recreation |
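As an illustration of the Stack Policies row, the sketch below generates a stack policy that allows updates in general but denies them on selected resources. The helper name and the `ProductionDatabase` logical ID are hypothetical; the statement layout follows the usual allow-all-then-deny pattern.

```python
import json


def stack_policy(protected_logical_ids: list[str]) -> dict:
    """Allow all updates, then deny every update action on the protected resources."""
    statements = [
        {"Effect": "Allow", "Action": "Update:*", "Principal": "*", "Resource": "*"}
    ]
    for logical_id in protected_logical_ids:
        statements.append(
            {
                "Effect": "Deny",
                "Action": "Update:*",
                "Principal": "*",
                "Resource": f"LogicalResourceId/{logical_id}",
            }
        )
    return {"Statement": statements}


policy = stack_policy(["ProductionDatabase"])
print(json.dumps(policy, indent=2))
```

An explicit Deny overrides the broad Allow, so the protected resource stays untouched even though every other resource in the stack can still be updated.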
CloudFormation Helper Scripts
- cfn-init: Reads metadata from AWS::CloudFormation::Init and executes configuration sets (packages, files, commands, services) on EC2 instances at launch
- cfn-signal: Signals CloudFormation that a resource (EC2 instance) has been successfully configured; used with WaitCondition and CreationPolicy
- cfn-hup: Daemon that detects metadata changes and re-runs cfn-init; enables in-place updates to running instances when the stack is updated
- CreationPolicy: Tells CloudFormation to wait for a signal from cfn-signal before marking the resource as CREATE_COMPLETE; configurable timeout
- UpdatePolicy: Controls how Auto Scaling group updates are handled: AutoScalingRollingUpdate, AutoScalingReplacingUpdate, or AutoScalingScheduledAction
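CreationPolicy and UpdatePolicy sit beside `Properties` on the resource, and their timeouts are written as ISO 8601 durations. The fragment below is a sketch of an Auto Scaling group resource expressed as a Python dictionary; the capacity values are placeholders, most properties are omitted, and `iso_minutes` is a hypothetical helper that handles only simple hour and minute durations.

```python
import re

# Fragment of a template resource; launch configuration and networking are omitted.
asg_fragment = {
    "Type": "AWS::AutoScaling::AutoScalingGroup",
    "CreationPolicy": {
        # Wait for 2 cfn-signal successes, for at most 15 minutes.
        "ResourceSignal": {"Count": 2, "Timeout": "PT15M"},
    },
    "UpdatePolicy": {
        "AutoScalingRollingUpdate": {
            "MinInstancesInService": 1,
            "MaxBatchSize": 1,
            "PauseTime": "PT10M",
            "WaitOnResourceSignals": True,
        }
    },
    "Properties": {"MinSize": "2", "MaxSize": "4", "DesiredCapacity": "2"},
}


def iso_minutes(duration: str) -> int:
    """Convert a simple ISO 8601 duration such as PT15M or PT1H30M to minutes."""
    match = re.fullmatch(r"PT(?:(\d+)H)?(?:(\d+)M)?", duration)
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {duration}")
    hours, minutes = (int(group) if group else 0 for group in match.groups())
    return hours * 60 + minutes


signal_timeout = iso_minutes(asg_fragment["CreationPolicy"]["ResourceSignal"]["Timeout"])
pause_time = iso_minutes(
    asg_fragment["UpdatePolicy"]["AutoScalingRollingUpdate"]["PauseTime"]
)
print(signal_timeout, pause_time)  # 15 10
```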
AWS CDK Constructs
The AWS Cloud Development Kit (CDK) lets you define cloud infrastructure using familiar programming languages like TypeScript, Python, Java, C#, and Go. CDK synthesizes to CloudFormation templates for deployment.
| Construct Level | Description | Example |
|---|---|---|
| L1 (CFN Resources) | Direct mapping to CloudFormation resources with all properties | CfnBucket, CfnFunction; prefix is Cfn; no defaults, so every property you need is set explicitly |
| L2 (Curated) | Higher-level abstractions with sensible defaults and helper methods | s3.Bucket, lambda.Function; includes grant methods and best practices |
| L3 (Patterns) | Multi-resource patterns representing common architectures | LambdaRestApi, ApplicationLoadBalancedFargateService; opinionated defaults |
CDK Pipelines
- Self-Mutating Pipeline: CDK Pipelines creates a CodePipeline that can update itself when the pipeline definition changes; the pipeline redeploys itself before deploying application changes
- Stages: Represent deployment environments (dev, staging, prod); each stage can deploy to different accounts and Regions
- cdk synth: Synthesizes CDK code into CloudFormation templates; this step runs in the pipeline to generate deployment artifacts
- cdk deploy: Deploys the synthesized CloudFormation templates; can target specific stacks or deploy all stacks
- cdk diff: Shows differences between the deployed stack and the current CDK code; useful for reviewing changes before deployment
AWS Systems Manager Components
Systems Manager provides a unified interface for managing your AWS resources. It is heavily tested across multiple domains of the DOP-C02 exam because it bridges configuration management, operations, and security.
| Component | Description | Key Use Case |
|---|---|---|
| Parameter Store | Hierarchical key-value store for config data and secrets | Store database URLs, feature flags, API keys; integrates with CloudFormation and CodeBuild |
| Automation | Execute runbooks for common maintenance and remediation tasks | Automated patching, AMI creation, incident remediation, multi-step workflows |
| Run Command | Execute commands on managed instances without SSH | Install software, run scripts, collect diagnostics at scale without bastion hosts |
| State Manager | Maintain consistent instance configuration over time | Ensure agents are installed, configurations remain compliant on schedule |
| Patch Manager | Automate OS and application patching | Patch baselines, maintenance windows, compliance reporting |
| Session Manager | Secure shell access to instances without SSH keys or bastion hosts | Audit-logged interactive access, no inbound security group rules needed |
Additional Configuration Management Services
- EC2 Image Builder: Automate AMI and container image creation with pipelines; define image recipes with components, test images, and distribute across accounts and Regions
- Service Catalog: Create and manage catalogs of approved CloudFormation products; enforce governance by allowing users to deploy only pre-approved templates with constraints
- OpsWorks: Managed Chef and Puppet for configuration management; uses recipes and cookbooks for EC2 fleet configuration (a legacy approach; know it for the exam, but AWS steers new workloads toward Systems Manager)
- AppConfig: Feature flags and application configuration deployment with built-in validation and gradual rollout; integrates with Parameter Store and S3
Terraform on AWS Patterns
- State Management: Use S3 backend with DynamoDB locking for remote state; versioned S3 bucket with encryption for state file protection
- Workspaces: Isolate state files for different environments (dev, staging, prod) using the same configuration code
- Modules: Reusable infrastructure components similar to CloudFormation nested stacks; published in Terraform Registry or private registries
- CI/CD Integration: Run terraform plan in CodeBuild for review, terraform apply in deployment stage with manual approval; store plan output as artifact
- Exam Note: While DOP-C02 focuses on AWS-native tools, you may see questions about Terraform in the context of IaC best practices and CI/CD integration patterns
Domain 3: Resilient Cloud Solutions (15%)
This domain tests your ability to design and implement highly available, fault-tolerant, and self-healing architectures on AWS. You must understand disaster recovery strategies, multi-AZ and multi-Region patterns, Auto Scaling configurations, DNS-based routing, and chaos engineering principles to ensure applications can withstand failures and recover automatically.
Disaster Recovery Strategies
AWS defines four DR strategies with increasing cost and decreasing recovery time. Understanding the trade-offs between these strategies is essential for the exam.
| Strategy | Description | RPO | RTO | Cost |
|---|---|---|---|---|
| Backup & Restore | Backup data to S3, restore when needed; no running infrastructure in DR Region | Hours | Hours to days | Lowest |
| Pilot Light | Core components running at minimum (DB replicas); scale up on failover | Minutes | Tens of minutes | Low |
| Warm Standby | Scaled-down version of full environment running in DR Region | Seconds to minutes | Minutes | Medium |
| Multi-Site Active/Active | Full environment running in multiple Regions simultaneously | Near zero | Near zero | Highest |
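Exam scenarios usually state an RPO and RTO and ask for the cheapest strategy that meets both. The sketch below shows that selection logic; the numeric bounds are illustrative assumptions that loosely follow the table above, not AWS-published figures, and `cheapest_strategy` is a hypothetical helper.

```python
# (strategy, best-case RPO in minutes, best-case RTO in minutes), cheapest first.
# These numbers are assumptions for illustration only.
STRATEGIES = [
    ("Backup & Restore", 60, 240),
    ("Pilot Light", 5, 30),
    ("Warm Standby", 1, 5),
    ("Multi-Site Active/Active", 0, 0),
]


def cheapest_strategy(required_rpo_min: float, required_rto_min: float) -> str:
    """Return the first (cheapest) strategy whose RPO and RTO fit the requirement."""
    for name, rpo, rto in STRATEGIES:
        if rpo <= required_rpo_min and rto <= required_rto_min:
            return name
    # Nothing cheaper fits, so fall back to the most capable option.
    return STRATEGIES[-1][0]


print(cheapest_strategy(24 * 60, 24 * 60))  # Backup & Restore
print(cheapest_strategy(15, 60))            # Pilot Light
print(cheapest_strategy(1, 10))             # Warm Standby
print(cheapest_strategy(0, 1))              # Multi-Site Active/Active
```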
Multi-AZ vs Multi-Region Patterns
| Pattern | Availability | Use Cases | Key Services |
|---|---|---|---|
| Multi-AZ | High availability within a Region | Most production workloads, AZ-level failure protection | RDS Multi-AZ, ALB cross-zone, ASG across AZs, ElastiCache Multi-AZ |
| Multi-Region Active/Passive | Region-level DR with failover | Business-critical apps requiring Region-level resilience | Aurora Global, DynamoDB Global Tables, S3 CRR, Route 53 failover |
| Multi-Region Active/Active | Highest availability, lowest latency globally | Global applications requiring local performance | DynamoDB Global Tables, CloudFront, Global Accelerator, Route 53 latency routing |
Auto Scaling Strategies
Auto Scaling automatically adjusts capacity to maintain steady, predictable performance at the lowest possible cost. Understanding the different scaling strategies is critical for the exam.
| Strategy | Description | Best For |
|---|---|---|
| Target Tracking | Maintains a specific metric at a target value (e.g., CPU at 50%) | Most common; simple, automatic scale-out and scale-in |
| Step Scaling | Scales by different amounts based on alarm threshold ranges | Variable scaling needs based on severity of metric breach |
| Simple Scaling | Single scaling adjustment when alarm triggers, then cooldown period | Basic scaling needs; has cooldown period preventing rapid changes |
| Scheduled Scaling | Scale based on known time-based patterns | Predictable load patterns like business hours or batch processing |
| Predictive Scaling | Uses machine learning to predict future demand and pre-scales | Cyclical traffic patterns; proactive capacity provisioning |
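Step scaling is the least intuitive row in the table, so here is a small sketch of how a breach size maps to an adjustment. The step values and the 60% threshold are made up for illustration; bounds are expressed relative to the alarm threshold, with the lower bound inclusive and the upper bound exclusive.

```python
# Each step: (lower_bound, upper_bound, instances_to_add).
# Bounds are offsets from the alarm threshold; None means no upper bound.
STEPS = [
    (0, 10, 1),
    (10, 20, 2),
    (20, None, 4),
]


def step_adjustment(metric: float, threshold: float, steps=STEPS) -> int:
    """Return how many instances to add for the given metric value."""
    breach = metric - threshold
    for lower, upper, adjustment in steps:
        if breach >= lower and (upper is None or breach < upper):
            return adjustment
    return 0  # metric is below the threshold, so no step applies


# Alarm threshold: average CPU of 60%.
print(step_adjustment(65, 60))  # 1
print(step_adjustment(75, 60))  # 2
print(step_adjustment(95, 60))  # 4
print(step_adjustment(50, 60))  # 0
```

Simple scaling, by contrast, would apply one fixed adjustment regardless of how far the metric exceeded the threshold.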
Route 53 Routing Policies & Health Checks
| Routing Policy | Description | DR Use Case |
|---|---|---|
| Failover | Routes to secondary when primary health check fails | Active/passive DR with automatic failover between Regions |
| Weighted | Distributes traffic by percentage across endpoints | Gradual traffic migration, canary testing between Regions |
| Latency | Routes to the Region with lowest latency for the user | Active/active multi-Region with performance-based routing |
| Geolocation | Routes based on user geographic location | Data residency compliance, Region-specific content |
| Multivalue Answer | Returns multiple healthy endpoints (up to 8) | Client-side load balancing with health check filtering |
Health Check Patterns
- Endpoint Health Checks: Monitor HTTP, HTTPS, or TCP endpoints; configurable threshold (default 3 consecutive failures), interval (10s or 30s), and string matching
- Calculated Health Checks: Combine multiple health checks with AND, OR, or threshold logic; useful for complex application health determination
- CloudWatch Alarm Health Checks: Base Route 53 health on CloudWatch alarm state; useful for private resources not directly reachable by Route 53 health checkers
- Cross-Region Health Checks: Route 53 health checkers run from multiple global locations; more than 18% of checkers must report healthy for the endpoint to be considered healthy
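Two thresholds interact in endpoint health checks: each checker needs a run of consecutive results before it changes its verdict, and Route 53 then aggregates the checkers, treating the endpoint as healthy when more than 18% of them report healthy. The sketch below models both steps in simplified form; the function names and the sample of 16 checkers are assumptions.

```python
def checker_verdict(results: list[bool], failure_threshold: int = 3,
                    initially_healthy: bool = True) -> bool:
    """One checker changes its verdict after `failure_threshold` consecutive
    identical results (passes or failures)."""
    healthy = initially_healthy
    streak_value, streak = None, 0
    for ok in results:
        streak = streak + 1 if ok == streak_value else 1
        streak_value = ok
        if streak >= failure_threshold:
            healthy = ok
    return healthy


def endpoint_healthy(checker_verdicts: list[bool]) -> bool:
    """Aggregate: healthy when more than 18% of checkers report healthy."""
    if not checker_verdicts:
        return False
    return sum(checker_verdicts) / len(checker_verdicts) > 0.18


# Two failures in a row are not enough to flip a checker at the default threshold.
print(checker_verdict([True, False, False]))   # True
print(checker_verdict([False, False, False]))  # False

# 3 of 16 checkers healthy is 18.75%, just above the cutoff; 2 of 16 is below it.
print(endpoint_healthy([True] * 3 + [False] * 13))  # True
print(endpoint_healthy([True] * 2 + [False] * 14))  # False
```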
AWS Fault Injection Simulator (FIS)
FIS is a fully managed chaos engineering service that helps you test your application resilience by running controlled experiments that inject faults into your workloads.
- Experiment Templates: Define actions (fault injections), targets (resources), and stop conditions (CloudWatch alarms that halt the experiment if safety thresholds are breached)
- Supported Actions: Stop/terminate EC2 instances, inject CPU/memory/disk stress, disrupt network connectivity, throttle API calls, failover RDS, pause ECS tasks
- Stop Conditions: CloudWatch alarms that automatically stop the experiment if critical metrics exceed safety thresholds; essential for preventing unintended outages
- Integration: Run FIS experiments as part of CI/CD pipelines to validate resilience before production deployments; integrate with CodePipeline
- Best Practice: Start with small blast radius experiments in non-production environments, gradually increase scope, and always define stop conditions
Self-Healing Architecture Patterns
- Auto Scaling Health Checks: EC2 status checks and ELB health checks automatically replace unhealthy instances; configure grace period to avoid premature termination
- ECS Service Auto Recovery: ECS automatically replaces failed tasks; configure deployment circuit breaker to rollback if new tasks fail to stabilize
- RDS Multi-AZ Failover: Automatic failover to the standby, typically within 60-120 seconds; the DNS endpoint is repointed automatically
- Lambda Retry Behavior: Synchronous invocations return errors to the caller; asynchronous invocations are retried twice with a delay between attempts; configure a DLQ or on-failure destination for failed events
- SQS-Based Decoupling: Use SQS queues between components to buffer requests during failures; DLQ captures messages that cannot be processed after maximum receive count
Domains 4 & 5: Monitoring, Logging (15%) & Incident Response (14%)
These two domains are closely related and together account for 29% of the exam. Domain 4 focuses on implementing monitoring, metrics, and logging systems, while Domain 5 focuses on automating incident detection and response. The key theme across both domains is automated observability and event-driven remediation, replacing manual intervention with self-healing automation.
Amazon CloudWatch Features
CloudWatch is the central monitoring and observability service on AWS. Understanding its full feature set is critical because it appears in questions across every domain of the DOP-C02 exam.
| Feature | Description | Key Details |
|---|---|---|
| Metrics | Time-series data points from AWS services and custom sources | Standard (5-min), Detailed (1-min), High-Resolution (1-sec); custom metrics via PutMetricData API |
| Logs | Centralized log collection, storage, and analysis | Log Groups, Log Streams; retention from 1 day to 10 years, or never expire (the default); subscription filters for streaming |
| Alarms | Watch metrics and trigger actions when thresholds are breached | States: OK, ALARM, INSUFFICIENT_DATA; actions: SNS, Auto Scaling, EC2, Systems Manager |
| Composite Alarms | Combine multiple alarms with AND/OR logic | Reduce alarm noise; only alert when multiple conditions are true simultaneously |
| Logs Insights | Interactive query language for log analysis | Pipe-separated query commands (fields, filter, stats, sort); filter, aggregate, visualize log data; save queries for reuse |
| Dashboards | Customizable visual displays of metrics and alarms | Cross-account and cross-Region dashboards; automatic and custom widgets |
| Anomaly Detection | Machine learning model for metric baselines | Automatically detects unusual metric behavior; creates dynamic alarm thresholds |
| Metric Filters | Extract metric data from log events | Turn log patterns into CloudWatch metrics; trigger alarms on log patterns |
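Composite alarms are defined by a rule expression over other alarms. The sketch below assembles such an expression from small helpers; the alarm names and the helper functions are hypothetical, while the `ALARM("name")`, `AND`, and `OR` forms follow the composite alarm rule syntax.

```python
def alarm(name: str) -> str:
    """Expression that is true while the named alarm is in the ALARM state."""
    return f'ALARM("{name}")'


def all_of(*expressions: str) -> str:
    """Combine expressions so that every one must be true."""
    return "(" + " AND ".join(expressions) + ")"


def any_of(*expressions: str) -> str:
    """Combine expressions so that at least one must be true."""
    return "(" + " OR ".join(expressions) + ")"


# Page only when errors are high AND there is a plausible cause to investigate.
rule = all_of(
    alarm("api-5xx-high"),
    any_of(alarm("api-latency-high"), alarm("db-cpu-high")),
)
print(rule)
# (ALARM("api-5xx-high") AND (ALARM("api-latency-high") OR ALARM("db-cpu-high")))
```

Attaching the notification action only to the composite alarm, rather than to each child alarm, is what reduces the alarm noise mentioned in the table.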
AWS X-Ray
X-Ray provides distributed tracing for microservices architectures. It helps you understand how your application and its underlying services are performing to identify and troubleshoot the root cause of performance issues and errors.
- Traces: End-to-end request tracking across services; each trace contains segments (services) and subsegments (downstream calls)
- Service Map: Visual representation of your application architecture showing request flow, latency, and error rates between services
- Annotations: Key-value pairs that are indexed for filtering; use annotations to search and filter traces by custom criteria
- Metadata: Non-indexed key-value pairs for storing additional data; use metadata for detailed debugging information
- Sampling Rules: Control which requests are traced to manage cost; a reservoir (fixed number of requests traced per second) plus a fixed rate (percentage of additional requests beyond the reservoir)
- X-Ray Daemon: Runs on EC2 or ECS to collect trace data from the SDK and forward it to X-Ray service; Lambda has built-in integration
- Groups: Filter traces using expressions to create focused views; useful for isolating specific API paths or error patterns
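The reservoir-plus-rate sampling idea can be illustrated with a small simulation. This is a simplified model, not the X-Ray SDK's implementation; the function name and parameters are assumptions chosen for illustration. The values used mirror the default rule of one request per second plus 5% of additional requests.

```python
import random


def sample_second(num_requests, reservoir, rate, rng):
    """Return how many requests arriving in one second would be traced."""
    traced = min(num_requests, reservoir)   # reservoir: first N requests each second
    overflow = num_requests - traced        # everything beyond the reservoir
    traced += sum(1 for _ in range(overflow) if rng.random() < rate)
    return traced


rng = random.Random(42)
# 200 requests in one second, reservoir of 1, 5% of the rest.
print(sample_second(200, reservoir=1, rate=0.05, rng=rng))
# With the rate at zero, only the reservoir is traced.
print(sample_second(200, reservoir=1, rate=0.0, rng=rng))  # 1
```

The reservoir guarantees a minimum number of traces during quiet periods, while the rate keeps cost proportional to traffic during busy ones.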
Amazon EventBridge
EventBridge is the backbone of event-driven automation on AWS. It captures events from AWS services, SaaS applications, and custom sources, and routes them to targets for processing.
| Feature | Description | Key Use Case |
|---|---|---|
| Event Bus | Router that receives events and evaluates rules | Default bus for AWS events; custom bus for application events; cross-account event buses |
| Rules | Match events based on pattern and route to targets | Event patterns for filtering; schedule expressions for cron-like triggers |
| Targets | Services that receive and process matched events | Lambda, Step Functions, SQS, SNS, SSM Automation, CodePipeline, ECS tasks |
| Input Transformers | Transform event data before sending to target | Extract specific fields, reformat payloads, add context to notifications |
| Archive & Replay | Store events and replay them later | Debugging, disaster recovery, reprocessing missed events |
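
To make rule matching concrete, here is a deliberately simplified matcher for event patterns. It handles only exact-value lists and nested objects; the real service also supports prefix, numeric, and other operators that this sketch omits. The `matches` function and the sample events are illustrative assumptions.

```python
def matches(pattern, event):
    """Every key in the pattern must match; a list means 'any of these values'."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the nested event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}

stopped = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}
running = {**stopped, "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"}}

print(matches(pattern, stopped))  # True
print(matches(pattern, running))  # False
```

Fields absent from the pattern are ignored, which is why the `instance-id` field does not affect the result.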
Automated Remediation Patterns
A core DevOps skill is replacing manual incident response with automated remediation. The exam heavily tests your ability to design event-driven workflows that detect issues and fix them without human intervention.
- Pattern 1 - Config Rule Remediation: AWS Config detects non-compliant resource → Config invokes SSM Automation document → Automation fixes the resource (e.g., enable S3 encryption, add required tags)
- Pattern 2 - CloudWatch Alarm Remediation: CloudWatch alarm triggers on high error rate → SNS notification → Lambda function restarts unhealthy instances or scales capacity
- Pattern 3 - EventBridge Event Response: EC2 state change event detected → EventBridge rule matches → Lambda function investigates and auto-resolves (e.g., re-attach EIP, update DNS)
- Pattern 4 - GuardDuty Finding Remediation: GuardDuty detects threat → EventBridge rule matches finding type → Step Functions orchestrates response (isolate instance, snapshot volume, notify security team)
- Pattern 5 - Health Event Automation: AWS Health event (scheduled maintenance, service issue) → EventBridge rule → Lambda proactively migrates instances or updates stakeholders
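Pattern 4 can be sketched as a pure function that turns a GuardDuty finding event into an ordered list of response steps. In a real workflow each step would be a Step Functions state or a Lambda invocation; the step labels, the severity threshold of 7, and the function name are assumptions made for illustration.

```python
def plan_response(event):
    """Map a GuardDuty finding event to an ordered list of response steps."""
    if event.get("source") != "aws.guardduty":
        return []
    detail = event.get("detail", {})
    instance_id = (
        detail.get("resource", {}).get("instanceDetails", {}).get("instanceId")
    )
    steps = []
    # Assumed policy: only isolate and snapshot for high-severity EC2 findings.
    if instance_id and detail.get("severity", 0) >= 7:
        steps.append(f"isolate:{instance_id}")
        steps.append(f"snapshot:{instance_id}")
    steps.append("notify:security-team")
    return steps


finding = {
    "source": "aws.guardduty",
    "detail-type": "GuardDuty Finding",
    "detail": {
        "type": "UnauthorizedAccess:EC2/SSHBruteForce",
        "severity": 8,
        "resource": {"instanceDetails": {"instanceId": "i-0123456789abcdef0"}},
    },
}
print(plan_response(finding))
```

Keeping the decision logic separate from the AWS calls makes the remediation policy easy to unit test before it is wired to an EventBridge rule.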
Systems Manager Automation Runbooks
- AWS-Managed Runbooks: Pre-built automation documents for common tasks like restarting instances, creating AMIs, patching, and updating CloudFormation stacks
- Custom Runbooks: YAML or JSON documents defining multi-step automated workflows with branching, approval steps, and error handling
- Rate Control: Execute runbooks against fleets of instances with concurrency limits and error thresholds to prevent cascading failures
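- Cross-Account Execution: Run automation in target accounts from a centralized operations account that owns the runbooks; the central account assumes an execution role in each target account (by default AWS-SystemsManager-AutomationAdministrationRole in the central account and AWS-SystemsManager-AutomationExecutionRole in the targets)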
- Change Calendar Integration: Block automation execution during defined change freeze periods; integrates with maintenance windows
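The effect of rate control can be approximated with a short simulation. This is a simplified batch model under stated assumptions, not how Systems Manager schedules work internally; `run_with_rate_control`, `run_one`, and the instance IDs are hypothetical.

```python
def run_with_rate_control(targets, run_one, max_concurrency, max_errors):
    """Run targets in batches; stop launching new batches once errors exceed max_errors."""
    succeeded, failed = [], []
    for start in range(0, len(targets), max_concurrency):
        if len(failed) > max_errors:
            break  # error threshold crossed: do not touch the remaining fleet
        for target in targets[start:start + max_concurrency]:
            (succeeded if run_one(target) else failed).append(target)
    skipped = targets[len(succeeded) + len(failed):]
    return succeeded, failed, skipped


targets = [f"i-{n}" for n in range(10)]
bad = {"i-1", "i-3"}  # pretend the runbook fails on these two instances
ok, errors, skipped = run_with_rate_control(
    targets, lambda t: t not in bad, max_concurrency=2, max_errors=1
)
print(ok, errors, skipped)
```

With a concurrency of 2 and an error threshold of 1, the second failure stops the rollout after four instances and leaves the remaining six untouched, which is the cascading-failure protection the bullet describes.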
Additional Logging & Monitoring Services
| Service | What It Logs | Key Details |
|---|---|---|
| CloudTrail | API calls across all AWS services | Management events (default), data events (S3/Lambda), Insights for unusual API activity; organization trail for all accounts |
| VPC Flow Logs | Network traffic metadata in VPCs | Capture at VPC, subnet, or ENI level; send to CloudWatch Logs or S3; analyze with Athena |
| Container Insights | ECS and EKS container metrics and logs | CPU, memory, network per task/pod; performance dashboards; enhanced observability with Prometheus metrics |
| Lambda Insights | Lambda function performance metrics | Cold starts, duration, memory usage, CPU time; Lambda layer extension |
| Application Insights | Application health monitoring | Automated dashboard creation, problem detection, correlated metric groups |
Centralized Logging Architecture
- Log Aggregation Account: Dedicated account receives all logs from member accounts via CloudWatch Logs cross-account subscriptions or S3 replication
- Subscription Filters: Stream log data in near real time to Kinesis Data Streams, Kinesis Data Firehose, Lambda, or OpenSearch Service for processing and analysis
- S3 Log Archive: Long-term storage with lifecycle policies transitioning to Glacier; use Athena for ad-hoc queries against archived logs
- OpenSearch Service: Near real-time log search and visualization with OpenSearch Dashboards (the successor to Kibana); receives data via Kinesis Data Firehose
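When a subscription filter targets Lambda, the function receives the log batch as base64-encoded, gzip-compressed JSON under `awslogs.data`. The sketch below decodes such a payload and picks out error lines; the sample log group, stream, and messages are made up for illustration.

```python
import base64
import gzip
import json


def decode_subscription_event(event):
    """Decode the payload CloudWatch Logs delivers to a Lambda subscriber."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


# Build a sample event the same way the service encodes it (hypothetical content).
sample = {
    "messageType": "DATA_MESSAGE",
    "logGroup": "/app/orders",
    "logStream": "i-0123456789abcdef0",
    "logEvents": [
        {"id": "1", "timestamp": 1700000000000, "message": "INFO order accepted"},
        {"id": "2", "timestamp": 1700000001000, "message": "ERROR payment timeout"},
    ],
}
event = {
    "awslogs": {
        "data": base64.b64encode(gzip.compress(json.dumps(sample).encode())).decode()
    }
}

decoded = decode_subscription_event(event)
error_lines = [e["message"] for e in decoded["logEvents"] if "ERROR" in e["message"]]
print(decoded["logGroup"], error_lines)
```

A handler in the log aggregation account would typically forward the decoded events to a downstream store instead of printing them.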
Domain 6: Security & Compliance (17%)
This domain tests your ability to implement security controls, governance processes, and compliance validation in automated DevOps workflows. You must understand how to embed security into CI/CD pipelines, manage secrets and encryption, automate compliance checking, and implement defense-in-depth strategies. The DevOps exam specifically focuses on security automation rather than manual security practices.
IAM for DevOps
Identity and Access Management is foundational to security on AWS. For the DevOps exam, you must understand how to implement least privilege, automate policy management, and secure CI/CD pipeline components.
| Concept | Description | DevOps Use Case |
|---|---|---|
| Service Roles | IAM roles assumed by AWS services | CodePipeline, CodeBuild, CodeDeploy, CloudFormation execution roles |
| Cross-Account Roles | Roles in target accounts with trust policies | Pipeline deploys to staging/production accounts via AssumeRole |
| Permission Boundaries | Maximum permissions an IAM entity can have | Delegate role creation to developers while limiting maximum permissions they can grant |
| SCPs | Organization-level maximum permissions per account | Prevent disabling CloudTrail, restrict Regions, enforce encryption |
| Session Policies | Inline policies passed when assuming a role | Further restrict permissions for specific pipeline actions |
| IAM Access Analyzer | Identifies resources shared with external entities | Detect unintended cross-account access, validate policies before deployment |
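
As an illustration of the SCP use cases in the table, the sketch below builds a policy document that denies tampering with CloudTrail and restricts activity to an allowed set of Regions. The Sids, the Region list, and the set of exempted global services are assumptions; adjust them to your organization before use.

```python
import json

ALLOWED_REGIONS = ["us-east-1", "eu-west-1"]  # hypothetical allow-list

scp = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ProtectCloudTrail",
            "Effect": "Deny",
            "Action": ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"],
            "Resource": "*",
        },
        {
            "Sid": "RestrictRegions",
            "Effect": "Deny",
            # Global services are exempted so they keep working from any Region.
            "NotAction": ["iam:*", "organizations:*", "sts:*", "support:*"],
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {"aws:RequestedRegion": ALLOWED_REGIONS}
            },
        },
    ],
}

document = json.dumps(scp, indent=2)
print(document)
```

An SCP only sets the maximum permissions available in an account; identities still need IAM policies that grant the actions they use.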
Secrets Manager vs Parameter Store
| Feature | Secrets Manager | Parameter Store (SecureString) |
|---|---|---|
| Auto Rotation | Built-in with Lambda; native for RDS, Redshift, DocumentDB | No built-in rotation; must implement custom Lambda |
| Cross-Account | Resource-based policy for cross-account access | Advanced-tier parameters can be shared through AWS RAM; otherwise requires cross-account role assumption |
| Cost | $0.40 per secret per month + API calls | Standard tier free; Advanced tier $0.05 per parameter per month |
| Encryption | Always encrypted with KMS | SecureString encrypted with KMS; String and StringList unencrypted |
| Best For | Database credentials, API keys requiring rotation | Configuration values, feature flags, non-rotating secrets |
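
The difference in retrieval can be sketched with two small helpers. They take a client object as a parameter and use the `get_secret_value` and `get_parameter` call shapes from boto3; the stub classes, secret name, and parameter name below are stand-ins so the example runs without AWS access.

```python
import json


def load_db_credentials(secrets_client, secret_id):
    """Fetch a JSON secret (for example rotated database credentials)."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])


def load_setting(ssm_client, name):
    """Fetch a SecureString parameter, decrypted with KMS."""
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


class FakeSecretsManager:
    """Stand-in for boto3.client("secretsmanager"); returns canned data."""

    def get_secret_value(self, SecretId):
        return {"SecretString": json.dumps({"username": "app", "password": "example"})}


class FakeSsm:
    """Stand-in for boto3.client("ssm"); returns canned data."""

    def get_parameter(self, Name, WithDecryption=False):
        return {"Parameter": {"Name": Name, "Value": "enabled"}}


creds = load_db_credentials(FakeSecretsManager(), "prod/orders/db")
flag = load_setting(FakeSsm(), "/orders/feature/new-checkout")
print(creds["username"], flag)
```

Passing the client in keeps the lookup code testable and avoids hardcoding secrets in the build or application configuration.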
KMS Key Types & Rotation
- AWS Managed Keys: Created and managed by AWS for specific services (aws/s3, aws/ebs); automatic rotation every year; you cannot manage rotation schedule or key policy
- Customer Managed Keys (CMK): You create and manage; optional automatic rotation with a configurable period (90 to 2,560 days, default 365); custom key policies for fine-grained access control; cross-account sharing via key policy
- AWS Owned Keys: Used by AWS services internally; not visible in your account; no management or auditing capability
- Imported Key Material: Bring your own key material; no automatic rotation; you must manually rotate by creating new CMK and reimporting; supports key expiration
- Multi-Region Keys: Replicate keys across Regions for cross-Region encryption consistency; same key material in all Regions; useful for encrypting data that must be decrypted in another Region
- Rotation Best Practice: Enable automatic rotation for all customer managed keys; old key material is retained for decrypting previously encrypted data; rotation only generates new backing key
Security Scanning in CI/CD Pipelines
| Service | What It Scans | Pipeline Integration |
|---|---|---|
| ECR Image Scanning | Container images for OS and package vulnerabilities | Basic (on push) or Enhanced scanning (Inspector); fail pipeline on critical findings |
| Amazon Inspector | EC2 instances, container images, Lambda functions | Continuous scanning; EventBridge events for new findings; integrate results into pipeline gates |
| CodeGuru Reviewer | Source code for quality issues and security vulnerabilities | Automated code reviews in pull requests; detects secrets in code, concurrency issues, input validation gaps |
| CodeGuru Profiler | Application runtime performance and cost optimization | Identifies expensive code paths, CPU bottlenecks; recommendations for optimization |
| cfn-nag / cfn-lint | CloudFormation templates for security misconfigurations | Run in CodeBuild pre-deploy stage; catch insecure resource configurations before deployment |
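
A pipeline gate on image scan results can be as small as the sketch below. It assumes the severity counts have already been read from the `findingSeverityCounts` field of an ECR `describe_image_scan_findings` response; the function name, the blocked severity levels, and the sample counts are illustrative.

```python
def scan_gate_passes(severity_counts, blocked_levels=("CRITICAL",)):
    """Return True when no finding exists at any blocked severity level."""
    return all(severity_counts.get(level, 0) == 0 for level in blocked_levels)


# Hypothetical counts as they might appear in a scan findings response.
clean_image = {"MEDIUM": 4, "LOW": 12}
risky_image = {"CRITICAL": 1, "HIGH": 3, "MEDIUM": 4}

for name, counts in (("clean", clean_image), ("risky", risky_image)):
    exit_code = 0 if scan_gate_passes(counts) else 1
    print(name, "exit code:", exit_code)
```

In a CodeBuild step, a non-zero exit code from a script like this would fail the build and stop the pipeline before deployment.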
AWS Config Rules & Conformance Packs
- Managed Rules: Pre-built rules for common compliance checks (s3-bucket-public-read-prohibited, encrypted-volumes, root-account-mfa-enabled); over 300 managed rules available
- Custom Rules: Lambda-backed rules (or custom policy rules written in Guard) for organization-specific compliance requirements; triggered by configuration changes or periodic evaluation
- Auto Remediation: Attach SSM Automation documents to non-compliant rules for automatic correction; configurable retry attempts and rate limiting
- Conformance Packs: Collection of Config rules and remediation actions packaged as a single entity; deploy organization-wide via StackSets; templates available for PCI-DSS, HIPAA, NIST
- Organization Rules: Deploy Config rules across all member accounts from the management or delegated administrator account
- Aggregator: Multi-account, multi-Region view of Config compliance data; create aggregated dashboards for organization-wide compliance posture
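The core of a Lambda-backed custom rule is a function that turns a configuration item into a compliance verdict. The sketch below checks for required tags; the tag names and the sample resource are hypothetical, and the returned dictionary follows the shape of one entry in the `Evaluations` list passed to `put_evaluations`.

```python
import json

REQUIRED_TAGS = {"Owner", "CostCenter"}  # hypothetical organization policy


def evaluate(event):
    """Build one evaluation result from a configuration-change invocation."""
    item = json.loads(event["invokingEvent"])["configurationItem"]
    if item["configurationItemStatus"] == "ResourceDeleted":
        compliance = "NOT_APPLICABLE"
    elif REQUIRED_TAGS <= set(item.get("tags", {})):
        compliance = "COMPLIANT"
    else:
        compliance = "NON_COMPLIANT"
    return {
        "ComplianceResourceType": item["resourceType"],
        "ComplianceResourceId": item["resourceId"],
        "ComplianceType": compliance,
        "OrderingTimestamp": item["configurationItemCaptureTime"],
    }


sample_event = {
    "invokingEvent": json.dumps({
        "configurationItem": {
            "resourceType": "AWS::EC2::Instance",
            "resourceId": "i-0123456789abcdef0",
            "configurationItemStatus": "OK",
            "configurationItemCaptureTime": "2024-01-01T00:00:00.000Z",
            "tags": {"Owner": "platform"},
        }
    }),
    "resultToken": "example-token",
}
print(evaluate(sample_event)["ComplianceType"])  # NON_COMPLIANT
```

In the deployed function, the result would be reported back with the `resultToken` from the event, and a remediation runbook attached to the rule could then add the missing tag.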
Security Hub, GuardDuty & Compliance Automation
| Service | Purpose | DevOps Integration |
|---|---|---|
| Security Hub | Aggregates security findings from multiple services | Centralized security posture; automated findings via EventBridge; compliance standards (CIS, PCI-DSS, NIST) |
| GuardDuty | Threat detection from VPC Flow Logs, DNS, CloudTrail | Findings to EventBridge for automated response; isolate compromised instances, revoke credentials |
| Macie | Discover and protect sensitive data in S3 | Scan for PII/PHI in data stores; findings to Security Hub; automate data classification |
| Detective | Investigate and analyze security findings | Root cause analysis of GuardDuty findings; visualize resource behavior graphs |
Common Security Patterns for DevOps
| Pattern | Implementation | Key Services |
|---|---|---|
| Shift-Left Security | Integrate security scanning early in the CI/CD pipeline before deployment | CodeGuru, cfn-nag, ECR scanning, SAST/DAST in CodeBuild |
| Secrets in Pipelines | Never hardcode secrets; use Secrets Manager or Parameter Store references | Secrets Manager, Parameter Store, CodeBuild env variables |
| Immutable Infrastructure | Replace instances with new AMIs rather than patching in-place | EC2 Image Builder, AMI pipelines, blue/green deployments |
| Encryption Everywhere | Enforce encryption at rest and in transit for all resources | KMS, ACM, Config rules, SCPs to deny unencrypted resources |
| Least Privilege Automation | Use IAM Access Analyzer to right-size permissions over time | Access Analyzer, CloudTrail analysis, permission boundaries |
| Compliance as Code | Define compliance rules in code and enforce automatically | Config rules, conformance packs, CloudFormation Guard, OPA |
| Automated Incident Response | Detect threats and respond automatically without human intervention | GuardDuty, EventBridge, Lambda, Step Functions, SSM Automation |
Certificate & TLS Management
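- ACM (AWS Certificate Manager): Public TLS certificates are free; private certificates are issued through a private CA, which is billed separately; automatic renewal for ACM-issued certificates (imported certificates are not renewed automatically); integrates with ALB, CloudFront, API Gateway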
- ACM Private CA (now named AWS Private CA): Managed private certificate authority for internal services; issue certificates for mTLS, code signing, and internal endpoints
- Certificate Pinning: Pin specific certificates in applications for additional security; requires automated rotation process to avoid outages
- Exam Tip: When a question mentions HTTPS termination, the answer almost always involves ACM with ALB or CloudFront rather than self-managed certificates on EC2