The AWS Well-Architected Framework defines architectural best practices for building and running cloud workloads. In an AWS Well-Architected Framework Review (WAFR), organizations assess their workloads against these practices to identify and mitigate high-risk issues (HRIs).
In this blog post, I will explain HRIs and a milestone-based approach to evaluating and remediating findings for the Security Pillar of the AWS Well-Architected Framework. I will also show you how you can do all of this in just a few clicks.
High-Risk Issues (HRIs)
HRIs are potential issues that can significantly affect a cloud architecture’s reliability, security, efficiency, and cost. Organizations should address HRIs early, before they become more serious and more expensive to remediate. By identifying and addressing HRIs as part of the AWS Well-Architected Review process, organizations can improve the overall quality and effectiveness of their systems.
Below, I share examples of HRIs that might be identified during an AWS Well-Architected Review.
- Security vulnerabilities: These can include problems with access control, data encryption, or network security that could expose sensitive data or systems to unauthorized access.
- Performance bottlenecks: These can include issues with resource allocation, database design, or network configuration that could limit the system’s ability to scale or handle high levels of traffic.
- Operational issues: These can include gaps in monitoring or alerting, a lack of automation, or poor change management practices that could lead to downtime or data loss, or make the system hard to maintain efficiently and reliably.
- Cost inefficiencies: These can include problems with resource utilization, data storage, or service selection that could lead to unnecessary expenses.
- Reliability issues: These can include problems with fault tolerance, monitoring, or recovery strategies that could lead to downtime or data loss.
The review process also involves a series of milestones, or key points of progress, that help organizations to track their progress and identify areas for improvement.
For example, when improving your workload’s security posture, you first run an assessment and create a milestone that provides a baseline for the current security state of your workload. Your teams can then prioritize and remediate the HRIs that fit within a sprint. From there, you create new milestones to track progress toward remediating all identified issues over time.
Here are the key reasons that highlight the importance of milestones:
- Milestones provide a structured approach to reviewing the architecture of a system: By following a defined set of milestones, organizations can ensure that they are considering all relevant aspects of their system’s architecture and identifying potential issues.
- Milestones help organizations to track their progress: By breaking the review process down into manageable stages, organizations can more easily track their progress and identify areas where they may need to focus additional effort.
- Milestones facilitate continuous improvement: By regularly reviewing their systems and identifying areas for improvement, organizations can continuously evolve their architectures to better meet their needs and goals.
By following milestones as part of the AWS WAFR process, organizations can methodically improve the overall quality and effectiveness of their systems.
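The baseline-then-track flow described above can be sketched in a few lines of Python. This is a minimal illustration, not how MontyCloud DAY2™ implements it. It assumes the workload is recorded in the AWS Well-Architected Tool and that `client` is a boto3 `wellarchitected` client. The helper names `summarize_risks`, `list_security_answers`, and `record_milestone` are hypothetical.

```python
from collections import Counter


def summarize_risks(answer_summaries):
    """Count answers per risk level (for example HIGH, MEDIUM, NONE)."""
    return Counter(a.get("Risk", "UNANSWERED") for a in answer_summaries)


def list_security_answers(client, workload_id, lens_alias="wellarchitected"):
    """Collect all Security Pillar answer summaries, following pagination."""
    answers = []
    kwargs = {
        "WorkloadId": workload_id,
        "LensAlias": lens_alias,
        "PillarId": "security",
    }
    while True:
        page = client.list_answers(**kwargs)
        answers.extend(page.get("AnswerSummaries", []))
        token = page.get("NextToken")
        if not token:
            return answers
        kwargs["NextToken"] = token


def record_milestone(client, workload_id, name):
    """Save a milestone and return (milestone number, current HRI count)."""
    risks = summarize_risks(list_security_answers(client, workload_id))
    response = client.create_milestone(WorkloadId=workload_id, MilestoneName=name)
    return response["MilestoneNumber"], risks["HIGH"]
```

With `client = boto3.client("wellarchitected")`, you could call `record_milestone(client, workload_id, "security-baseline")` once at the start and again after each sprint, then compare the returned HRI counts to see whether the trend is going down.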
The AWS Well-Architected Security Pillar
The AWS Well-Architected Security Pillar largely focuses on the capability to safeguard data, systems, and assets. Organizations have varying business needs that drive regulatory and compliance requirements, security best practices, and processes. Given the nature of dynamic cloud resources and changing configurations, it is often hard to follow all recommended security design principles and even harder to keep them up to date. However, it is crucial that organizations identify and assess their cloud infrastructure and resource configuration to achieve and maintain the desired security posture.
Here is the set of Security Pillar design principles that I recommend to our customers:
- Implement a strong identity foundation
- Enable traceability
- Apply security at all layers
- Automate security best practices
- Protect data in transit and at rest
- Keep people away from data
- Prepare for security events
To verify whether your cloud environment adheres to security best practices, you can work through the Security Pillar questions of the AWS Well-Architected Framework and select the options that apply to your workload as answers.
As noted above, performing the checks required to complete a WAFR assessment is both cumbersome and time-consuming. The Security Pillar has several critical questions that require a diligent review of each resource in your environment. Reviewers often face the following challenges:
- Manual efforts to detect vulnerabilities for each question from the Security Pillar
- No clarity on the security risk status
- Limited cloud security knowledge
- Difficulty identifying the resources that belong to a workload hosted across one or more AWS accounts, since tagging policies are often not implemented accurately
- The time it takes to check the resource configurations might range from hours to months, depending on the workload size
MontyCloud DAY2™ simplifies and accelerates security reviews with automation
With highly automated security checks, MontyCloud DAY2™ WAFR assessment helps you analyze the state of your applications and workloads against architectural security best practices. Now you can quickly identify areas for improvement and pinpoint cloud resources that have security deficiencies across 55+ AWS Services.
Cloud Well-Architected Security Assessment Process
Now I will show you how you can complete an assessment using MontyCloud DAY2™. As with all WAFRs, you will first need to define your workload. A key advantage with MontyCloud DAY2™ is that you can also target a custom set of resources that encompasses your workload. For example, you can select a set of resources and group them into a MontyCloud Managed Environment. You can also create several such Environments in a single Project. Now you can target any combination of Environment(s) or Project(s) to perform your WAFR.
MontyCloud DAY2™ runs 400+ automated security checks against 55+ AWS services, in minutes!
Immediately after you define your workload, MontyCloud DAY2™ automated checks run across all pillars, including the Security Pillar.
You can also run the automated checks on demand, for the whole Security Pillar or for an individual question, by clicking the “Re-run Checks” action button.
While the checks are running, the status of the Automated Checks changes to show that a run is in progress.
The beauty of automated security checks is that they provide evidence of resource configurations before you go through the Security Questions. This helps you understand the security issues and select the appropriate answers from the options provided.
You can see the results of the automated checks by clicking the “View Details” button.
The status of each check is shown in green or red. A green tick means that every resource of the type the check covers adheres to that check’s security best practice, or that no resources of that type were found.
A red status means that one or more resource configurations do not meet the requirement defined by the check. You can expand the finding to quickly view the affected resource configurations.
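The green and red rule can be written down as a tiny function. This is only a sketch of the rule as described in this post. The function names and the `PASSED` and `FAILED` labels are my own assumptions, not MontyCloud DAY2™ identifiers.

```python
def check_status(resource_results):
    """Return 'PASSED' (green) or 'FAILED' (red) for a single check.

    resource_results maps a resource ID to True (compliant) or False.
    An empty mapping means no resources of that type were found, which
    is also reported as green.
    """
    return "PASSED" if all(resource_results.values()) else "FAILED"


def failing_resources(resource_results):
    """List the IDs of the resources that caused a red status."""
    return sorted(rid for rid, ok in resource_results.items() if not ok)
```

Note that a check with no matching resources passes, so a green tick on its own does not tell you whether anything was actually evaluated.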
For example, the check “AWS Elastic Block Store (EBS) volume Public snapshots” looks for public snapshots within the specified workload scope. In this case, it found one public snapshot.
MontyCloud DAY2™ also helps you address the finding. You can either suppress it, if the resource is configured that way by design, or remediate it with a remediation task.
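To give a feel for what a public snapshot check involves, here is a minimal sketch that uses the Amazon EC2 API through boto3. It is an illustration under my own assumptions, not the check that MontyCloud DAY2™ runs. The names `is_public` and `find_public_snapshots` are hypothetical, and `ec2` is assumed to be a boto3 EC2 client.

```python
def is_public(create_volume_permissions):
    """True if the permission list grants access to the 'all' group."""
    return any(p.get("Group") == "all" for p in create_volume_permissions)


def find_public_snapshots(ec2, snapshot_ids):
    """Return the snapshot IDs whose create-volume permission is public."""
    public = []
    for snapshot_id in snapshot_ids:
        attr = ec2.describe_snapshot_attribute(
            SnapshotId=snapshot_id,
            Attribute="createVolumePermission",
        )
        if is_public(attr.get("CreateVolumePermissions", [])):
            public.append(snapshot_id)
    return public
```

A snapshot counts as public when its `createVolumePermission` attribute includes the `all` group, which is what this sketch looks for.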
In this example, let’s see how remediation works.
Step 1: Navigate to a finding and click on the ‘Remediate’ button
Step 2: In the workflow, select a recommended remediation task from the MontyCloud DAY2™ tasks library. This task is an automated playbook that remediates the issue. Optionally, if your remediation requires additional configuration or a custom workflow, you can import your own scripts, written in Python or with the AWS CLI, directly into MontyCloud DAY2™ and pick that task from the list.
Step 3: Complete the remediation by following the task workflow.
Once the remediation task is complete, you can re-run checks to see the updated status of your resource configuration.
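If you prefer to bring your own script, a custom Python remediation for the public snapshot finding could look roughly like the sketch below. This is not the playbook from the MontyCloud DAY2™ tasks library. The function name `make_snapshot_private` is hypothetical, and `ec2` is assumed to be a boto3 EC2 client with permission to modify snapshot attributes.

```python
def make_snapshot_private(ec2, snapshot_id):
    """Remove the public ('all' group) permission and confirm the result.

    Returns True if the snapshot is no longer public after the change.
    """
    ec2.modify_snapshot_attribute(
        SnapshotId=snapshot_id,
        Attribute="createVolumePermission",
        OperationType="remove",
        GroupNames=["all"],
    )
    # Read the attribute back instead of trusting the call blindly.
    attr = ec2.describe_snapshot_attribute(
        SnapshotId=snapshot_id,
        Attribute="createVolumePermission",
    )
    permissions = attr.get("CreateVolumePermissions", [])
    return not any(p.get("Group") == "all" for p in permissions)
```

The read-back at the end mirrors the idea of re-running checks after a remediation: the script reports success only when the public permission is really gone.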
Here’s what a remediated finding looks like. In this example, the status of the finding we remediated is now ‘Passed’, indicating that the remediation succeeded.
At this point, you can export the results. MontyCloud DAY2™ generates two reports.
The first report is the ‘AWS Well-Architected Review Report’, which is in the official PDF format required by AWS.
The second report, in CSV format, provides additional value from MontyCloud. It lists all resources that failed the automated checks. Note that while MontyCloud DAY2™ runs over 400 checks, the number of findings can be much higher, because this report lists each failing resource individually. You can use this report to pinpoint the specific resources in any of the target accounts and Regions that were part of the workload’s scope. We recommend itemizing every insight and prioritizing next steps, such as creating corresponding tickets for your teams to review and act on.
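One way to turn the CSV report into tickets is to group the failing resources by check. The sketch below shows the idea. The column names `Check` and `Resource ID` are assumptions I made for illustration, so adjust them to the headers in your exported file. The function names are hypothetical as well.

```python
import csv
import io
from collections import defaultdict


def group_findings(csv_text, check_col="Check", resource_col="Resource ID"):
    """Group failed resources by check name (one future ticket per check)."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row[check_col]].append(row[resource_col])
    return dict(grouped)


def ticket_titles(grouped):
    """Build ticket titles, checks with the most failing resources first."""
    ordered = sorted(grouped.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [
        f"{check}: {len(resources)} resource(s) to review"
        for check, resources in ordered
    ]
```

You would read the exported file into a string, pass it to `group_findings`, and feed the result to `ticket_titles` or to your ticketing system of choice. Sorting by the number of failing resources is just one possible way to prioritize.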
To perform an effective WAFR and reduce HRIs, you will need a comprehensive understanding of the environment, a high degree of expertise in AWS services and best practices, in addition to several days of dedicated effort.
MontyCloud DAY2™ simplifies this by automating the checks against best practices. Now you can easily check the security status of your resources, gather insights and evidence for review, save milestones, and track progress. The no-code remediation playbooks make it simple to act on the insights, and the tool can also help you choose the right answers to WAFR questions in subsequent reviews. As a result, your team can rapidly identify issues, particularly the HRIs with the biggest impact, and remediate them in just a few clicks.
I hope this post helped you understand how to rapidly identify HRIs and improve the security posture of your workloads using MontyCloud DAY2™.