Gremlin
Everything you need to safely, securely, and simply build reliable software through Chaos Engineering. Use Gremlin's comprehensive set of failure modes to experiment across your system, including bare metal, any cloud provider, containerized environments, Kubernetes, applications, and serverless. Throttle CPU, memory, I/O, and disk. Reboot hosts, kill processes, travel in time. Introduce latency, blackhole traffic, lose packets, fail DNS. Test for failure in your code. Fail or delay serverless functions. Narrow the impact to a single user, device, or percentage of traffic.
Amazon CloudWatch
Amazon CloudWatch is a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), and IT managers. CloudWatch provides you with data and actionable insights to monitor your applications, respond to system-wide performance changes, optimize resource utilization, and get a unified view of operational health. CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, covering AWS resources, applications, and services that run on AWS and on on-premises servers. You can use CloudWatch to detect anomalous behavior in your environments, set alarms, visualize logs and metrics side by side, take automated actions, troubleshoot issues, and discover insights to keep your applications running smoothly. CloudWatch alarms watch your metric values against thresholds that you specify, or against thresholds that CloudWatch creates using ML models to detect anomalous behavior.
Steadybit
With our experiment editor, your journey toward reliability is faster and easier: everything is at your fingertips, and you have full control over your experiments. All of this is meant to help you achieve your goals and roll out chaos engineering safely at scale in your organization. You can add new targets, attacks, and checks by implementing extensions inside Steadybit. A unique discovery and selection process makes it easy to pick targets. Remove friction when collaborating between teams by exporting and importing experiments as JSON or YAML. Steadybit's landscape shows your software's dependencies and the relationships between components, a good starting point for your chaos engineering journey. Using the powerful query language, divide your system(s) into different environments based on the same information you use elsewhere. Explicitly assign each environment to the specific users and teams allowed to work in it, preventing unwanted damage.
Azure Chaos Studio
Improve application resilience with chaos engineering and testing by deliberately introducing faults that simulate real-world outages. Azure Chaos Studio is a fully managed chaos engineering experimentation platform for accelerating the discovery of hard-to-find problems, from late-stage development through production. Disrupt your apps intentionally to identify gaps and plan mitigations before your customers are impacted by a problem. Experiment by subjecting your Azure apps to real or simulated faults in a controlled manner to better understand application resilience. Observe how your apps respond to real-world disruptions such as network latency, an unexpected storage outage, expiring secrets, or even a full data center outage. Validate product quality when and where it makes sense for your organization. Take advantage of a hypothesis-based approach to drive application resilience with integrated chaos in your CI/CD pipeline.