Open Access

Adaptive Chaos Engineering and AI-Driven Dependability Modeling for Resilient Cloud-Native and Safety-Critical Systems

Department of Computer Science, University of Lyon, France

Abstract

The increasing reliance on cloud-native architectures, serverless computing, and artificial intelligence-driven systems has introduced new complexities in ensuring system dependability, resilience, and safety. Traditional reliability engineering approaches, while foundational, are often insufficient to address the dynamic, distributed, and failure-prone nature of modern cloud ecosystems. This research presents a comprehensive, theoretically grounded framework that integrates chaos engineering, machine learning-based reliability modeling, and human-centered safety principles to enhance system robustness across cloud-native and safety-critical domains, including healthcare and autonomous systems.

The study synthesizes interdisciplinary perspectives from cloud computing, dependability engineering, fault injection methodologies, and AI-based safety analysis. It explores how experimental fault injection, particularly through chaos engineering practices, can be combined with predictive analytics to proactively identify and mitigate system vulnerabilities. Furthermore, the research emphasizes the importance of realism in error injection, the role of serverless architectures in resilience testing, and the integration of human factors in safety-critical environments.
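To make the combination of experimental fault injection and steady-state verification concrete, the following is a minimal, hypothetical sketch (not from the article) of a chaos-engineering experiment: a service wrapper injects timeouts and extra latency at a configurable rate, and the experiment compares measured behavior against a baseline. The `flaky_service` and `run_experiment` names, fault rates, and latency figures are illustrative assumptions.

```python
import random
import statistics

def flaky_service(call, fault_rate=0.0, extra_latency_ms=0):
    """Wrap a service call, injecting timeouts and latency (illustrative)."""
    def wrapped():
        if random.random() < fault_rate:
            raise TimeoutError("injected fault")
        return call() + extra_latency_ms
    return wrapped

def run_experiment(service, requests=1000):
    """Drive the service and record error rate and median latency."""
    latencies, errors = [], 0
    for _ in range(requests):
        try:
            latencies.append(service())
        except TimeoutError:
            errors += 1
    return {"error_rate": errors / requests,
            "median_latency_ms": statistics.median(latencies)}

random.seed(42)  # reproducible experiment runs
baseline = run_experiment(flaky_service(lambda: 20))
chaotic = run_experiment(flaky_service(lambda: 20, fault_rate=0.05,
                                       extra_latency_ms=30))
# Steady-state hypothesis: under injected faults, the error rate must
# stay below an agreed threshold and latency within an agreed budget.
print(baseline, chaotic)
```

In a real deployment the wrapper would be replaced by platform-level fault injection (network partitions, instance kills), but the structure of the experiment, baseline measurement, controlled perturbation, and hypothesis check, stays the same.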

A qualitative, theory-driven methodology is employed to construct a unified framework that bridges gaps between cloud system resilience and safety engineering in domains such as healthcare. The findings suggest that integrating chaos engineering with machine learning enhances predictive fault detection, deepens understanding of failure propagation, and supports adaptive system recovery mechanisms. Additionally, the study highlights that human-centered design and error taxonomy integration significantly contribute to reducing systemic risks in critical infrastructures.
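One way predictive fault detection can build on fault-injection data is to train a classifier on telemetry gathered during chaos experiments. The sketch below is a hypothetical, stdlib-only illustration (not the article's model): a logistic regression fitted by stochastic gradient descent to synthetic telemetry features (CPU utilization and recent error count), predicting whether a failure occurs in the next window. The feature choices, labeling rule, and hyperparameters are all assumptions for illustration.

```python
import math
import random

random.seed(0)

# Synthetic telemetry: (cpu_util, error_count) -> failure in next window?
# Labeling rule is an invented ground truth for this demo.
def sample():
    cpu, errs = random.random(), random.random()
    label = 1 if 2.0 * cpu + 3.0 * errs - 2.5 > 0 else 0
    return (cpu, errs), label

data = [sample() for _ in range(500)]

# Logistic regression trained with plain stochastic gradient descent.
w, b, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(300):
    for (x1, x2), y in data:
        p = 1 / (1 + math.exp(-(w[0] * x1 + w[1] * x2 + b)))
        g = p - y  # gradient of the logistic loss w.r.t. the logit
        w[0] -= lr * g * x1
        w[1] -= lr * g * x2
        b -= lr * g

def predict(cpu, errs):
    """Estimated probability of failure in the next window."""
    return 1 / (1 + math.exp(-(w[0] * cpu + w[1] * errs + b)))

print(predict(0.9, 0.9), predict(0.1, 0.1))  # high-risk vs. low-risk state
```

In practice the same pattern scales up: features come from production monitoring, labels come from observed or injected failures, and the risk score feeds adaptive recovery actions such as pre-emptive failover or traffic shedding.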

The proposed framework offers a novel contribution by aligning chaos engineering practices with AI-driven reliability assessment and safety assurance principles. It provides a scalable and adaptable approach for organizations seeking to build resilient, trustworthy, and high-performance systems in increasingly complex technological landscapes.



