Open Access

Automated Monitoring and Self-Healing Mechanisms in High-Availability Cloud Databases

Lead DBA, Take-Two Interactive, Wesley Chapel, FL

Abstract

The article analyzes automated monitoring and self-healing mechanisms in high-availability cloud databases that operate under distributed, multilayer architectures. The relevance of the study stems from the growing structural complexity of cloud-native database clusters, where traditional threshold-driven alerting fails to capture compound and metastable failure dynamics. The scientific novelty lies in the integrated interpretation of graph-based anomaly localization, LLM-assisted diagnostic reasoning, adaptive concept drift detection, multivariate monitoring, and cross-layer self-healing orchestration as components of a unified distributed control regime. The work describes architectural solutions for topology-aware monitoring, recursive diagnosis, speculative recovery, and multi-cloud failover coordination. Special attention is paid to metastable instability and feedback amplification risks in autonomous remediation systems. The goal of the study is to systematize methodological approaches and identify the structural regularities that shape resilient database infrastructures. Analytical synthesis, comparative source analysis, and structural modeling were used to achieve this goal. The conclusion demonstrates that availability emerges as a managed continuum formed by coordinated interpretive and corrective loops. The article will be useful for database architects, cloud engineers, and researchers in intelligent infrastructure systems.
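To make the idea of coupled interpretive and corrective loops more concrete, here is a minimal Python sketch of one such loop under stated assumptions. It is not the article's method: the replica-lag metric, the EWMA baseline with a k-sigma threshold (a simple stand-in for the richer detectors the abstract names), the sliding-window action budget (one simple guard against remediation feedback amplification), and all names and default values are hypothetical. An anomaly triggers an automated action only while the budget allows it; once the budget is exhausted, the loop escalates to an operator instead of acting again.

```python
import math
from dataclasses import dataclass, field


@dataclass
class EwmaDetector:
    """Flags a sample that deviates from an exponentially weighted baseline
    by more than k standard deviations (hypothetical detector choice)."""
    alpha: float = 0.2   # smoothing factor (hypothetical default)
    k: float = 3.0       # threshold in standard deviations (hypothetical)
    warmup: int = 5      # samples observed before anything can be flagged
    mean: float = 0.0
    var: float = 0.0
    n: int = 0

    def observe(self, x: float) -> bool:
        self.n += 1
        if self.n == 1:
            self.mean = x  # seed the baseline with the first sample
            return False
        std = math.sqrt(self.var)
        anomalous = (self.n > self.warmup and std > 0
                     and abs(x - self.mean) > self.k * std)
        if not anomalous:
            # Only normal samples update the baseline, so it follows slow drift
            # and a fault is not absorbed as the new "normal". An abrupt but
            # legitimate level shift would keep being flagged; real
            # drift-adaptive detectors handle that case.
            diff = x - self.mean
            incr = self.alpha * diff
            self.mean += incr
            self.var = (1 - self.alpha) * (self.var + diff * incr)
        return anomalous


@dataclass
class RemediationBudget:
    """Allows at most max_actions automated actions per sliding window."""
    max_actions: int = 2       # hypothetical cap
    window_s: float = 60.0     # hypothetical window length in seconds
    history: list = field(default_factory=list)

    def allow(self, now: float) -> bool:
        # Forget actions that have left the window, then check the cap.
        self.history = [t for t in self.history if now - t < self.window_s]
        if len(self.history) < self.max_actions:
            self.history.append(now)
            return True
        return False


def control_loop(samples, detector, budget):
    """samples: iterable of (timestamp_s, replica_lag_ms) pairs.
    Returns one (timestamp_s, decision) pair per sample."""
    decisions = []
    for ts, lag in samples:
        if not detector.observe(lag):
            decisions.append((ts, "ok"))
        elif budget.allow(ts):
            # Placeholder for a real action, e.g. replacing a lagging replica.
            decisions.append((ts, "remediate"))
        else:
            # Budget exhausted: hand off to an operator instead of acting again.
            decisions.append((ts, "escalate"))
    return decisions


if __name__ == "__main__":
    lags = [10, 11, 9, 10, 11, 10, 50, 52, 55, 10]
    samples = [(10.0 * i, lag) for i, lag in enumerate(lags)]
    for ts, decision in control_loop(samples, EwmaDetector(), RemediationBudget()):
        print(ts, decision)
```

With this sample series, the first two lag spikes trigger remediation and the third is escalated, because two actions already fall inside the 60-second window. Capping actions per window is a deliberate trade-off: it can delay recovery from a genuine fault, but it keeps the automation itself from repeatedly disturbing an already overloaded cluster, which is the kind of amplification that can sustain a metastable failure.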

