International Journal of Modern Computer Science and IT Innovations

Beyond Hyperscale: The Socio-Technical Adaptation of Site Reliability Engineering for Enhanced Resilience in Critical Infrastructure

Authors

  • Svetlana Petrova Faculty of Computer Science, Universitas Indonesia, Depok, Indonesia

DOI:

https://doi.org/10.55640/

Keywords:

Site Reliability Engineering, DevOps, Financial Services, Healthcare Systems, Telecommunications, Error Budgets, System Resilience

Abstract

Purpose: This article examines the specialized and contextual application of Site Reliability Engineering (SRE) principles across high-impact industries: Financial Services, Healthcare Systems, and Telecommunications. It addresses the gap in existing literature by providing a multi-sectoral, comparative analysis, moving beyond SRE's origins in hyper-scale technology companies.

Methodology: A conceptual synthesis and a structured literature review were employed, analyzing foundational SRE literature, complementary DevOps practices, and industry-specific compliance and risk documentation. The analysis is framed by a socio-technical systems perspective, focusing on how unique sector demands (namely stringent regulation, legacy infrastructure, and catastrophic failure potential) mandate adaptive SRE strategies.

Findings: The core SRE tenets of Error Budget Management, Toil Quantification, and Systematic Post-Mortems are universally applicable yet require distinct interpretation based on sectoral risk. Financial Services prioritizes transaction integrity and regulatory SLOs, Healthcare Systems emphasizes patient safety and data security (HIPAA/GDPR), and Telecommunications focuses on massive-scale latency and network throughput optimization in hybrid cloud environments. Crucially, the Error Budget acts as a risk management tool that must be culturally accepted and technically integrated into hybrid environments. The socio-technical paradox of 'embracing risk' in risk-averse settings is mitigated by reframing the Error Budget as a learning mechanism, supported by blameless post-mortems.

Originality: This work proposes a structured model for understanding SRE's adaptive implementation in traditionally risk-averse, highly regulated sectors. It underscores the critical distinction between operational availability and compliance- or safety-driven resilience, demonstrating that SRE is an essential component of digital transformation that must be customized to meet specific legal and human-impact imperatives. Future work includes extending SRE principles to MLOps reliability and quantitatively analyzing socio-technical drivers.
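
As a minimal illustration of the Error Budget concept named in the Findings (a sketch, not taken from the article), the arithmetic behind an availability-based budget can be written out as follows. The function names, the 30-day window, and the sample SLO value are assumptions chosen for illustration; they are not figures mandated by any of the sectors discussed.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Return the downtime (in minutes) an availability SLO permits over a window.

    `slo` is a fraction such as 0.999; `window_days` is an assumed rolling window.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    total_minutes = window_days * 24 * 60
    return (1.0 - slo) * total_minutes


def budget_remaining(slo: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Return the fraction of the error budget still unspent.

    A negative result means the budget has been exceeded.
    """
    budget = error_budget_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    # Hypothetical service with a 99.9% availability SLO and 21.6 minutes of downtime.
    print(round(error_budget_minutes(0.999), 2))
    print(round(budget_remaining(0.999, 21.6), 2))
```

Under these assumptions, a stricter SLO shrinks the budget proportionally, which is one way to read the abstract's point that risk-averse sectors interpret the same mechanism differently.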

Published

2025-11-12

How to Cite

Beyond Hyperscale: The Socio-Technical Adaptation of Site Reliability Engineering for Enhanced Resilience in Critical Infrastructure. (2025). International Journal of Modern Computer Science and IT Innovations, 2(11), 12-20. https://doi.org/10.55640/
