International Journal of Modern Computer Science and IT Innovations


Bridging The Gap: A Strategic Framework for Integrating Site Reliability Engineering with Legacy Retail Infrastructure

Authors

  • Dr. Felicia S. Lee, School of Computing, National University of Singapore, Singapore
  • Ivan A. Kuznetsov, Faculty of Computer Science, Higher School of Economics, Moscow, Russia

DOI:

https://doi.org/10.55640/

Keywords:

Site Reliability Engineering (SRE), Legacy Systems, Retail Technology, IT Modernization, DevOps, Service Level Objectives (SLOs), Toil Automation

Abstract

Background: The retail sector faces intense pressure to ensure high availability and low latency, especially during peak traffic events. However, many established retailers operate on complex, monolithic legacy infrastructures that are inherently resistant to modern DevOps practices. Site Reliability Engineering (SRE), pioneered in cloud-native environments, offers a compelling model for managing reliability, yet its application in 'brownfield' legacy contexts is poorly understood.

Objectives: This study aims to (1) analyze the socio-technical friction points when implementing SRE principles within legacy retail organizations and (2) propose and evaluate a phased framework for this transition.

Methods: We employed a qualitative, multi-case study methodology, analyzing three anonymized retail organizations (grocery, e-commerce, and department store) undergoing SRE adoption. Data were collected through 30 semi-structured interviews with engineering and leadership staff, supplemented by an analysis of internal documentation (postmortems, roadmaps, and monitoring data). We analyzed these cases through the lens of a proposed three-phase implementation framework: (1) Stabilize & Observe, (2) Automate & Abstract, and (3) Modernize & Scale.

Results: The findings indicate that the most significant barriers are cultural rather than technical, particularly the resistance to blameless postmortems and the adoption of error budgets. Defining meaningful Service Level Objectives (SLOs) for monolithic applications emerged as a complex initial hurdle. However, the study found that SRE-derived data (SLO breach reports, toil logs) provided a critical, objective language for prioritizing technical debt and de-risking modernization efforts, such as API abstraction and the introduction of new microservices.

Conclusion: SRE is a viable and necessary strategy for legacy retail, acting as a catalyst for incremental modernization. Successful adoption hinges on adapting SRE principles, prioritizing cultural change alongside technical automation, and using SRE metrics to bridge the divide between operations and development.

 

Published

2025-11-12

How to Cite

Bridging The Gap: A Strategic Framework for Integrating Site Reliability Engineering with Legacy Retail Infrastructure. (2025). International Journal of Modern Computer Science and IT Innovations, 2(11), 1-11. https://doi.org/10.55640/
