Dynamic Deep Neural Network Partitioning for Low-Latency Edge-Assisted Video Analytics: A Learning-to-Partition Approach

Daniela Costa; Rafael Lima

doi:10.55640/

Authors

Daniela Costa Department of Artificial Intelligence, Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS), Porto Alegre, Brazil
Rafael Lima Institute of Data Science and Analytics, Universidade Federal de Pernambuco (UFPE), Recife, Brazil


Keywords:

Deep Neural Network Partitioning, Edge Computing, Low-Latency Video Analytics

Abstract

The rapid growth of real-time video analytics in surveillance, autonomous systems, and industrial automation has increased the demand for efficient deep neural network (DNN) execution across edge–cloud infrastructures. Traditional cloud-based inference introduces latency and bandwidth bottlenecks, while fully edge-based processing is constrained by the limited computational capacity of edge devices. To address these challenges, this study proposes a Learning-to-Partition (L2P) framework for dynamic DNN partitioning in edge-assisted environments. The approach combines reinforcement learning with gradient-based optimization to adaptively split a neural network between edge and cloud nodes, minimizing end-to-end latency while maintaining high inference accuracy. Experiments on benchmark video datasets and multiple network topologies show that the L2P framework achieves up to 38% latency reduction and 22% energy savings compared with static partitioning and heuristic-based methods. The system also adapts dynamically to fluctuating network bandwidth and heterogeneous edge resource availability, sustaining performance under realistic operating conditions. This research contributes a scalable, adaptive partitioning strategy that improves the efficiency of edge-assisted video analytics.
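To make the partitioning objective concrete, the sketch below estimates the end-to-end latency of every candidate split point from a per-layer profile and picks the minimum by exhaustive search. It is a simplified, static baseline under stated assumptions: per-layer compute times are additive, bandwidth is fixed during one decision, and the cost of returning the result to the client is ignored. The `LayerProfile` fields, the layer names, and all timings and sizes are hypothetical. This is not the L2P policy itself, which the abstract describes as learned with reinforcement learning and gradient-based optimization.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LayerProfile:
    """Hypothetical per-layer profile; all values are illustrative."""
    name: str
    edge_ms: float   # time to execute this layer on the edge node (ms)
    cloud_ms: float  # time to execute this layer on the cloud node (ms)
    out_bytes: int   # size of this layer's output activations (bytes)


def end_to_end_latency(layers: Sequence[LayerProfile], input_bytes: int,
                       split: int, bandwidth_bps: float,
                       rtt_ms: float = 0.0) -> float:
    """Estimated latency (ms) when layers[:split] run on the edge and
    layers[split:] run in the cloud.

    split == 0 uploads the raw input (cloud-only); split == len(layers)
    uploads nothing (edge-only). Returning the result is ignored.
    """
    edge = sum(layer.edge_ms for layer in layers[:split])
    if split == len(layers):
        return edge  # edge-only: no transfer, no cloud compute
    cloud = sum(layer.cloud_ms for layer in layers[split:])
    # Upload either the raw frame or the activations at the split point.
    payload = input_bytes if split == 0 else layers[split - 1].out_bytes
    transfer = payload * 8 / bandwidth_bps * 1000.0 + rtt_ms
    return edge + transfer + cloud


def best_split(layers: Sequence[LayerProfile], input_bytes: int,
               bandwidth_bps: float, rtt_ms: float = 0.0) -> int:
    """Exhaustively evaluate every split point and return the fastest one."""
    return min(range(len(layers) + 1),
               key=lambda k: end_to_end_latency(layers, input_bytes, k,
                                                bandwidth_bps, rtt_ms))


# Hypothetical three-layer profile and a 600 kB input frame.
layers = [
    LayerProfile("conv1", edge_ms=20.0, cloud_ms=2.0, out_bytes=400_000),
    LayerProfile("conv2", edge_ms=30.0, cloud_ms=3.0, out_bytes=50_000),
    LayerProfile("fc", edge_ms=30.0, cloud_ms=1.0, out_bytes=4_000),
]
for bw in (200e6, 20e6, 5e6):
    print(f"{bw / 1e6:.0f} Mbit/s -> split={best_split(layers, 600_000, bw)}")
# 200 Mbit/s -> split=0   (cloud-only)
# 20 Mbit/s -> split=2    (conv1 and conv2 on the edge)
# 5 Mbit/s -> split=3     (edge-only)
```

Re-running the search as the measured bandwidth changes shows why a single static split is brittle: with this toy profile the optimum moves from cloud-only to a mid-network split to edge-only as the link degrades. A learning-based partitioner such as the proposed L2P framework is intended to make this choice adaptively as network and resource conditions change.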

International Journal of Modern Computer Science and IT Innovations