A COMPREHENSIVE REVIEW AND EMPIRICAL ASSESSMENT OF DATA AUGMENTATION TECHNIQUES IN TIME-SERIES CLASSIFICATION
DOI:
https://doi.org/10.55640/ijmcsit-v02i07-01
Keywords:
Time-Series Classification, Data Augmentation, Empirical Evaluation
Abstract
Time-series data is ubiquitous across various domains, from healthcare and finance to industrial monitoring and human activity recognition. The accurate classification of such data is crucial for informed decision-making and automated systems. However, a common challenge in developing robust time-series classification models, especially deep learning-based ones, is the scarcity of sufficiently large and diverse labeled datasets. Data augmentation has emerged as a powerful technique to address this limitation by synthetically expanding the training data, thereby enhancing model generalization and reducing overfitting. While data augmentation has been extensively studied in domains like image processing and natural language processing, its application and effectiveness in time-series classification present unique challenges and opportunities. This article provides a comprehensive survey of existing data augmentation techniques specifically tailored for time-series classification. Furthermore, it synthesizes empirical findings from a wide range of studies, discussing the efficacy of different augmentation strategies across various datasets and model architectures. We categorize augmentation methods, analyze their underlying principles, and highlight their impact on classification performance. Finally, we identify current limitations and propose future research directions to foster the development of more effective and universally applicable time-series data augmentation methodologies.
License
Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which licenses unrestricted use, distribution, and reproduction in any medium, provided that the original work is appropriately cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.