HYBRID DEEP LEARNING FOR TEXT CLASSIFICATION: INTEGRATING BIDIRECTIONAL GATED RECURRENT UNITS WITH CONVOLUTIONAL NEURAL NETWORKS
DOI: https://doi.org/10.55640/ijidml-v02i04-02

Keywords: Text classification, Hybrid deep learning, Bidirectional gated recurrent units, Convolutional neural networks

Abstract
Text classification remains a foundational task in natural language processing, with wide-ranging applications including sentiment analysis, topic categorization, spam detection, and information retrieval. While convolutional neural networks (CNNs) are adept at capturing local n-gram features and recurrent neural networks (RNNs) excel at modeling sequential dependencies, standalone architectures often struggle to leverage both aspects simultaneously. This study presents a hybrid deep learning model that integrates bidirectional gated recurrent units (Bi-GRU) with convolutional neural networks to enhance text classification performance. The proposed architecture first employs Bi-GRU layers to capture long-range contextual relationships in both the forward and backward directions, followed by convolutional and pooling layers that extract local patterns and higher-order semantic features. This fusion of sequential and spatial representations allows the model to build rich feature hierarchies that improve its discriminative power. Extensive experiments on benchmark datasets, including IMDB, AG News, and Yelp Reviews, demonstrate that the hybrid Bi-GRU–CNN model consistently outperforms traditional RNNs, CNNs, and other baseline methods in accuracy, precision, recall, and F1-score. This research highlights the efficacy of combining recurrent and convolutional architectures for text classification and provides a robust framework adaptable to a range of real-world NLP applications.
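The pipeline described above (embedding, then Bi-GRU for bidirectional context, then convolution and pooling for local feature extraction, then a classifier head) can be sketched in Keras, which the article's toolchain suggests. The snippet below is a minimal illustrative sketch, not the authors' exact model: the vocabulary size, sequence length, embedding and hidden dimensions, convolution settings, and class count are all assumed values for demonstration.

# Minimal sketch of a hybrid Bi-GRU -> CNN text classifier.
# All hyperparameters here are illustrative assumptions, not values
# reported by the paper.
from tensorflow.keras import layers, models

VOCAB_SIZE = 20_000   # assumed vocabulary size
MAX_LEN = 256         # assumed maximum (padded) sequence length
EMBED_DIM = 128       # assumed embedding dimension
NUM_CLASSES = 2       # e.g., binary sentiment on IMDB

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    # Bi-GRU reads the sequence forward and backward;
    # return_sequences=True keeps one output per token so the
    # convolution can scan the contextualized sequence.
    layers.Bidirectional(layers.GRU(64, return_sequences=True)),
    # 1D convolution extracts local, n-gram-like patterns from the
    # Bi-GRU outputs; max-pooling keeps the strongest activations.
    layers.Conv1D(filters=128, kernel_size=3, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.build(input_shape=(None, MAX_LEN))
model.summary()

Training then follows the usual Keras pattern on integer-encoded, padded sequences, e.g. model.fit(x_train, y_train, validation_split=0.1, epochs=5). Note the layer ordering matches the architecture described in the abstract: the recurrent layers run first so the convolution operates on contextualized token representations rather than raw embeddings.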
License
Copyright © 2025 Yuki Nakamura and Isabella Romano.

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution 4.0 License (CC BY), which permits unrestricted use, distribution, and reproduction in any medium, provided that the original work is appropriately cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.