HYBRID DEEP LEARNING FOR TEXT CLASSIFICATION: INTEGRATING BIDIRECTIONAL GATED RECURRENT UNITS WITH CONVOLUTIONAL NEURAL NETWORKS
DOI: https://doi.org/10.55640/ijidml-v02i04-02

Keywords: Text classification, Hybrid deep learning, Bidirectional gated recurrent units, Convolutional neural networks

Abstract
Text classification remains a foundational task in natural language processing, with wide-ranging applications including sentiment analysis, topic categorization, spam detection, and information retrieval. While convolutional neural networks (CNNs) are adept at capturing local n-gram features and recurrent neural networks (RNNs) excel at modeling sequential dependencies, standalone architectures often struggle to leverage both aspects simultaneously. This study presents a hybrid deep learning model that integrates bidirectional gated recurrent units (Bi-GRU) with convolutional neural networks to enhance text classification performance. The proposed architecture first employs Bi-GRU layers to capture long-range contextual relationships in both the forward and backward directions, followed by convolutional and pooling layers that extract local patterns and higher-order semantic features. This fusion of sequential and spatial representations allows the model to build rich feature hierarchies that improve its discriminative power. Extensive experiments on benchmark datasets, including IMDB, AG News, and Yelp Reviews, demonstrate that the hybrid Bi-GRU–CNN model consistently outperforms traditional RNNs, CNNs, and other baseline methods in accuracy, precision, recall, and F1-score. This research highlights the efficacy of combining recurrent and convolutional architectures for text classification and provides a robust framework adaptable to a range of real-world NLP applications.
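The pipeline described above (embedding, then Bi-GRU for bidirectional context, then convolution and pooling for local feature extraction, then a classifier head) can be sketched in Keras, which the article's toolchain suggests. The snippet below is a minimal illustrative sketch, not the authors' exact model: the vocabulary size, sequence length, embedding and hidden dimensions, convolution settings, and class count are all assumed values for demonstration.

# Minimal sketch of a hybrid Bi-GRU -> CNN text classifier.
# All hyperparameters here are illustrative assumptions, not values
# reported by the paper.
from tensorflow.keras import layers, models

VOCAB_SIZE = 20_000   # assumed vocabulary size
MAX_LEN = 256         # assumed maximum (padded) sequence length
EMBED_DIM = 128       # assumed embedding dimension
NUM_CLASSES = 2       # e.g., binary sentiment on IMDB

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    # Bi-GRU reads the sequence forward and backward;
    # return_sequences=True keeps one output per token so the
    # convolution can scan the contextualized sequence.
    layers.Bidirectional(layers.GRU(64, return_sequences=True)),
    # 1D convolution extracts local, n-gram-like patterns from the
    # Bi-GRU outputs; max-pooling keeps the strongest activations.
    layers.Conv1D(filters=128, kernel_size=3, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.build(input_shape=(None, MAX_LEN))
model.summary()

Training then follows the usual Keras pattern on integer-encoded, padded sequences, e.g. model.fit(x_train, y_train, validation_split=0.1, epochs=5). Note the layer ordering matches the architecture described in the abstract: the recurrent layers run first so the convolution operates on contextualized token representations rather than raw embeddings.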
License
Copyright © 2025 Yuki Nakamura and Isabella Romano.

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution 4.0 License (CC BY), which permits unrestricted use, distribution, and reproduction in any medium, provided that the original work is appropriately cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.