Open Access

Scalable Machine Learning Approach in R for Structural Classification and Behavioral Analysis of Massive Twitter Network Data

4 Faculty of Management, University of Oxford, Oxford, UK
4 Department of Information Technology, University of Cambridge, Cambridge, UK

Abstract

The exponential growth of social media platforms, particularly Twitter, has introduced unprecedented challenges in analyzing large-scale, high-velocity, and high-dimensional network data. Traditional analytical frameworks often struggle to efficiently process structural and behavioral patterns embedded within massive Twitter datasets due to computational limitations and scalability constraints. This study proposes a scalable machine learning approach implemented in R for structural classification and behavioral analysis of large Twitter network data. The framework integrates distributed data processing concepts, dimensionality reduction techniques, and supervised learning models to enable efficient extraction of latent social structures and user behavioral patterns. Leveraging the R-based machine learning ecosystem, particularly the mlr package (Bischl et al., 2017), the proposed system supports modular algorithm selection, automated model tuning, and scalable classification workflows.

The methodology comprises preprocessing of Twitter graph data, feature engineering using network metrics, and classification using algorithms such as Support Vector Machines and Random Forests. Dimensionality reduction techniques inspired by large-scale data analytics principles (Ali et al., 2017) are applied to improve computational efficiency. The study further evaluates the role of big data architectures in enhancing scalability and performance (Gandomi and Haider, 2015). Experimental simulations indicate that the proposed framework improves classification accuracy while remaining computationally feasible for large datasets.
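
The feature-engineering step described above can be illustrated with a minimal sketch. It is written in Python using only the standard library so that it is easy to run anywhere; the study itself works in R with the mlr package. The function names (`build_adjacency`, `node_features`), the toy edge list, and the choice of degree and local clustering coefficient as the network metrics are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (user, user) pairs.

    Assumption: each pair is a follow or mention link treated as undirected.
    Self-loops are ignored.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def node_features(adj):
    """Compute per-node degree and local clustering coefficient."""
    features = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        # Count the links that exist between pairs of this node's neighbours.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        # Fraction of possible neighbour pairs that are actually connected.
        clustering = 2.0 * links / (k * (k - 1)) if k > 1 else 0.0
        features[node] = {"degree": k, "clustering": clustering}
    return features


# Hypothetical toy graph: a triangle (a, b, c) with one pendant node d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
features = node_features(build_adjacency(edges))

for user in sorted(features):
    print(user, features[user])
```

Each node's feature dictionary could then serve as one row of the input table for a classifier such as a Support Vector Machine or Random Forest, optionally after dimensionality reduction.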

The findings highlight that R-based machine learning pipelines can effectively handle structural classification tasks when integrated with scalable design principles and optimized feature representations. This research contributes to the growing field of social big data analytics by offering a flexible and extensible framework for Twitter network analysis.

References

S. Ahmed, M. U. Ali, J. Ferzund, M. A. Sarwar, A. Rehman and A. Mehmood, "Modern Data Formats for Big Bioinformatics Data Analytics," International Journal of Advanced Computer Science and Applications (IJACSA), vol. 8, no. 4, 2017.
M. U. Ali, S. Ahmad and J. Ferzund, "Harnessing the Potential of Machine Learning for Bioinformatics using Big Data Tools," International Journal of Computer Science and Information Security (IJCSIS), vol. 14, no. 10, pp. 668–675, 2016.
M. U. Ali, S. Ahmed, J. Ferzund, A. Mehmood and A. Rehman, "Using PCA and Factor Analysis for Dimensionality Reduction of Bio-informatics Data," International Journal of Advanced Computer Science and Applications (IJACSA), vol. 8, no. 5, pp. 415–426, 2017.
A. Rehman, A. Abbas, M. A. Sarwar and J. Ferzund, "Need and Role of Scala Implementations in Bioinformatics," International Journal of Advanced Computer Science and Applications (IJACSA), vol. 8, no. 2, 2017.
B. Bischl, M. Lang, L. Kotthoff, J. Schiffner, J. Richter, E. Studerus, G. Casalicchio and Z. M. Jones, "mlr: Machine Learning in R," Journal of Machine Learning Research, vol. 17, pp. 1–5, 2017.
G. Bello-Orgaz, J. Jung and D. Camacho, "Social big data: Recent achievements and new challenges," Information Fusion, vol. 28, pp. 45–59, 2016.
Y. Duan, J. S. Edwards and Y. K. Dwivedi, "Artificial intelligence for decision making in the era of Big Data–Evolution, challenges and research agenda," International Journal of Information Management, vol. 48, pp. 63–71, 2020.
A. Gandomi and M. Haider, "Beyond the hype: Big data concepts, methods, and analytics," International Journal of Information Management, vol. 35, no. 2, pp. 137–144, 2015.
M. S. Hadi, A. Q. Lawey, T. E. El-Gorashi and J. M. Elmirghani, "Big data analytics for wireless and wired network design: A survey," Computer Networks, vol. 132, pp. 180–199, 2018.
T. P. Jurka, L. Collingwood, A. E. Boydstun, E. Grossman and W. van Atteveldt, "RTextTools: A Supervised Learning Package for Text Classification," The R Journal, vol. 5, no. 1, pp. 6–12, 2013.
R. Kanakam et al., "A survey on approaches and issues for detecting sarcasm on social media tweets," AIP Conference Proceedings, vol. 2418, no. 1, AIP Publishing LLC, 2022.
A. Karatzoglou, D. Meyer and K. Hornik, "Support Vector Machines in R," Journal of Statistical Software, vol. 15, no. 9, 2006.
R. Kitchin, "The real-time city? Big data and smart urbanism," GeoJournal, vol. 79, no. 1, pp. 1–14, 2014.
L. Mitchell, "A parallel random forest implementation for R," Technical report, EPCC, 2011.
H. Qian, "PivotalR: A Package for Machine Learning on Big Data," The R Journal, vol. 6, no. 1, pp. 57–67, 2014.
T. R. Prajwala, "A Comparative Study on Decision Tree and Random Forest Using R Tool," International Journal of Advanced Research in Computer and Communication Engineering, vol. 4, no. 1, pp. 196–199, 2015.
F.-Q. Pei, D.-B. Li and Y.-F. Tong, "Double-layered big data analytics architecture for solar cells series welding machine," Computers in Industry, vol. 97, pp. 17–23, 2018.
S. Peng, G. Wang, Y. Zhou, C. Wan, C. Wang and S. Yu, "An immunization framework for social networks through big data based influence modeling," IEEE Transactions on Dependable and Secure Computing, 2017.
T. Roshini et al., "Social media survey using decision tree and Naive Bayes classification," 2nd International Conference on Intelligent Communication and Computational Techniques (ICCT), IEEE, 2021.
M. A. Sarwar, A. Rehman and J. Ferzund, "Database Search, Alignment Viewer and Genomics Analysis Tools: Big Data for Bioinformatics," International Journal of Computer Science and Information Security (IJCSIS), vol. 14, no. 12, pp. 317–328, 2016.
S. Sagiroglu and D. Sinanc, "Big data: A review," International Conference on Collaboration Technologies and Systems (CTS), IEEE, 2013, pp. 42–47.
