EVALUATING CONVERSATIONAL AND PLATFORM-INTEGRATED GENERATIVE AI FOR AUTOMATED, TIMELY FEEDBACK IN PROGRAMMING EDUCATION: A QUASI-EXPERIMENTAL STUDY UTILIZING GPT-4O-MINI
DOI: https://doi.org/10.55640/
Keywords: Programming Education, Generative AI, GPT-4o-mini, Automated Feedback
Abstract
Context: Effective feedback is critical for novice programmers, but providing it in a timely and scalable manner poses a significant challenge in higher education [13], [14], [37]. Generative Artificial Intelligence (GenAI), particularly Large Language Models (LLMs) trained on code [9], [36], offers a promising avenue to automate this process [1], [22].
Objectives: This quasi-experimental study aimed to evaluate the usability, student perceptions, and academic impact of two distinct GenAI-assisted feedback tools, both powered by GPT-4o-mini: a conversational assistant (tutorB@t) and a platform-embedded tool integrated with a virtual code evaluator (tutorBot+).
Methods: The study involved 91 undergraduate computer science students, 37 of whom were assigned to the experimental AI-assisted group. We measured students' programming performance and passing rates, as well as their perceptions of the tools, using the System Usability Scale (SUS) [6] to assess the perceived utility and ease of use of the developed tools.
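The SUS mentioned above produces a 0 to 100 score per respondent from ten 1 to 5 Likert items. The sketch below follows the standard scoring procedure described by Brooke [6]: odd-numbered (positively worded) items contribute their rating minus 1, even-numbered (negatively worded) items contribute 5 minus their rating, and the sum is multiplied by 2.5. It is an illustration only, not the authors' analysis code; the function names `sus_score` and `mean_sus` are hypothetical, and averaging respondents is assumed to be how a tool-level score is reported.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Compute a System Usability Scale score (0-100) for one respondent.

    `responses` are the ten item ratings in questionnaire order, each on a
    1-5 Likert scale (1 = strongly disagree, 5 = strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for index, rating in enumerate(responses, start=1):
        if index % 2 == 1:
            # Odd-numbered items are positively worded: contribution = rating - 1.
            total += rating - 1
        else:
            # Even-numbered items are negatively worded: contribution = 5 - rating.
            total += 5 - rating

    # Each item contributes 0-4, so the raw sum is 0-40; scale it to 0-100.
    return total * 2.5


def mean_sus(all_responses: Sequence[Sequence[int]]) -> float:
    """Average SUS score across respondents (assumed tool-level summary)."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)
```

Because the result is a rescaled sum rather than a percentage, a SUS of 70.6 should not be read as "70.6% of students found the tool usable".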
Results: Students highly valued the immediacy and accessibility of the AI feedback. Perception scores were generally positive: tutorB@t achieved a SUS score of 70.6 and tutorBot+ a score of 65.2, and intent to reuse was high (81% and 79%, respectively). Crucially, despite these positive perceptions, the study found no statistically significant difference in objective programming performance or passing rates between the groups. This outcome is attributed primarily to factors such as a lack of group homogeneity, external academic pressures, and occasional student misunderstanding of the GenAI-provided feedback.
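The abstract does not state which statistical test was used to compare passing rates. As one common option for comparing two independent pass proportions, a pooled two-proportion z-test can be sketched as follows; the choice of test, the function name `two_proportion_z_test`, and the pass counts in the usage note are all hypothetical and are not the study's data or method. With groups this small, Fisher's exact test is often preferred.

```python
import math


def two_proportion_z_test(pass_a: int, n_a: int, pass_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p) for H0: both groups share the same passing rate."""
    if n_a <= 0 or n_b <= 0 or not (0 <= pass_a <= n_a and 0 <= pass_b <= n_b):
        raise ValueError("counts must satisfy 0 <= passes <= n and n > 0")

    p_a = pass_a / n_a
    p_b = pass_b / n_b

    # Pooled passing rate under the null hypothesis of no difference.
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both groups all-pass or all-fail: no evidence of a difference.
        return 0.0, 1.0

    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)), which equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, `two_proportion_z_test(20, 37, 30, 54)` uses made-up pass counts purely to show the call; a p-value above the chosen significance level (commonly 0.05) corresponds to the "no statistically significant difference" outcome reported above, which by itself does not show that the groups perform equally.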
Conclusion: Students highly value timely, automated feedback from GenAI for its accessibility. Yet the current study suggests that design and contextual limitations (usability issues, student misunderstandings, external factors) may mask its direct academic impact, highlighting the need for more refined integration and for future research incorporating affective measures [15], [38] to fully understand and unlock the pedagogical potential of LLM-based feedback [33].
References
Azaiz, I., Kiesler, N., & Strickroth, S. (2024). Feedback-generation for programming exercises with GPT4. In: Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1. ITiCSE 2024. ACM (pp. 31–37). https://doi.org/10.1145/3649217.3653594
Bailey, R., & Garner, M. (2010). Is the feedback in higher education assessment worth the paper it is written on? Teachers’ reflections on their practices. Teaching in Higher Education, 15(2), 187–198. https://doi.org/10.1080/13562511003620019
Bangs, J. (2007). Teaching perfect and imperfect competition with context-rich problems. SSRN Electronic Journal, 92(3), 463. https://doi.org/10.2139/ssrn.1024000
Bassner, P., Frankford, E., & Krusche, S. (2024). Iris: an AI-driven virtual tutor for computer science education. In: Proceedings of the 2024 on Innovation and Technology in Computer Science Education V. 1. Milan, Italy: Association for Computing Machinery (pp. 394–400). https://doi.org/10.1145/3649217.3653543
Bills, S., Cammarata, N., Mossing, D., Tillman, H., Gao, L., Goh, G., Sutskever, I., Leike, J., Wu, J., & Saunders, W. (2023). Language models can explain neurons in language models. Available at https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html.
Brooke, J. (1986). SUS—a quick and dirty usability scale. In: Usability Evaluation in Industry. United Kingdom: Taylor & Francis (pp. 189–194).
Bull, C., & Kharrufa, A. (2024). Generative artificial intelligence assistants in software development education: a vision for integrating generative artificial intelligence into educational practice, not instinctively defending against it. IEEE Software, 41(2), 52–59. https://doi.org/10.1109/ms.2023.3300574
Cardoso-Júnior, A., & Faria, R. M. D. D. (2021). Psychometric assessment of the Instructional Materials Motivation Survey (IMMS) instrument in a remote learning environment. Revista Brasileira de Educação Médica, 45(4), e197. https://doi.org/10.1590/1981-5271v45.4-20210066.ing
Chen, M., Tworek, J., Jun, H., Yuan, Q., de Oliveira Pinto, H. P., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., Ray, A., Puri, R., Krueger, G., Petrov, M., Khlaaf, H., Sastry, G., Mishkin, P., Chan, B., Gray, S., Ryder, N., Pavlov, M., Power, A., Kaiser, L., Bavarian, M., Winter, C., Tillet, P., Such, F. P., Cummings, D., Plappert, M., Chantzis, F., Barnes, E., Herbert-Voss, A., Guss, W. H., Nichol, A., Paino, A., Tezak, N., Tang, J., Babuschkin, I., Balaji, S., Jain, S., Saunders, W., Hesse, C., Carr, A. N., Leike, J., Achiam, J., Misra, V., Morikawa, E., Radford, A., Knight, M., Brundage, M., Murati, M., Mayer, K., Welinder, P., McGrew, B., Amodei …
Kumar Tiwari, S. (2023). Integration of AI and machine learning with automation testing in digital transformation. International Journal of Applied Engineering & Technology, 5(S1), 95–103. Roman Science Publications.
Kesarpu, S., & Dasari, H. P. (2025). Kafka Event Sourcing for Real-Time Risk Analysis. International Journal of Computational and Experimental Science and Engineering, 11(3). https://doi.org/10.22399/ijcesen.3715
Singh, V. (2024). The impact of artificial intelligence on compliance and regulatory reporting. J. Electrical Systems, 20(11s), 4322–4328. https://doi.org/10.52783/jes.8484
Real-Time Financial Data Processing Using Apache Spark and Kafka. (2025). International Journal of Data Science and Machine Learning, 5(01), 137-169. https://doi.org/10.55640/ijdsml-05-01-16
License
Copyright (c) 2025 Prof. Kenji A. Takada (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are disseminated under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which licenses unrestricted use, distribution, and reproduction in any medium, provided that the original work is appropriately cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.