A Comprehensive Evaluation Of Shekar: An Open-Source Python Framework For State-Of-The-Art Persian Natural Language Processing And Computational Linguistics
Abstract
Purpose: This study introduces and evaluates Shekar, an open-source Python toolkit for processing Persian, a morphologically rich and low-resource language. The framework is designed to bridge the gap between Persian's complex linguistic phenomena and the computational demands of state-of-the-art deep learning architectures.
Methods: Shekar's architecture is a modular, performance-optimized pipeline that provides Unicode normalization, subword tokenization strategies adapted from SentencePiece, and integration layers for Transformer-based models such as ParsBERT and ALBERT. The empirical evaluation comprised intrinsic analyses (tokenization throughput; POS-tagging accuracy on the Universal Dependencies Persian Treebank) and one extrinsic task (hate speech detection on the Naseza dataset), each compared against established baseline toolkits.
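To make the normalization step concrete, the following is a minimal sketch of the kind of Unicode normalization a Persian pipeline typically performs. It uses only the Python standard library and is not Shekar's API: the function name `normalize_persian` and the specific character mappings are assumptions chosen for illustration.

```python
import unicodedata

# Hypothetical mapping table (an assumption, not taken from Shekar):
# fold common Arabic code points into their Persian counterparts.
_CHAR_MAP = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u0640": None,      # ARABIC TATWEEL (elongation mark) is dropped
    # Arabic-Indic digits (U+0660..U+0669) -> Persian digits (U+06F0..U+06F9)
    **{chr(0x0660 + i): chr(0x06F0 + i) for i in range(10)},
})


def normalize_persian(text: str) -> str:
    """Illustrative normalizer: NFKC, character folding, whitespace cleanup."""
    # NFKC folds compatibility forms (e.g. Arabic presentation forms)
    # into their base letters.
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(_CHAR_MAP)
    # Collapse runs of whitespace. The zero-width non-joiner (U+200C),
    # which is meaningful in Persian orthography, is not whitespace
    # and is therefore preserved.
    return " ".join(text.split())
```

Folding variant code points before tokenization keeps visually identical words from being split into distinct vocabulary entries, which is one plausible reason normalization quality affects downstream tasks.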
Results: Shekar outperformed the baseline toolkits on all evaluated metrics. The customized subword tokenization approach, which targets Persian's expansive vocabulary and morphological richness, increased tokenization throughput by approximately 18% compared to existing tools. Furthermore, in the extrinsic hate speech detection task, pre-processing the input with Shekar improved the average F1-score by 4.1 percentage points over conventional pre-processing chains, which suggests that the quality of the foundational linguistic analysis carries over to downstream performance.
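The two kinds of figures reported here are computed differently: the throughput result is a relative gain, whereas the F1 result is an absolute difference in percentage points. The sketch below shows one way such quantities could be measured. The helper names and the example numbers are hypothetical, and this is not the evaluation code used in the study.

```python
import time
from typing import Callable, Iterable, List


def tokens_per_second(tokenize: Callable[[str], List[str]],
                      corpus: Iterable[str]) -> float:
    """Time a tokenizer over a corpus and return tokens produced per second."""
    start = time.perf_counter()
    n_tokens = sum(len(tokenize(line)) for line in corpus)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return n_tokens / max(elapsed, 1e-9)


def relative_gain(baseline: float, candidate: float) -> float:
    """Relative improvement of candidate over baseline (0.18 means 18%)."""
    return (candidate - baseline) / baseline


def percentage_point_diff(baseline_f1: float, candidate_f1: float) -> float:
    """Absolute difference between two F1-scores in [0, 1], in points."""
    return (candidate_f1 - baseline_f1) * 100.0


if __name__ == "__main__":
    # Illustrative numbers only; they are not measurements from the paper.
    print(f"{relative_gain(1000.0, 1180.0):.0%} relative throughput gain")
    print(f"{percentage_point_diff(0.800, 0.841):.1f} percentage points")
```

A relative gain depends on the baseline's magnitude, while a percentage-point difference does not, so the two figures should not be compared directly.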
Conclusion: Shekar is a substantial advance for Persian computational linguistics, providing researchers and practitioners with an extensible, high-performance platform for applying modern deep learning models and large-scale corpora to Persian text. Its design directly addresses the normalization, tokenization, and model-integration challenges described above, positioning it as a recommended foundation for future Persian NLP research.