A Comprehensive Evaluation Of Shekar: An Open-Source Python Framework For State-Of-The-Art Persian Natural Language Processing And Computational Linguistics
Abstract
Purpose: This study introduces and comprehensively evaluates Shekar, an open-source Python toolkit engineered to address the persistent challenges in processing the morphologically rich and low-resource Persian language. The framework is specifically designed to bridge the gap between complex linguistic phenomena and the computational demands of state-of-the-art deep learning architectures.
Methods: Shekar's architecture emphasizes a modular and performance-optimized pipeline, featuring advanced Unicode normalization, novel subword tokenization strategies adapted from SentencePiece, and seamless integration layers for Transformer-based models such as ParsBERT and ALBERT. Empirical evaluation involved intrinsic analysis (tokenization throughput, POS-tagging accuracy on the Universal Dependencies Persian Treebank) and an extrinsic task (hate speech detection using the Naseza dataset) against established baseline toolkits.
Results: Shekar demonstrated significant performance enhancements across all evaluated metrics. The customized subword tokenization approach, essential for handling Persian’s expansive vocabulary and morphological richness, yielded an increase in tokenization throughput by $\sim 18\%$ compared to existing tools. Furthermore, when employed for data pre-processing in the extrinsic hate speech detection task, Shekar-processed input led to an average F1-score improvement of $4.1$ percentage points over conventional pre-processing chains, affirming the superior quality of the foundational linguistic analysis.
Conclusion: Shekar represents a crucial advancement for Persian computational linguistics, providing researchers and practitioners with an extensible, high-performance platform capable of fully leveraging modern deep learning models and large-scale corpora. Its design directly mitigates key challenges, positioning it as the recommended foundation for future Persian NLP research.
Keywords
References
Similar Articles
- Dr. Kenji Yamamoto, Prof. Lijuan Wang, LEVERAGING DEEP LEARNING IN SURVIVAL ANALYSIS FOR ENHANCED TIME-TO-EVENT PREDICTION , International Journal of Advanced Artificial Intelligence Research: Vol. 2 No. 05 (2025): Volume 02 Issue 05
- Adrian Velasco, Meera Narayan, REVOLUTIONIZING SILICON PHOTONIC DEVICE DESIGN THROUGH DEEP GENERATIVE MODELS: AN INVERSE APPROACH AND EMERGING TRENDS , International Journal of Advanced Artificial Intelligence Research: Vol. 2 No. 06 (2025): Volume 02 Issue 06
- Dr. Aris Thorne, Generating Dual-Identity Face Impersonations with Generative Adversarial Networks: An Adversarial Attack Methodology , International Journal of Advanced Artificial Intelligence Research: Vol. 2 No. 10 (2025): Volume 02 Issue 10
- Dr. Lucas M. Hoffmann, Dr. Aya El-Masry, ALIGNING EXPLAINABLE AI WITH USER NEEDS: A PROPOSAL FOR A PREFERENCE-AWARE EXPLANATION FUNCTION , International Journal of Advanced Artificial Intelligence Research: Vol. 1 No. 01 (2024): Volume 01 Issue 01
- Dr. Alejandro Moreno, An Explainable, Context-Aware Zero-Trust Identity Architecture for Continuous Authentication in Hybrid Device Ecosystems , International Journal of Advanced Artificial Intelligence Research: Vol. 2 No. 11 (2025): Volume 02 Issue 11
- Dr. Elena M. Ruiz, Integrating Big Data Architectures and AI-Powered Analytics into Mergers & Acquisitions Due Diligence: A Theoretical Framework for Value Measurement, Risk Detection, and Strategic Decision-Making , International Journal of Advanced Artificial Intelligence Research: Vol. 2 No. 09 (2025): Volume 02 Issue 09
- Michael Andersson, Optimizing Continuous Schema Evolution and Zero-Downtime Microservices in Enterprise Data Architectures , International Journal of Advanced Artificial Intelligence Research: Vol. 3 No. 01 (2026): Volume 03 Issue 01
- Michael Andrew Thornton, Designing and Evaluating Low Latency Web APIs for High Transaction and Industrial Internet Systems: Architectural, Methodological, and Socio Technical Perspectives , International Journal of Advanced Artificial Intelligence Research: Vol. 3 No. 01 (2026): Volume 03 Issue 01
- Nourhan F. Abdelrahman, Miguel Torres, CRAFTING DUAL-IDENTITY FACE IMPERSONATIONS USING GENERATIVE ADVERSARIAL NETWORKS: AN ADVERSARIAL ATTACK METHODOLOGY , International Journal of Advanced Artificial Intelligence Research: Vol. 1 No. 01 (2024): Volume 01 Issue 01
- Dr. Ayesha Siddiqui, ENHANCED IDENTIFICATION OF EQUATORIAL PLASMA BUBBLES IN AIRGLOW IMAGERY VIA 2D PRINCIPAL COMPONENT ANALYSIS AND INTERPRETABLE AI , International Journal of Advanced Artificial Intelligence Research: Vol. 2 No. 02 (2025): Volume 02 Issue 02
You may also start an advanced similarity search for this article.