A Comprehensive Evaluation Of Shekar: An Open-Source Python Framework For State-Of-The-Art Persian Natural Language Processing And Computational Linguistics
Abstract
Purpose: This study introduces and evaluates Shekar, an open-source Python toolkit built to address the persistent challenges of processing Persian, a morphologically rich and low-resource language. The framework is designed to connect Persian-specific linguistic processing with the input requirements of state-of-the-art deep learning architectures.
Methods: Shekar is built around a modular, performance-optimized pipeline that combines Unicode normalization, novel subword tokenization strategies adapted from SentencePiece, and integration layers for Transformer-based models such as ParsBERT and ALBERT. The empirical evaluation comprised an intrinsic analysis (tokenization throughput, and POS-tagging accuracy on the Universal Dependencies Persian Treebank) and an extrinsic task (hate speech detection on the Naseza dataset), each compared against established baseline toolkits.
Results: Shekar improved on the baselines across all evaluated metrics. The customized subword tokenization approach, essential for handling Persian’s expansive vocabulary and morphological richness, increased tokenization throughput by approximately 18% compared to existing tools. Furthermore, in the extrinsic hate speech detection task, input pre-processed with Shekar yielded an average F1-score improvement of 4.1 percentage points over conventional pre-processing chains, indicating a higher-quality underlying linguistic analysis.
Conclusion: Shekar advances Persian computational linguistics by providing researchers and practitioners with an extensible, high-performance platform for working with modern deep learning models and large-scale corpora. Its design targets the morphological richness and resource scarcity that complicate Persian text processing, positioning it as a recommended foundation for future Persian NLP research.
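To illustrate the kind of Unicode normalization referred to under Methods, the following is a minimal sketch that uses only the Python standard library. It is not Shekar's implementation or API: the function name normalize_persian and the choice of character mappings are assumptions made for illustration. The mappings themselves (Arabic Yeh and Kaf to their Persian forms, Arabic-Indic digits to Extended Arabic-Indic digits, removal of the tatweel elongation character) are common steps in Persian text clean-up.

```python
import unicodedata

# Hypothetical mapping table for illustration; not taken from Shekar.
_ARABIC_TO_PERSIAN = {
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
}
# Arabic-Indic digits (U+0660..U+0669) -> Extended Arabic-Indic digits (U+06F0..U+06F9)
_ARABIC_TO_PERSIAN.update(
    {chr(0x0660 + i): chr(0x06F0 + i) for i in range(10)}
)
_TRANSLATION = str.maketrans(_ARABIC_TO_PERSIAN)

TATWEEL = "\u0640"  # elongation character, purely decorative


def normalize_persian(text: str) -> str:
    """Return a normalized copy of `text` (illustrative sketch only)."""
    # Canonical composition so that equivalent sequences compare equal.
    text = unicodedata.normalize("NFC", text)
    # Replace Arabic variants with their Persian counterparts.
    text = text.translate(_TRANSLATION)
    # Drop tatweel, which carries no lexical information.
    text = text.replace(TATWEEL, "")
    # Collapse runs of whitespace. The zero-width non-joiner (U+200C) is not
    # whitespace in Python, so it is preserved.
    return " ".join(text.split())


if __name__ == "__main__":
    # Arabic Kaf, a tatweel, and a doubled space are all normalized.
    print(normalize_persian("\u0643\u062A\u0640\u0627\u0628  \u0661\u0662"))
```

A normalization step of this kind would typically run before subword tokenization, so that visually identical strings encoded with different code points map to the same tokens.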