A Comprehensive Evaluation Of Shekar: An Open-Source Python Framework For State-Of-The-Art Persian Natural Language Processing And Computational Linguistics
Abstract
Purpose: This study introduces and comprehensively evaluates Shekar, an open-source Python toolkit engineered to address the persistent challenges in processing the morphologically rich and low-resource Persian language. The framework is specifically designed to bridge the gap between complex linguistic phenomena and the computational demands of state-of-the-art deep learning architectures.
Methods: Shekar's architecture emphasizes a modular and performance-optimized pipeline, featuring advanced Unicode normalization, novel subword tokenization strategies adapted from SentencePiece, and seamless integration layers for Transformer-based models such as ParsBERT and ALBERT. Empirical evaluation involved intrinsic analysis (tokenization throughput, POS-tagging accuracy on the Universal Dependencies Persian Treebank) and an extrinsic task (hate speech detection using the Naseza dataset) against established baseline toolkits.
Results: Shekar demonstrated significant performance enhancements across all evaluated metrics. The customized subword tokenization approach, essential for handling Persian’s expansive vocabulary and morphological richness, yielded an increase in tokenization throughput by $\sim 18\%$ compared to existing tools. Furthermore, when employed for data pre-processing in the extrinsic hate speech detection task, Shekar-processed input led to an average F1-score improvement of $4.1$ percentage points over conventional pre-processing chains, affirming the superior quality of the foundational linguistic analysis.
Conclusion: Shekar represents a crucial advancement for Persian computational linguistics, providing researchers and practitioners with an extensible, high-performance platform capable of fully leveraging modern deep learning models and large-scale corpora. Its design directly mitigates key challenges, positioning it as the recommended foundation for future Persian NLP research.
Keywords
References
Similar Articles
- Dr. Alessia Romano, Prof. Marco Bianchi, DEVELOPING AI ASSISTANCE FOR INCLUSIVE COMMUNICATION IN ITALIAN FORMAL WRITING , International Journal of Advanced Artificial Intelligence Research: Vol. 1 No. 01 (2024): Volume 01 Issue 01
- Dr. Elara V. Sorenson, Deep Contextual Understanding: A Parameter-Efficient Large Language Model Approach To Fine-Grained Affective Computing , International Journal of Advanced Artificial Intelligence Research: Vol. 2 No. 10 (2025): Volume 02 Issue 10
- Prof. Michael T. Edwards, ENHANCING AI-CYBERSECURITY EDUCATION: DEVELOPMENT OF AN AI-BASED CYBERHARASSMENT DETECTION LABORATORY EXERCISE , International Journal of Advanced Artificial Intelligence Research: Vol. 2 No. 02 (2025): Volume 02 Issue 02
- Dr. Elias A. Petrova, AN EDGE-INTELLIGENT STRATEGY FOR ULTRA-LOW-LATENCY MONITORING: LEVERAGING MOBILENET COMPRESSION AND OPTIMIZED EDGE COMPUTING ARCHITECTURES , International Journal of Advanced Artificial Intelligence Research: Vol. 2 No. 10 (2025): Volume 02 Issue 10
- Mason Johnson, Forging Rich Multimodal Representations: A Survey of Contrastive Self-Supervised Learning , International Journal of Advanced Artificial Intelligence Research: Vol. 2 No. 11 (2025): Volume 02 Issue 11
- Dr. Mei-Ling Zhou, Dr. Haojie Xu, LEARNING RICH FEATURES WITHOUT LABELS: CONTRASTIVE APPROACHES IN MULTIMODAL ARTIFICIAL INTELLIGENCE SYSTEMS , International Journal of Advanced Artificial Intelligence Research: Vol. 2 No. 04 (2025): Volume 02 Issue 04
- John M. Davenport, AI-AUGMENTED FRAMEWORKS FOR DATA QUALITY VALIDATION: INTEGRATING RULE-BASED ENGINES, SEMANTIC DEDUPLICATION, AND GOVERNANCE TOOLS FOR ROBUST LARGE-SCALE DATA PIPELINES , International Journal of Advanced Artificial Intelligence Research: Vol. 2 No. 08 (2025): Volume 02 Issue 08
- Dr. Larian D. Venorth, Prof. Elias J. Vance, A Machine Learning Approach to Identifying Maternal Risk Factors for Congenital Heart Disease , International Journal of Advanced Artificial Intelligence Research: Vol. 2 No. 08 (2025): Volume 02 Issue 08
- Severov Arseni Vasilievich, Artyom V. Smirnov, Architecting Real-Time Risk Stratification in the Insurance Sector: A Deep Convolutional and Recurrent Neural Network Framework for Dynamic Predictive Modeling , International Journal of Advanced Artificial Intelligence Research: Vol. 2 No. 10 (2025): Volume 02 Issue 10
- Elena Volkova, Emily Smith, INVESTIGATING DATA GENERATION STRATEGIES FOR LEARNING HEURISTIC FUNCTIONS IN CLASSICAL PLANNING , International Journal of Advanced Artificial Intelligence Research: Vol. 2 No. 04 (2025): Volume 02 Issue 04
You may also start an advanced similarity search for this article.