J. Fut. Artif. Intell. Tech. (Journal of Future Artificial Intelligence and Technologies), Vol. 2, Issue 4 (2025)

Fusion of Statistical and Stylistic Text Features with SVM for Persian Sentiment Analysis

Alireza Bahmani



Abstract

Sentiment analysis is a core task in natural language processing (NLP) that classifies text into sentiment categories such as positive, negative, or neutral. The task is particularly challenging for Persian because of the language's complex linguistic structure and the scarcity of high-quality labeled datasets. Previous studies on Persian sentiment analysis have relied largely on TF-IDF representations or deep learning models, often overlooking handcrafted statistical and stylistic features that capture subtle textual patterns. This limits their effectiveness, especially on informal or noisy text. In this paper, we propose a novel approach to Persian sentiment analysis that combines statistical and stylistic (surface-level) features with Term Frequency–Inverse Document Frequency (TF-IDF) features, so that the classifier can exploit the expressive cues of informal Persian text in addition to word usage. The fused features are classified with a Support Vector Machine (SVM) whose hyperparameters are tuned using RandomizedSearchCV. Experiments were conducted on a dataset of Persian product reviews from Digikala.com, labeled as positive, negative, or neutral according to user ratings. We compare the proposed system with four baseline models that use TF-IDF features only: Naïve Bayes, Logistic Regression, Random Forest, and Decision Tree. The experimental results show that the proposed approach outperforms the baseline models in accuracy, precision, recall, and F1-score.
Specifically, the proposed system achieved the highest accuracy (0.8354), substantially improving negative sentiment detection while maintaining strong performance in positive sentiment classification.
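The fusion-plus-tuning idea in the abstract can be sketched with scikit-learn, the library that provides RandomizedSearchCV. The abstract does not list the exact stylistic features, TF-IDF settings, SVM kernel, or search space, so everything below is a hypothetical illustration, not the author's configuration: the `StyleStats` feature set (character and token counts, punctuation, letter elongation, digit ratio), the `build_search` helper, and the parameter ranges are all assumptions. The sketch shows the general technique: a sparse TF-IDF block and a dense, standardized block of surface features are concatenated column-wise and passed to an SVM whose hyperparameters are sampled by randomized search.

```python
import re

import numpy as np
from scipy.stats import loguniform
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical surface-level feature set; the paper's exact features are not given here.
ELONGATION = re.compile(r"(.)\1{2,}")  # a character repeated 3+ times, e.g. "خیلییی"


class StyleStats(BaseEstimator, TransformerMixin):
    """Maps each raw text to a small dense vector of statistical/stylistic cues."""

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        rows = []
        for text in X:
            tokens = text.split()
            n_chars = len(text)
            rows.append([
                n_chars,                                   # length in characters
                len(tokens),                               # whitespace token count
                np.mean([len(t) for t in tokens]) if tokens else 0.0,  # mean token length
                text.count("!"),                           # exclamation marks
                text.count("?") + text.count("؟"),         # Latin + Persian question marks
                len(ELONGATION.findall(text)),             # letter elongations
                sum(ch.isdigit() for ch in text) / max(n_chars, 1),  # digit ratio (Persian digits count too)
            ])
        return np.asarray(rows, dtype=float)


def build_search(n_iter=10, cv=3, random_state=0):
    """Hypothetical helper: TF-IDF + scaled style features, fused and fed to an SVM."""
    features = FeatureUnion([
        ("tfidf", TfidfVectorizer(sublinear_tf=True)),             # sparse word features
        ("style", make_pipeline(StyleStats(), StandardScaler())),  # dense, standardized
    ])  # FeatureUnion concatenates the two blocks column-wise
    pipe = Pipeline([("features", features), ("svm", SVC())])
    # Assumed search space; RandomizedSearchCV samples n_iter settings from it.
    param_dist = {
        "svm__C": loguniform(1e-1, 1e2),
        "svm__kernel": ["linear", "rbf"],
        "features__tfidf__ngram_range": [(1, 1), (1, 2)],
    }
    return RandomizedSearchCV(pipe, param_dist, n_iter=n_iter, cv=cv,
                              scoring="f1_macro", random_state=random_state)
```

With `search = build_search().fit(train_texts, train_labels)`, `search.best_params_` reports the sampled setting with the best cross-validated macro-F1, and `search.predict(...)` applies the pipeline refitted on all training data. Only the style block is standardized because TF-IDF rows are already L2-normalized by default, whereas raw counts such as character length would otherwise dominate the fused vector.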







Citations: 25
EISSN: 3048-3719
Crossref record created: 05-Dec-2025
Issue date: 05-Dec-2025
Publication date: 05-Dec-2025
Online publication date: 05-Dec-2025
License: CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)