TY - JOUR
T1 - Urdu Sentiment Analysis with Deep Learning Methods
AU - Khan, Lal
AU - Amjad, Ammar
AU - Ashraf, Noman
AU - Chang, Hsien Tsung
AU - Gelbukh, Alexander
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2021
Y1 - 2021
AB - Although over 169 million people in the world are familiar with the Urdu language and a large quantity of Urdu data is generated on social websites daily, very few research efforts have been made to build language resources for Urdu and examine user sentiments. The objective of this study is twofold: (1) develop a benchmark dataset for sentiment analysis in the resource-deprived Urdu language and (2) evaluate various machine learning and deep learning algorithms for sentiment analysis. To find the best technique, we compare two modes of text representation: a count-based representation, in which text is represented by word n-gram feature vectors, and a representation based on fastText pre-trained word embeddings for Urdu. We consider a set of machine learning classifiers (RF, NB, SVM, AdaBoost, MLP, LR) and deep learning classifiers (1D-CNN and LSTM) and run the experiments for all the feature types. Our study shows that the combination of word n-gram features with LR outperformed the other classifiers on the sentiment analysis task, obtaining the highest F1 score of 82.05% using a combination of features.
KW - Urdu sentiment analysis
KW - deep learning
KW - machine learning
KW - natural language processing
UR - http://www.scopus.com/inward/record.url?scp=85110666018&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2021.3093078
DO - 10.1109/ACCESS.2021.3093078
M3 - Article
AN - SCOPUS:85110666018
SN - 2169-3536
VL - 9
SP - 97803
EP - 97812
JO - IEEE Access
JF - IEEE Access
M1 - 9466841
ER -