Deep Sentiment Analysis Using CNN-LSTM Architecture of English and Roman Urdu Text Shared in Social Media

Lal Khan, Ammar Amjad, Kanwar Muhammad Afaq, Hsien Tsung Chang*

*Corresponding author for this work

Research output: Contribution to journal › Journal Article › peer-review

66 Scopus citations

Abstract

Sentiment analysis (SA) is an active research topic in natural language processing because of its role in interpreting people’s perspectives and supporting opinion-based decisions. Roman Urdu is one of the most widely used dialects on social media. Sentiment analysis of Roman Urdu is difficult because of its morphological complexity and varied dialects. This paper evaluates the performance of various word embeddings for Roman Urdu and English dialects using a CNN-LSTM architecture combined with traditional machine learning classifiers. We introduce a novel deep learning architecture for Roman Urdu and English dialect SA built from two components: an LSTM that preserves long-term dependencies and a one-layer CNN that extracts local features. To obtain the final classification, the feature maps learned by the CNN and LSTM are fed to several machine learning classifiers. Various word embedding models are used to support this approach. Extensive tests on four corpora show that the proposed model performs well in Roman Urdu and English text sentiment classification, with accuracies of 0.904, 0.841, 0.740, and 0.748 on the MDPI, RUSA, RUSA-19, and UCL datasets, respectively. The results show that the SVM classifier and the Word2Vec CBOW (Continuous Bag of Words) model are the more beneficial options for Roman Urdu sentiment analysis, whereas BERT word embeddings, a two-layer LSTM, and an SVM classifier are more suitable for English sentiment analysis. The proposed model outperforms existing well-known models on the relevant corpora, improving accuracy by up to 5%.
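The feature-extraction idea in the abstract can be illustrated with a small NumPy forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not say how the CNN and LSTM are wired, so the code assumes the common stacked arrangement (embedding, then 1D convolution with max pooling, then an LSTM whose final hidden state is the feature vector). All names, sizes, and the token ids are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d_relu(x, kernels, bias):
    """x: (seq_len, emb_dim); kernels: (n_filters, width, emb_dim).
    Returns (seq_len - width + 1, n_filters) local n-gram features."""
    n_filters, width, _ = kernels.shape
    steps = x.shape[0] - width + 1
    out = np.empty((steps, n_filters))
    for t in range(steps):
        window = x[t:t + width]  # (width, emb_dim)
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1])) + bias
    return np.maximum(out, 0.0)  # ReLU

def max_pool1d(x, size):
    """Non-overlapping max pooling along the time axis."""
    steps = x.shape[0] // size
    return x[:steps * size].reshape(steps, size, x.shape[1]).max(axis=1)

def lstm_last_hidden(x, W, U, b):
    """x: (steps, in_dim); W: (4*hidden, in_dim); U: (4*hidden, hidden).
    Runs a single-layer LSTM and returns the final hidden state."""
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in x:
        z = W @ x_t + U @ h + b
        i, f, o, g = np.split(z, 4)  # input, forget, output gates; candidate
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g            # cell state carries long-range information
        h = o * np.tanh(c)
    return h

def extract_features(token_ids, params):
    x = params["embedding"][token_ids]  # (seq_len, emb_dim)
    x = conv1d_relu(x, params["kernels"], params["conv_bias"])
    x = max_pool1d(x, size=2)
    return lstm_last_hidden(x, params["W"], params["U"], params["b"])

# Hypothetical sizes; a real model would use pretrained embeddings
# (e.g. Word2Vec or BERT vectors) and trained weights.
vocab, emb_dim, n_filters, width, hidden = 50, 8, 6, 3, 4
params = {
    "embedding": rng.normal(size=(vocab, emb_dim)),
    "kernels": rng.normal(scale=0.1, size=(n_filters, width, emb_dim)),
    "conv_bias": np.zeros(n_filters),
    "W": rng.normal(scale=0.1, size=(4 * hidden, n_filters)),
    "U": rng.normal(scale=0.1, size=(4 * hidden, hidden)),
    "b": np.zeros(4 * hidden),
}
tokens = np.array([3, 17, 42, 8, 8, 21, 5, 30, 11, 2])  # one made-up tokenised post
features = extract_features(tokens, params)
print(features.shape)
```

In a pipeline like the one the abstract describes, one such feature vector per post would then be passed to a classical classifier (for example an SVM) in place of a softmax output layer.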

Original language: English
Article number: 2694
Journal: Applied Sciences (Switzerland)
Volume: 12
Issue number: 5
State: Published - 01 03 2022

Bibliographical note

Publisher Copyright:
© 2022 by the authors. Licensee MDPI, Basel, Switzerland.

Keywords

  • Deep learning
  • LSTM
  • Machine learning
  • Roman Urdu language
  • Sentiment analysis
  • Word embedding
