Factors affecting text mining based stock prediction: Text feature representations, machine learning models, and news platforms

Wei Chao Lin, Chih Fong Tsai*, Hsuan Chen

*Corresponding author for this work

Research output: Contribution to journal › Journal Article › peer-review

16 Scopus citations

Abstract

Text mining techniques have demonstrated their effectiveness for stock market prediction, and different text feature representation approaches (e.g., TF–IDF and word embedding) have been adapted to extract textual information from financial news sources. In addition, different machine learning techniques, including deep learning, have been employed to construct the prediction models. Various combinations of text feature representations and learning models have been applied to stock prediction, but it is unknown which performs best or which can be regarded as representative baselines for future research. Moreover, since the textual content of financial news articles differs somewhat across news platforms, the choice of news platform may affect prediction performance. Both questions are examined in experiments comparing eight different combinations of two context-free and two contextualized text feature representations (TF–IDF, Word2vec, ELMo, and BERT) and three learning techniques (SVM, CNN, and LSTM). The experimental results show that CNN+Word2vec and CNN+BERT perform the best. The textual material is taken from three public news platforms: Reuters, CNBC, and The Motley Fool. We found that the learning models constructed and the news platforms used can certainly affect the prediction of stock prices between different companies.
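
As a rough illustration of the simplest context-free representation named in the abstract, the sketch below computes TF–IDF weights in plain Python. It is not the authors' pipeline: the function name `tfidf`, the whitespace tokenizer, the weighting variant (term frequency times log(N/df)), and the example headlines are all assumptions made here for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Weight = relative term frequency * log(N / df), one common variant.
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical headlines, invented for illustration only.
headlines = [
    "shares rise after strong earnings",
    "shares fall after weak earnings",
    "regulator opens probe",
]
vectors = tfidf(headlines)
```

Terms that appear in fewer documents (such as "rise") receive a higher weight than terms shared across documents (such as "shares"). The resulting sparse vectors are the kind of input a classifier such as an SVM could consume; contextualized representations such as ELMo and BERT would instead require pretrained models.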

Original language: English
Article number: 109673
Journal: Applied Soft Computing Journal
Volume: 130
State: Published - Nov 2022

Bibliographical note

Publisher Copyright:
© 2022

Keywords

  • Feature representation
  • Financial news
  • Machine learning
  • Stock prediction
  • Text mining
