TY - JOUR
T1 - Factors affecting text mining based stock prediction
T2 - Text feature representations, machine learning models, and news platforms
AU - Lin, Wei Chao
AU - Tsai, Chih Fong
AU - Chen, Hsuan
N1 - Publisher Copyright: © 2022
PY - 2022/11
Y1 - 2022/11
AB - Text mining techniques have demonstrated their effectiveness for stock market prediction, and different text feature representation approaches (e.g., TF–IDF and word embedding) have been adapted to extract textual information from financial news sources. In addition, different machine learning techniques, including deep learning, have been employed to construct the prediction models. Various combinations of text feature representations and learning models have been applied to stock prediction, but it is unknown which performs best or which can be regarded as representative baselines for future research. Moreover, since the textual content of financial news articles differs somewhat across news platforms, the choice of platform may affect prediction performance. Both questions are examined in experiments comparing eight different combinations drawn from two context-free and two contextualized text feature representations, i.e., TF–IDF, Word2vec, ELMo, and BERT, and three learning techniques, i.e., SVM, CNN, and LSTM. The textual material is taken from three public news platforms: Reuters, CNBC, and The Motley Fool. The experimental results show that CNN+Word2vec and CNN+BERT perform the best. We found that the learning models constructed and the news platforms used can affect the prediction of stock prices between different companies.
KW - Feature representation
KW - Financial news
KW - Machine learning
KW - Stock prediction
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=85139859753&partnerID=8YFLogxK
U2 - 10.1016/j.asoc.2022.109673
DO - 10.1016/j.asoc.2022.109673
M3 - Article
AN - SCOPUS:85139859753
SN - 1568-4946
VL - 130
JO - Applied Soft Computing Journal
JF - Applied Soft Computing Journal
M1 - 109673
ER -