Chinese Unknown Words Extraction for Incomplete Sentences

Yi-Hui Chen, Eric Jui-Lin Lu, Jeng-Jie Huang

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Queried keywords are often used to represent the topics of articles. Word segmentation and unknown word extraction are generally employed to obtain accurate queried keywords. However, existing Chinese unknown word extraction methods are mainly designed to process complete sentences, while queried keywords are mostly incomplete sentences. In this paper, we propose a Chinese unknown word extraction model for incomplete sentences and use Blog Connect as the experimental platform to collect the queried keywords. A two-phase approach is proposed to solve the unknown word extraction problem: unknown word detection followed by unknown word extraction. In the detection phase, we design rules based on the frequency and the probability of queried keywords to detect unknown word candidates. In the extraction phase, we propose a variant of a bottom-up merging algorithm that uses pattern and statistical conditions to extract unknown words. The experimental results show that our method can identify about 70% of unknown words and outperforms CKIP in resolving unknown Chinese words for incomplete sentences. © Institute of Mathematical Statistics, 2022
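The abstract does not give the paper's actual rules, patterns, or thresholds, so the following is only a generic sketch of what frequency- and probability-driven bottom-up merging can look like. It is not the authors' algorithm. The function names, the `min_count` and `min_prob` thresholds, and the sample queries are all assumptions made for illustration.

```python
from collections import Counter


def merge_once(queries, min_count, min_prob):
    """Merge the most frequent adjacent token pair that passes both thresholds.

    Returns the rewritten queries and the merged token, or None if no pair qualifies.
    """
    unigrams = Counter(tok for q in queries for tok in q)
    bigrams = Counter((a, b) for q in queries for a, b in zip(q, q[1:]))

    best = None
    for (a, b), n in bigrams.items():
        # Frequency condition: the pair occurs often enough.
        # Probability condition: b follows a in a large share of a's occurrences.
        if n >= min_count and n / unigrams[a] >= min_prob:
            if best is None or n > bigrams[best]:
                best = (a, b)
    if best is None:
        return queries, None

    a, b = best
    merged = []
    for q in queries:
        out, i = [], 0
        while i < len(q):
            if i + 1 < len(q) and q[i] == a and q[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(q[i])
                i += 1
        merged.append(out)
    return merged, a + b


def extract_unknown_words(queries, lexicon, min_count=2, min_prob=0.6):
    """Merge bottom-up until no pair qualifies, then report multi-character
    tokens that are missing from the lexicon (hypothetical thresholds)."""
    while True:
        queries, new_token = merge_once(queries, min_count, min_prob)
        if new_token is None:
            break
    found = {t for q in queries for t in q if len(t) > 1 and t not in lexicon}
    return sorted(found)


# Hypothetical queried keywords, pre-split into single characters.
queries = [list("部落格推薦"), list("部落格"), list("旅遊部落格")]
print(extract_unknown_words(queries, lexicon=set()))  # ['部落格']
```

In this toy input, the pairs 部+落 and then 部落+格 each occur three times and always co-occur, so they are merged in turn. The remaining pairs occur only once and are left unmerged.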
Original language: American English
Pages (from-to): 755-764
Journal: International Journal of Network Security
Volume: 24
Issue number: 4
Publication status: Published - 2022
