An assessment on character-based Chinese news filtering using latent semantic indexing

Shih Hung Wu, Pey Ching Yang, Von Wun Soo

Research output: Conference contribution, paper, peer-reviewed

Abstract

In this paper, we assessed the Latent Semantic Indexing (LSI) approach to Chinese information filtering. The assessment targeted Chinese news filtering agents that used a character-based, hierarchical filtering scheme. The traditional vector space model was employed as the information filtering model, and each document was converted into a vector of term weights. Instead of using words as terms, as is traditional in IR, the terms were Chinese characters. LSI captured the semantic relationships between documents and Chinese characters. We used the singular value decomposition (SVD) technique to compress the term space into a lower-dimensional space that captures latent associations between documents and terms. Our experiments showed that the recall and precision of character-based Chinese news filtering, with the LSI technique incorporated into the information filtering system, were satisfactory.
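
The pipeline described in the abstract (characters as terms, a character-by-document matrix, a truncated SVD, and matching in the reduced space) can be illustrated with a minimal sketch. This is not the authors' implementation: the raw-count weighting, the toy documents, the choice of k = 2, the standard fold-in projection, and all function names (`char_doc_matrix`, `lsi_fit`, `fold_in`, `cosine`) are assumptions made for illustration only.

```python
import numpy as np

def char_doc_matrix(docs):
    """Build a character-by-document count matrix (rows: characters, columns: documents).

    Raw counts are an assumption; the paper only says 'weights of terms'.
    """
    vocab = sorted({ch for d in docs for ch in d if not ch.isspace()})
    index = {ch: i for i, ch in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for ch in d:
            if ch in index:
                A[index[ch], j] += 1.0
    return A, index

def lsi_fit(A, k):
    """Truncated SVD: keep the k largest singular triplets of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in(text, index, U_k, s_k):
    """Project a new text into the k-dimensional latent space (q^T U_k S_k^-1)."""
    q = np.zeros(len(index))
    for ch in text:
        if ch in index:  # characters unseen in training are ignored
            q[index[ch]] += 1.0
    return (q @ U_k) / s_k

def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical toy corpus: two stock-market headlines, two sports headlines.
docs = ["股市上漲", "股市下跌", "球隊獲勝", "球隊落敗"]
A, index = char_doc_matrix(docs)
U_k, s_k, Vt_k = lsi_fit(A, k=2)

# Score each training document against a short profile string.
profile = fold_in("股市", index, U_k, s_k)
for j, d in enumerate(docs):
    print(d, round(cosine(profile, Vt_k[:, j]), 3))
```

In this sketch each column of `Vt_k` is a document's latent vector, and a filtering profile is scored against incoming documents by cosine similarity. A real filtering agent would also need a term-weighting scheme, a relevance threshold, and the hierarchical scheme mentioned in the abstract, none of which are specified here.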

Original language: English
Pages: 209-223
Number of pages: 15
Publication status: Published - 1997
Externally published

Bibliographical note

Publisher Copyright:
© 1997 Proceedings of the 10th Research on Computational Linguistics International Conference, ROCLING 1997. All rights reserved.
