An assessment on character-based Chinese news filtering using latent semantic indexing

Shih Hung Wu, Pey Ching Yang, Von Wun Soo

Research output: Contribution to conference (Conference Paper, peer-reviewed)

Abstract

In this paper, we assessed the Latent Semantic Indexing (LSI) approach for Chinese information filtering. The assessment targeted Chinese news filtering agents that used a character-based, hierarchical filtering scheme. The traditional vector space model was employed as the information filtering model, and each document was converted into a vector of term weights. Instead of using words as terms, as is traditional in IR, the terms were Chinese characters. LSI captured the semantic relationships between documents and Chinese characters: we used the Singular Value Decomposition (SVD) technique to compress the term space into a lower-dimensional space that captures latent associations between documents and terms. We showed by experiments that the recall and precision of character-based Chinese news filtering, with the LSI technique incorporated into the information filtering system, were satisfactory.
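The character-based LSI pipeline described above can be illustrated with a small sketch. It is not the authors' implementation, and several details are assumptions: raw character counts stand in for the unspecified term weights, queries (filtering profiles) are projected with the common LSI "folding-in" step, documents are ranked by cosine similarity in the latent space, and the hierarchical filtering scheme is not modeled. The function names `char_term_matrix`, `lsi`, `fold_in`, and `filter_scores` are hypothetical.

```python
from collections import Counter

import numpy as np


def char_term_matrix(docs, vocab=None):
    """Build a character-by-document matrix of raw character counts.

    Assumption: raw counts are used as term weights; the paper's actual
    weighting scheme is not specified in the abstract.
    """
    if vocab is None:
        # Every non-whitespace Chinese character becomes a term.
        vocab = sorted({ch for d in docs for ch in d if not ch.isspace()})
    index = {ch: i for i, ch in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for ch, n in Counter(d).items():
            if ch in index:  # characters outside the vocabulary are ignored
                A[index[ch], j] = n
    return A, vocab


def lsi(A, k):
    """Truncated SVD: keep the k largest nonzero singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, int(np.sum(s > 1e-10)))  # avoid dividing by zero singular values
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in(q_vec, U_k, s_k):
    """Project a term vector into the latent document space: S_k^-1 U_k^T q."""
    return (U_k.T @ q_vec) / s_k


def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(a @ b / (na * nb))


def filter_scores(profile, docs, k=2):
    """Score each document against a character-based filtering profile."""
    A, vocab = char_term_matrix(docs)
    U_k, s_k, Vt_k = lsi(A, k)
    q, _ = char_term_matrix([profile], vocab)
    q_hat = fold_in(q[:, 0], U_k, s_k)
    # Each row of Vt_k.T is one document's coordinates in the latent space.
    return [cosine(q_hat, d) for d in Vt_k.T]


if __name__ == "__main__":
    docs = ["股市上漲", "股市下跌", "球賽勝利", "球隊輸球"]
    print(filter_scores("股市", docs, k=2))
```

With this toy corpus and `k=2`, the two stock-market headlines score close to 1 against the profile "股市", while the two sports headlines score close to 0. A real filtering agent would compare these scores against a threshold chosen from recall and precision on training data; that thresholding step is likewise an assumption here.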

Original language: English
Pages: 209-223
Number of pages: 15
State: Published, 1997
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 1997 Proceedings of the 10th Research on Computational Linguistics International Conference, ROCLING 1997. All rights reserved.
