Document Identification Reassignment for Inverted File Compression

Wann-Yun Shieh, 陳 添福, 鍾 崇斌, 單 智君

Research output: Contribution to journal › Journal Article › peer-review

Abstract

     The inverted file is the most popular indexing mechanism for speeding up document search in an Information Retrieval System (IRS). The size of an inverted file is usually enormous. Traditionally, the d-gap technique is applied to an inverted file to replace document identifications (document IDs) with smaller numbers, which can then be compressed efficiently. However, large gap values may keep the compression rate from being as good as expected. In this paper we propose a document ID reassignment algorithm that exploits the cluster property to reduce the gap values. In addition, we propose an improved notation to make up for the shortcoming of the d-gap technique. We show that the inverted file compression rate can be improved 16 to 23 over the pure d-gap technique.
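
To illustrate the d-gap idea the abstract refers to, here is a minimal Python sketch. It is not the authors' reassignment algorithm or their improved notation, neither of which is specified in this record. The function names, the toy posting list, the hand-picked ID mapping, and the use of Elias gamma code lengths as a stand-in for "compressed size" are all assumptions made for illustration.

```python
from typing import Dict, List


def to_d_gaps(doc_ids: List[int]) -> List[int]:
    """Turn a posting list into its first ID followed by successive differences."""
    ids = sorted(doc_ids)
    if not ids:
        return []
    return [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]


def from_d_gaps(gaps: List[int]) -> List[int]:
    """Recover the original sorted document IDs by running summation."""
    ids, total = [], 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids


def gamma_bits(n: int) -> int:
    """Bit length of the Elias gamma code for a positive integer (assumed coder)."""
    return 2 * (n.bit_length() - 1) + 1


def reassign(doc_ids: List[int], new_id: Dict[int, int]) -> List[int]:
    """Apply a (hypothetical) old-ID to new-ID mapping to a posting list."""
    return sorted(new_id[d] for d in doc_ids)


# Toy posting list for one term: documents are spread out, so gaps are large.
postings = [3, 50, 97, 140]
gaps = to_d_gaps(postings)
bits_before = sum(gamma_bits(g) for g in gaps)

# Hand-picked mapping that gives these related documents adjacent IDs.
mapping = {3: 1, 50: 2, 97: 3, 140: 4}
clustered_gaps = to_d_gaps(reassign(postings, mapping))
bits_after = sum(gamma_bits(g) for g in clustered_gaps)

print(gaps, bits_before)
print(clustered_gaps, bits_after)
```

In this toy case the gaps shrink once the documents sharing the term receive neighbouring IDs, so a variable-length code spends fewer bits on them. A real reassignment has to balance this across every term in the index, which is the problem the paper addresses.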
Original language: American English
Pages (from-to): 23-37
Journal: 中華民國資訊學會通訊
Volume: 3
Issue number: 3
State: Published - 2000
