Abstract
The inverted file is the most popular indexing mechanism for speeding up
document search in an Information Retrieval System (IRS). The size of the inverted
file is usually enormous. Traditionally, the d-gap technique is applied to an inverted
file to replace document identifiers (document IDs) with smaller numbers, which
can then be compressed efficiently. However, large gap values can make
the compression rate worse than expected. In this paper we propose a document
ID reassignment algorithm that exploits the cluster property to reduce the gap values.
In addition, we propose an improved notation to compensate for the shortcoming of the d-gap
technique. We show that the inverted file compression rate can be improved by 16 to 23
percent over the pure d-gap technique.
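
As a rough illustration of why smaller gaps matter, the sketch below converts a posting list (the sorted document IDs for one term) into d-gaps and counts the bits an Elias gamma code would need for them. The function names, the sample posting lists, and the choice of gamma coding are assumptions made for this example only; the abstract does not specify which code the paper uses, and this is not the paper's reassignment algorithm.

```python
from itertools import accumulate


def to_dgaps(doc_ids):
    """Turn a strictly increasing list of positive document IDs into d-gaps.

    The first gap is the first ID itself; every later gap is the
    difference between an ID and its predecessor.
    """
    gaps = []
    previous = 0
    for doc_id in doc_ids:
        if doc_id <= previous:
            raise ValueError("document IDs must be positive and strictly increasing")
        gaps.append(doc_id - previous)
        previous = doc_id
    return gaps


def from_dgaps(gaps):
    """Recover the original document IDs by a running sum over the gaps."""
    return list(accumulate(gaps))


def gamma_bits(n):
    """Length in bits of the Elias gamma code for a positive integer n."""
    return 2 * (n.bit_length() - 1) + 1


# Hypothetical posting lists for the same term, before and after a
# reassignment that gives similar documents neighbouring IDs.
scattered = [3, 40, 77, 120]
clustered = [3, 4, 5, 6]

for name, postings in (("scattered", scattered), ("clustered", clustered)):
    gaps = to_dgaps(postings)
    total = sum(gamma_bits(g) for g in gaps)
    print(name, gaps, total, "bits")
```

With these sample lists, the scattered IDs yield the gaps 3, 37, 37, 43 (36 bits under gamma coding), while the clustered IDs yield 3, 1, 1, 1 (6 bits), which is the effect a cluster-aware ID reassignment aims for.
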
| Original language | American English |
|---|---|
| Pages (from-to) | 23-37 |
| Journal | 中華民國資訊學會通訊 |
| Volume | 3 |
| Issue number | 3 |
| State | Published - 2000 |