A tree-based inverted file for fast ranked-document retrieval

Wann Yun Shieh*, Tien Fu Chen, Chung Ping Chung

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

2 Scopus citations

Abstract

Inverted files are widely used to index documents in large-scale information retrieval systems. An inverted file consists of posting lists, which can be stored in either a document-identifier ascending order or a document-weight descending order. For an identifier-ascending-order posting list, retrieving ranked documents necessitates traversal of all postings, whereas for the weight-descending-order posting list, performing Boolean queries involves very complex processing. In this paper, we transform a posting list to a tree-based structure, called the n-key-heap posting tree, to speedup ranked-document retrieval for Boolean queries. In this structure, the orders of document identifiers and document weights are preserved simultaneously. To preserve the identifier order, the edge pointers are designed to maintain numerical order in the posting tree. To preserve the weight order, greater-weight postings are stored in higher tree nodes by the heap property. We model these criteria to a tree-construction problem and propose an efficient algorithm to construct an optimal posting tree having the minimal access time.

Original languageEnglish
Title of host publicationProceedings of the International Conference on Information and Knowledge Engineering 2003
EditorsN. Goharian, N. Goharian
Pages64-69
Number of pages6
StatePublished - 2003
Externally publishedYes
EventProceedings of the International Conference on Information and Knowledge Engineering 2003 - Las Vegas, NV, United States
Duration: 23 06 200326 06 2003

Publication series

NameProceedings of the International Conference on Information and Knowledge Engineering
Volume1

Conference

ConferenceProceedings of the International Conference on Information and Knowledge Engineering 2003
Country/TerritoryUnited States
CityLas Vegas, NV
Period23/06/0326/06/03

Keywords

  • Boolean query
  • Information retrieval
  • Inverted file
  • Posting tree
  • Ranked document

Fingerprint

Dive into the research topics of 'A tree-based inverted file for fast ranked-document retrieval'. Together they form a unique fingerprint.

Cite this