Parallel Information Retrieval on Cluster of Workstations

Yung-Cheng Ma, 陳 添福, 鍾 崇斌

Research output: Contribution to journal › Journal Article › peer-review


The rapid growth of the Internet brings new challenges in designing scalable information retrieval systems. To reduce user response time, we investigate the problem of parallelizing Boolean query processing on a cluster of workstations. The key issue is to partition the posting file such that, during parallel query processing, each workstation consults only its own locally resident data to complete its task. This is achieved by treating all postings corresponding to a document as non-separable objects when partitioning the posting file. Following this partition-by-document-ID principle, we develop partitioning algorithms that transform a sequential information retrieval system into a parallel information retrieval system. The partitioning schemes are designed to balance the workload across workstations without increasing the average time to process a posting. Experiments show that almost linear speed-up can be achieved. This work shows that parallel processing techniques are feasible for building a scalable information retrieval system.
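The abstract does not spell out the partitioning algorithms themselves, so the following is only a minimal sketch of the partition-by-document-ID idea, not the authors' method. It assumes a toy corpus held as a hypothetical `docs` mapping (document ID to set of terms), simulates the workstations sequentially inside one process, and uses a greedy "largest document first, onto the least-loaded worker" heuristic as a stand-in for the paper's load-balancing schemes. All function names (`build_postings`, `partition_by_document`, `eval_query`, `parallel_query`) are hypothetical. Because every posting of a document lands on the same worker, each worker can evaluate a Boolean AND/OR query on its local index alone, and the global answer is simply the union of the disjoint local answers.

```python
import heapq
from collections import defaultdict


def build_postings(docs):
    """Build an inverted index: term -> ascending list of doc IDs."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in docs[doc_id]:
            index[term].append(doc_id)
    return dict(index)


def partition_by_document(docs, num_workers):
    """Assign each document, with ALL of its postings, to exactly one worker.

    Assumed heuristic (not from the paper): sort documents by posting count,
    largest first, and place each on the currently least-loaded worker.
    """
    heap = [(0, w) for w in range(num_workers)]  # (postings assigned, worker id)
    heapq.heapify(heap)
    partitions = [dict() for _ in range(num_workers)]
    for doc_id in sorted(docs, key=lambda d: (-len(docs[d]), d)):
        load, w = heapq.heappop(heap)
        partitions[w][doc_id] = docs[doc_id]
        heapq.heappush(heap, (load + len(docs[doc_id]), w))
    return partitions


def eval_query(index, query):
    """Evaluate a Boolean query against ONE index.

    A query is either a term (str) or a tuple ("AND" | "OR", left, right).
    """
    if isinstance(query, str):
        return set(index.get(query, []))
    op, left, right = query
    a, b = eval_query(index, left), eval_query(index, right)
    return a & b if op == "AND" else a | b


def parallel_query(partitions, query):
    """Simulate the workers: each consults only its local postings.

    Partitions hold disjoint document sets, so the union of the local
    results equals the result of the query on the full index.
    """
    local_indexes = [build_postings(p) for p in partitions]
    local_results = [eval_query(idx, query) for idx in local_indexes]
    return sorted(set().union(*local_results))


if __name__ == "__main__":
    docs = {
        1: {"cluster", "retrieval"},
        2: {"parallel", "retrieval"},
        3: {"cluster", "parallel", "retrieval"},
        4: {"boolean"},
    }
    parts = partition_by_document(docs, 2)
    print(parallel_query(parts, ("AND", "parallel", "retrieval")))  # prints [2, 3]
```

The design point this illustrates is why document-ID partitioning avoids communication during query processing: an AND of two terms only ever intersects postings of the same document, and those postings never straddle workers. A term-based split would instead require shipping posting lists between workstations before the intersection could be computed.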
Original language: American English
Pages (from-to): 11-21
Issue number: 3
State: Published - 2000

