Abstract
The rapid growth of the Internet brings new challenges to the design of scalable
information retrieval systems. To reduce user response time, we investigate the
problem of parallelizing Boolean query processing on a cluster of workstations. The
key issue is to partition the posting file such that, during parallel query processing,
each workstation consults only its own locally resident data to complete its task. This
is achieved by treating all postings that correspond to a document as a non-separable
object when partitioning the posting file. Following this partitioning-by-document-ID
principle, we develop partitioning algorithms that transform a sequential information
retrieval system into a parallel one. The partitioning schemes are designed to balance
the workload among workstations without increasing the average time to process a
posting. Experiments show that almost linear speed-up can be achieved. This work shows
that parallel processing is a feasible technique for building a scalable information
retrieval system.
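
The partitioning-by-document-ID idea can be illustrated with a minimal sketch. It is not the paper's algorithm: the greedy "largest document to the least-loaded worker" rule, the function names (`partition_by_doc_id`, `evaluate_locally`, `parallel_query`), and the toy `docs` collection are all assumptions made for illustration, and a plain loop stands in for the cluster of workstations.

```python
from collections import defaultdict

def partition_by_doc_id(docs, num_workers):
    """Split a collection into per-worker inverted indexes.

    docs maps doc_id -> set of terms. All postings of one document go to
    the same worker, so a worker never needs another worker's data.
    The balancing rule here is a greedy heuristic (an assumption).
    """
    loads = [0] * num_workers                      # postings held per worker
    indexes = [defaultdict(set) for _ in range(num_workers)]
    # Place larger documents first, each on the currently least-loaded worker.
    for doc_id in sorted(docs, key=lambda d: (-len(docs[d]), d)):
        w = loads.index(min(loads))
        for term in docs[doc_id]:
            indexes[w][term].add(doc_id)
        loads[w] += len(docs[doc_id])
    return indexes, loads

def evaluate_locally(index, all_of, none_of=()):
    """Evaluate (t1 AND t2 AND ...) AND NOT (u1 OR u2 ...) on one local index."""
    if not all_of:
        return set()
    result = set.intersection(*(index.get(t, set()) for t in all_of))
    for term in none_of:
        result -= index.get(term, set())
    return result

def parallel_query(indexes, all_of, none_of=()):
    """Union the local answers; each loop iteration stands in for one workstation."""
    answer = set()
    for index in indexes:
        answer |= evaluate_locally(index, all_of, none_of)
    return answer

# Hypothetical toy collection: doc_id -> terms.
docs = {
    1: {"parallel", "query"},
    2: {"parallel", "index"},
    3: {"parallel", "query", "index"},
    4: {"boolean"},
}
indexes, loads = partition_by_doc_id(docs, num_workers=2)
hits = parallel_query(indexes, all_of=["parallel", "query"], none_of=["index"])
```

Because every posting of a document lives on exactly one worker, the union of the local answers equals the answer a single sequential index would give, and no worker has to consult another worker's postings.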
Original language | American English |
---|---|
Pages (from-to) | 11-21 |
Journal | 中華民國資訊學會通訊 |
Volume | 3 |
Issue number | 3 |
State | Published - 2000 |