Approximate keyword search in web search engines

Sun Wu*, Hsien Tsung Chang, Ting Chao Hsu, Pei Shin Liu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

We present a new indexing method that provides approximate keyword search in search engines. Our approximate keyword matching adopts a new similarity measure called the Listance model, a variation of the LCS (Longest Common Subsequence) model. Two keywords are considered approximately matched if their Listance is no more than a predefined parameter k. If keywords A and B have lengths m and n respectively, the Listance between A and B is defined as max(m, n) - LCS(A, B), where LCS(A, B) is the length of their longest common subsequence. The method uses a new data structure called the LBS index (Listance Bounded Subsequence index), designed to allow very fast approximate keyword matching. In the indexing phase, a collection of keywords serves as a reference dictionary. Keywords in the web pages that approximately match one of the keywords in the reference dictionary are transformed into a special form and indexed. During query processing, a similar keyword transformation is applied to search the approximate index. Experimental results show that the approach is efficient and provides an approximate keyword search capability of practical interest.
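The Listance definition in the abstract can be illustrated with a minimal Python sketch. It assumes LCS(A, B) means the length of the longest common subsequence and computes it with the textbook dynamic program. The function names `lcs_length`, `listance`, and `approx_match` are hypothetical and do not come from the paper. The sketch covers only the pairwise measure, not the LBS index, whose structure the abstract does not describe.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (classic DP)."""
    prev = [0] * (len(b) + 1)          # DP row for the previous character of a
    for ch_a in a:
        curr = [0]                     # column 0: empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def listance(a: str, b: str) -> int:
    """Listance as defined in the abstract: max(m, n) - LCS(A, B)."""
    return max(len(a), len(b)) - lcs_length(a, b)


def approx_match(a: str, b: str, k: int) -> bool:
    """Two keywords approximately match if their Listance is at most k."""
    return listance(a, b) <= k


if __name__ == "__main__":
    # A transposition costs 1 under this measure: LCS("search", "serach") = 5.
    print(listance("search", "serach"))
    print(approx_match("engine", "engines", k=1))
```

This brute-force comparison takes O(mn) time per keyword pair, so checking a query against every dictionary entry this way would be slow. That cost is presumably what the index described in the abstract is meant to avoid.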

Original language: English
Title of host publication: 2006 1st International Conference on Digital Information Management, ICDIM
Pages: 404-411
Number of pages: 8
State: Published - 2006
Externally published: Yes
Event: 2006 1st International Conference on Digital Information Management, ICDIM - Bangalore, India
Duration: 06 12 2006 to 08 12 2006
