QProber

QProber: Classifying and Searching “Hidden-Web” Text Databases (Project Summary)

Many valuable text databases on the web have non-crawlable contents that are “hidden” behind search interfaces, so traditional search engines do not index this information. One way to facilitate access to “hidden-web” databases is through commercial Yahoo!-like directories, which manually organize these databases into categories that users can browse.

Our QProber system automates the classification of searchable text databases (whether their contents are “hidden” or not) by adaptively probing the databases with queries derived from document classifiers, without retrieving any documents. A large-scale experimental evaluation over 130 real web databases indicates that our technique produces highly accurate classification results using, on average, fewer than 200 queries of four words or fewer per database (TOIS’03 paper; SIGMOD’01 paper). Interestingly, our technique is also attractive for classifying crawlable text databases (i.e., databases whose contents are not “hidden”), as long as search interfaces for the databases are available (IEEE Data Engineering Bulletin’02 paper).

An alternative way to facilitate access to text databases is through “metasearchers,” which provide a unified query interface for searching many databases at once. For efficiency, a critical task for a metasearcher is selecting the most promising databases to search for a given query, a task that typically relies on statistical summaries of the database contents. We derive content summaries from searchable text databases by exploiting our probing-based classification algorithm to adaptively zoom in on and extract documents that are representative of the topic coverage of the databases. We can then build content summaries from these topically focused document samples. A large-scale experimental evaluation over a variety of databases indicates that our content-summary construction technique is efficient and produces more accurate summaries than previously proposed strategies (VLDB’02 paper, SIGMOD’04 paper).
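To make the probing idea more concrete, here is a minimal Python sketch of classifying a database from match counts alone. It is an illustration under stated assumptions, not the QProber implementation: the `Category` class, the `classify` and `make_toy_database` functions, the coverage and specificity thresholds, and the toy probes are all hypothetical names and values chosen for this example, and the summary above does not spell out these details.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Category:
    """A node in a (hypothetical) topic hierarchy."""
    name: str
    # Short queries assumed to be derived from a document classifier's rules.
    probes: List[str] = field(default_factory=list)
    children: List["Category"] = field(default_factory=list)


def classify(match_count: Callable[[str], int], node: Category,
             min_coverage: int = 10, min_specificity: float = 0.4) -> List[str]:
    """Return the most specific categories the database appears to cover.

    Only the number of matches per probe is used; no documents are fetched.
    The two thresholds are assumptions made for this sketch.
    """
    if not node.children:
        return [node.name]
    # Coverage: total matches reported for each child's probes.
    coverage = {c.name: sum(match_count(q) for q in c.probes)
                for c in node.children}
    total = sum(coverage.values())
    results: List[str] = []
    for child in node.children:
        cov = coverage[child.name]
        # Specificity: this child's share of all matches at this level.
        spec = cov / total if total else 0.0
        if cov >= min_coverage and spec >= min_specificity:
            # Adaptive step: only promising subtrees are probed further.
            results.extend(classify(match_count, child,
                                    min_coverage, min_specificity))
    # If no child qualifies, the database stays at the current node.
    return results or [node.name]


def make_toy_database(docs: List[str]) -> Callable[[str], int]:
    """Stand-in for a search interface that reports only a match count."""
    def match_count(query: str) -> int:
        terms = query.lower().split()
        return sum(all(t in d.lower().split() for t in terms) for d in docs)
    return match_count


root = Category("Root", children=[
    Category("Health", probes=["cancer", "diabetes insulin"]),
    Category("Sports", probes=["soccer", "basketball"]),
])
docs = [
    "breast cancer screening guidelines",
    "insulin dosing for diabetes patients",
    "cancer treatment and chemotherapy",
    "soccer injuries and recovery",
]
print(classify(make_toy_database(docs), root, min_coverage=2))
```

With these toy documents the Health probes account for three of the four matches, so the sketch places the database under Health and never descends into Sports. A real search interface would replace `make_toy_database`, typically by reading the "N results found" figure from a results page.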
