The Lemur community query log project was started more than a year ago with the aim of building a query log that could be used by the IR research community. Despite the privacy controls and the assurances that data would be released only after review, and in a controlled manner to researchers using a TREC-like protocol, the response from the community has been underwhelming. Given that in one year we have gathered the equivalent of less than 6 seconds of Google traffic (assuming 500 million queries per day), we have decided to terminate the project. The statistics of the query log data we gathered are listed below. Because the amount of data is so small, we feel that we cannot do a general release without compromising privacy, and there simply is not enough data for most techniques that use query logs. If you have a special request to study some aspect of this data, you can contact the people at UMass.
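
As a rough check of the "less than 6 seconds" comparison, the arithmetic can be sketched as follows. This assumes the comparison is made against the 34,404 unique queries reported in the statistics, which the announcement does not state explicitly, and it takes the 500 million queries per day figure at face value. The function and constant names are illustrative only.

```python
# Rough sanity check of the "less than 6 seconds of Google traffic" comparison.
# Assumption: the comparison uses the count of unique queries (34,404);
# the announcement does not say which count was used.

QUERIES_PER_DAY = 500_000_000   # traffic figure assumed in the announcement
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds
UNIQUE_QUERIES = 34_404         # unique queries gathered by the project


def seconds_of_traffic(num_queries, queries_per_day=QUERIES_PER_DAY):
    """Return how many seconds of traffic num_queries corresponds to."""
    return num_queries * SECONDS_PER_DAY / queries_per_day


equivalent_seconds = seconds_of_traffic(UNIQUE_QUERIES)
print(f"{equivalent_seconds:.2f} seconds")
```

Under these assumptions the result is a little under 6 seconds, which is consistent with the figure quoted above.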

Thank you for your participation. We are currently working on generating an “anchor text” log from the ClueWeb data that should be useful for a range of query log techniques[1]. The Lemur query log toolbar will continue to be developed and supported.

==============================================================

Data collected from Apr 21, 2009 (SIGIR list announcement sent) to May 3, 2010.

 

130 Firefox toolbar installs

 31 Internet Explorer installs

161 unique computer installs

645 distinct session ids, the majority of which were generated randomly at log upload time

976 logs uploaded

 

397,729 query result list entry pairs

34,404 unique queries

15,802 queries with 0 results (primarily results page parsing errors)

28,971 clicked result URLs

11,487 distinct queries with clicks

278,713 distinct result URLs

271,957 distinct result titles

 

Using the Stanford Named Entity Recognizer (11/2007 version):

 

  1,519 queries containing one or more locations

  1,877 queries containing one or more organizations

  2,451 queries containing one or more person names

  5,847 total

  Of these, there are 1,873 unique name queries.

==============================================================

 



[1] V. Dang and W. B. Croft. “Query reformulation using anchor text.” Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM), pp. 41-50, 2010.

This site is maintained by the Department of Computer Science/Center for Intelligent Information Retrieval
© 2010 University of Massachusetts Amherst