The Lemur community query log project was started over one year ago with the aim of building up a query log that could be used by the IR research community. Despite the privacy controls and assurances that data would only be released after review and in a controlled manner to researchers using a TREC-like protocol, the response from the community has been underwhelming. Given that we have gathered the equivalent of less than 6 seconds of Google traffic (assuming 500 million queries per day) in one year, we have decided to terminate the project. The statistics of the query log data we gathered are listed below. Due to the small amount of data, we feel that we cannot do a general release without compromising privacy, and there simply is not enough data for most techniques that use query logs. If you have a special request to study some aspect of this data, you can contact the people at UMass.
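As a rough check of the traffic comparison above, the arithmetic can be sketched as follows. This is only an illustration: it assumes the comparison is based on the 34,404 unique queries listed in the statistics below and that Google traffic is spread evenly across the day. The variable names are ours, not part of any project tooling.

```python
# Rough check of the "less than 6 seconds of Google traffic" comparison.
# Assumptions (not stated in the announcement): the comparison uses the
# 34,404 unique queries, and traffic is spread evenly over a 24-hour day.
GOOGLE_QUERIES_PER_DAY = 500_000_000
SECONDS_PER_DAY = 24 * 60 * 60
UNIQUE_QUERIES_COLLECTED = 34_404

queries_per_second = GOOGLE_QUERIES_PER_DAY / SECONDS_PER_DAY
equivalent_seconds = UNIQUE_QUERIES_COLLECTED / queries_per_second

print(f"{queries_per_second:.0f} queries per second")
print(f"{equivalent_seconds:.2f} seconds of traffic")
```

Under these assumptions the rate works out to roughly 5,787 queries per second, so a year of collected unique queries corresponds to just under 6 seconds of traffic.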
Thank you for your participation. We are currently working on generating an “anchor text” log from the ClueWeb data that should be useful for a range of query log techniques[1]. The Lemur query log toolbar will continue to be developed and supported.
==============================================================
Data collected from Apr 21, 2009 (SIGIR list announcement sent) to May 3, 2010.

130 Firefox toolbar installs
31 Internet Explorer installs
161 unique computer installs
645 distinct session ids, the majority of which were generated randomly at log upload time
976 logs uploaded
397,729 query result list entry pairs
34,404 unique queries
15,802 queries with 0 results (primarily results page parsing errors)
28,971 clicked result URLs
11,487 distinct queries with clicks
278,713 distinct result URLs
271,957 distinct result titles

Using the Stanford Named Entity Recognizer (11/2007 version):
1519 queries containing one or more locations
1877 queries containing one or more organizations
2451 queries containing one or more person names
5847 total
Of these, there are 1873 unique name queries.
==============================================================
[1] V. Dang and W. B. Croft. “Query reformulation using anchor text”, Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM), pp. 41-50, 2010.