Search engines evaluation using precision and document-overlap measurements at 10-50 cutoff points

Amirah Ismail, Tengku Mohd T Sembok, Halimah Badioze Zaman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

The Internet has become a huge distributed document storage and retrieval system covering whatever topics one might think of. A user of the Internet at times seeks information that he does not yet have in order to solve a problem. He therefore has to express his information need as a request, in one form or another, to a search engine. The search engine then tries to infer and retrieve the relevant documents and presents them in the hit list for that query. However, which documents in the hit list are actually relevant can only be determined by the user himself, based on his information need. The quality of the hit list depends heavily on the effectiveness of the indexing process, which generates the surrogates from the original documents. Usually, the quality of the hit list is measured by precision, i.e. the ratio of the number of retrieved documents that are relevant to the total number of retrieved documents. This measurement has been used to evaluate ten major search engines using ten queries at cutoff points of 10, 20, 30, 40 and 50. Besides the precision measurement, we have introduced an overlap measurement to determine the commonality of documents between the hit lists of the various search engines. With these two measurements we can evaluate the performance of the search engines better. The search engines chosen for the study are Altavista, Hotbot, Excite, Lycos, Webcrawler, Infoseek, Magellan, Northernlight, SavvySearch and Metacrawler. The experiment ranks the search engines based on the precision measurement obtained. The correlation between the ranking list and the overlap measurement is reported in the paper.
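As a rough illustration of the two measurements described in the abstract, the sketch below computes precision at the stated cutoff points and a pairwise overlap score between two hit lists. The function names are hypothetical, and the overlap formula (shared documents divided by distinct documents across both lists) is an assumption: the abstract does not say how the overlap measurement is defined.

```python
from typing import Sequence, Set

# Cutoff points named in the abstract.
CUTOFFS = (10, 20, 30, 40, 50)


def precision_at_k(hits: Sequence[str], relevant: Set[str], k: int) -> float:
    """Ratio of retrieved-and-relevant documents to retrieved documents,
    restricted to the first k entries of the hit list."""
    top_k = hits[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant) / len(top_k)


def overlap_at_k(hits_a: Sequence[str], hits_b: Sequence[str], k: int) -> float:
    """Share of distinct documents common to both top-k hit lists.
    Assumption: a Jaccard-style ratio; the paper may define overlap differently."""
    a, b = set(hits_a[:k]), set(hits_b[:k])
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Made-up hit lists and relevance judgements, for illustration only.
    engine_a = [f"doc{i}" for i in range(1, 51)]
    engine_b = [f"doc{i}" for i in range(26, 76)]
    judged_relevant = {f"doc{i}" for i in range(1, 100, 3)}
    for k in CUTOFFS:
        p = precision_at_k(engine_a, judged_relevant, k)
        o = overlap_at_k(engine_a, engine_b, k)
        print(f"cutoff {k}: precision={p:.2f} overlap={o:.2f}")
```

Relevance is passed in as a set because, as the abstract notes, only the user can judge which hits are relevant; the code does not attempt to infer it.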

Original language: English
Title of host publication: IEEE Region 10 Annual International Conference, Proceedings/TENCON
Volume: 3
Publication status: Published - 2000
Event: 2000 TENCON Proceedings - Kuala Lumpur, Malaysia
Duration: 24 Sep 2000 → 27 Sep 2000

Other

Other: 2000 TENCON Proceedings
City: Kuala Lumpur, Malaysia
Period: 24/9/00 → 27/9/00

Fingerprint

Search engines
Internet
Experiments

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Ismail, A., Sembok, T. M. T., & Badioze Zaman, H. (2000). Search engines evaluation using precision and document-overlap measurements at 10-50 cutoff points. In IEEE Region 10 Annual International Conference, Proceedings/TENCON (Vol. 3)

Search engines evaluation using precision and document-overlap measurements at 10-50 cutoff points. / Ismail, Amirah; Sembok, Tengku Mohd T; Badioze Zaman, Halimah.

IEEE Region 10 Annual International Conference, Proceedings/TENCON. Vol. 3 2000.

Ismail, A, Sembok, TMT & Badioze Zaman, H 2000, Search engines evaluation using precision and document-overlap measurements at 10-50 cutoff points. in IEEE Region 10 Annual International Conference, Proceedings/TENCON. vol. 3, 2000 TENCON Proceedings, Kuala Lumpur, Malaysia, 24/9/00.
Ismail A, Sembok TMT, Badioze Zaman H. Search engines evaluation using precision and document-overlap measurements at 10-50 cutoff points. In IEEE Region 10 Annual International Conference, Proceedings/TENCON. Vol. 3. 2000
Ismail, Amirah ; Sembok, Tengku Mohd T ; Badioze Zaman, Halimah. / Search engines evaluation using precision and document-overlap measurements at 10-50 cutoff points. IEEE Region 10 Annual International Conference, Proceedings/TENCON. Vol. 3 2000.
@inproceedings{113d0089efc74833acf752bcf741075e,
title = "Search engines evaluation using precision and document-overlap measurements at 10-50 cutoff points",
abstract = "The Internet has become a huge distributed document storage and retrieval system covering whatever topics one might think of. A user of the Internet at times seeks information that he does not yet have in order to solve a problem. He therefore has to express his information need as a request, in one form or another, to a search engine. The search engine then tries to infer and retrieve the relevant documents and presents them in the hit list for that query. However, which documents in the hit list are actually relevant can only be determined by the user himself, based on his information need. The quality of the hit list depends heavily on the effectiveness of the indexing process, which generates the surrogates from the original documents. Usually, the quality of the hit list is measured by precision, i.e. the ratio of the number of retrieved documents that are relevant to the total number of retrieved documents. This measurement has been used to evaluate ten major search engines using ten queries at cutoff points of 10, 20, 30, 40 and 50. Besides the precision measurement, we have introduced an overlap measurement to determine the commonality of documents between the hit lists of the various search engines. With these two measurements we can evaluate the performance of the search engines better. The search engines chosen for the study are Altavista, Hotbot, Excite, Lycos, Webcrawler, Infoseek, Magellan, Northernlight, SavvySearch and Metacrawler. The experiment ranks the search engines based on the precision measurement obtained. The correlation between the ranking list and the overlap measurement is reported in the paper.",
author = "Ismail, Amirah and Sembok, {Tengku Mohd T} and {Badioze Zaman}, Halimah",
year = "2000",
language = "English",
volume = "3",
booktitle = "IEEE Region 10 Annual International Conference, Proceedings/TENCON",
}

TY - GEN

T1 - Search engines evaluation using precision and document-overlap measurements at 10-50 cutoff points

AU - Ismail, Amirah

AU - Sembok, Tengku Mohd T

AU - Badioze Zaman, Halimah

PY - 2000

Y1 - 2000

N2 - The Internet has become a huge distributed document storage and retrieval system covering whatever topics one might think of. A user of the Internet at times seeks information that he does not yet have in order to solve a problem. He therefore has to express his information need as a request, in one form or another, to a search engine. The search engine then tries to infer and retrieve the relevant documents and presents them in the hit list for that query. However, which documents in the hit list are actually relevant can only be determined by the user himself, based on his information need. The quality of the hit list depends heavily on the effectiveness of the indexing process, which generates the surrogates from the original documents. Usually, the quality of the hit list is measured by precision, i.e. the ratio of the number of retrieved documents that are relevant to the total number of retrieved documents. This measurement has been used to evaluate ten major search engines using ten queries at cutoff points of 10, 20, 30, 40 and 50. Besides the precision measurement, we have introduced an overlap measurement to determine the commonality of documents between the hit lists of the various search engines. With these two measurements we can evaluate the performance of the search engines better. The search engines chosen for the study are Altavista, Hotbot, Excite, Lycos, Webcrawler, Infoseek, Magellan, Northernlight, SavvySearch and Metacrawler. The experiment ranks the search engines based on the precision measurement obtained. The correlation between the ranking list and the overlap measurement is reported in the paper.

UR - http://www.scopus.com/inward/record.url?scp=0034428796&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0034428796&partnerID=8YFLogxK

M3 - Conference contribution

VL - 3

BT - IEEE Region 10 Annual International Conference, Proceedings/TENCON

ER -