Comparison of new simple weighting functions for web documents against existing methods

Byurhan Hyusein, Ahmed Patel, Ferad Zyulkyarov

Research output: Contribution to journal (Article)

Abstract

Term weighting is one of the most important aspects of modern Web retrieval systems. The weight associated with a given term in a document shows the importance of the term for the document, i.e. its usefulness for distinguishing documents in a document collection. In search engines operating in a dynamic environment such as the Internet, where many documents are deleted from and added to the database, the usual formula involving the inverse document frequency is too costly to be computed each time the document collection is updated. This paper proposes two new simple and effective weighting functions. These weighting functions have been tested and compared with results obtained for the PIVOT, SMART and INQUERY methods using the WT10g collection of documents.
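
As a rough illustration of the update cost mentioned in the abstract, the sketch below uses the textbook tf-idf weight, tf × log(N / df). It is not one of the two functions proposed in the paper, which the abstract does not specify. The function names and the toy collection are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, how many documents contain it (assumed helper)."""
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    return df


def tf_idf(term, doc_id, docs, df):
    """Textbook tf-idf: raw term frequency times log(N / df)."""
    tf = docs[doc_id].count(term)
    if tf == 0:
        return 0.0
    return tf * math.log(len(docs) / df[term])


# Toy collection (hypothetical data, not taken from WT10g).
docs = {
    "d1": ["web", "retrieval", "web"],
    "d2": ["term", "weighting"],
}
df = document_frequencies(docs)
before = tf_idf("web", "d1", docs, df)

# Adding a single document changes both N and df, so the weight of "web"
# in the untouched document d1 changes as well.
docs["d3"] = ["web", "search"]
df = document_frequencies(docs)
after = tf_idf("web", "d1", docs, df)

print(before, after)
```

Because N and df are collection-wide statistics, every stored weight that depends on them may become stale after an insertion or deletion, which is the cost the abstract refers to. A weight that depends only on the document itself would not need this recomputation.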

Original language: English
Pages (from-to): 236-243
Number of pages: 8
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2869
Publication status: Published - 2003
Externally published: Yes

Fingerprint

Search Engine
Weighting Function
Internet
Databases
Weights and Measures
Term
Search engines
Dynamic Environment
Weighting
Retrieval

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science
  • Engineering (all)

Cite this

@article{ff7b8a80771942d29f38959abb57d5cf,
title = "Comparison of new simple weighting functions for web documents against existing methods",
abstract = "Term weighting is one of the most important aspects of modern Web retrieval systems. The weight associated with a given term in a document shows the importance of the term for the document, i.e. its usefulness for distinguishing documents in a document collection. In search engines operating in a dynamic environment such as the Internet, where many documents are deleted from and added to the database, the usual formula involving the inverse document frequency is too costly to be computed each time the document collection is updated. This paper proposes two new simple and effective weighting functions. These weighting functions have been tested and compared with results obtained for the PIVOT, SMART and INQUERY methods using the WT10g collection of documents.",
author = "Byurhan Hyusein and Ahmed Patel and Ferad Zyulkyarov",
year = "2003",
language = "English",
volume = "2869",
pages = "236--243",
journal = "Lecture Notes in Computer Science",
issn = "0302-9743",
publisher = "Springer Verlag",

}

TY - JOUR

T1 - Comparison of new simple weighting functions for web documents against existing methods

AU - Hyusein, Byurhan

AU - Patel, Ahmed

AU - Zyulkyarov, Ferad

PY - 2003

Y1 - 2003

AB - Term weighting is one of the most important aspects of modern Web retrieval systems. The weight associated with a given term in a document shows the importance of the term for the document, i.e. its usefulness for distinguishing documents in a document collection. In search engines operating in a dynamic environment such as the Internet, where many documents are deleted from and added to the database, the usual formula involving the inverse document frequency is too costly to be computed each time the document collection is updated. This paper proposes two new simple and effective weighting functions. These weighting functions have been tested and compared with results obtained for the PIVOT, SMART and INQUERY methods using the WT10g collection of documents.

UR - http://www.scopus.com/inward/record.url?scp=0142152953&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0142152953&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:0142152953

VL - 2869

SP - 236

EP - 243

JO - Lecture Notes in Computer Science

JF - Lecture Notes in Computer Science

SN - 0302-9743

ER -