On retrieval performance of Malay textual documents

Mohd Pouzi Hamzah, Tengku Mohd Tengku Sembok

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    2 Citations (Scopus)

    Abstract

    This paper analyzes two factors that affect the retrieval performance of Malay textual documents: similarity measures and word conflation. Three similarity measures were studied and tested on a Malay test collection: the inner product with unweighted query terms, the inner product with weighted query terms, and the cosine of the angle between the query and document vectors. The cosine measure significantly outperforms the other two. To improve performance further, the data were conflated using Malay stemming algorithms. Combining the conflated data with the cosine measure as the basis for computing similarity in the vector space yields a significant improvement in precision.
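
    The three similarity measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas or term-weighting scheme, so the definitions below are the common textbook forms and should be read as assumptions. The function names, the toy Malay terms, and all weights are hypothetical and are not taken from the paper's test collection.

```python
import math

# Sparse vectors are plain dicts mapping term -> weight.
# All terms and weights below are made-up illustration data.

def inner_product_unweighted(query, doc):
    """Inner product with binary query weights (assumed definition):
    sum the document weights of every term that occurs in the query."""
    return sum(doc.get(term, 0.0) for term in query)

def inner_product_weighted(query, doc):
    """Inner product using the query term weights (assumed definition)."""
    return sum(weight * doc.get(term, 0.0) for term, weight in query.items())

def cosine(query, doc):
    """Cosine of the angle between the query and document vectors."""
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    norm_d = math.sqrt(sum(w * w for w in doc.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0  # an empty vector has no direction
    return inner_product_weighted(query, doc) / (norm_q * norm_d)

query = {"ikan": 2.0, "laut": 1.0}
doc_short = {"ikan": 1.0, "laut": 1.0}
doc_long = {"ikan": 2.0, "laut": 1.0, "kapal": 5.0, "pelabuhan": 5.0}

for name, measure in [
    ("unweighted inner product", inner_product_unweighted),
    ("weighted inner product", inner_product_weighted),
    ("cosine", cosine),
]:
    print(f"{name}: short={measure(query, doc_short):.4f} "
          f"long={measure(query, doc_long):.4f}")
```

    In this toy data both inner products score the longer document higher, while the cosine measure, which divides by the vector lengths, ranks the shorter, more focused document first. Whether length normalization accounts for the results reported in the paper is not stated in the abstract.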

    Original language: English
    Title of host publication: Proceedings of the IASTED International Conference on Artificial Intelligence and Applications, AIA 2006
    Pages: 156-161
    Number of pages: 6
    Publication status: Published - 2006
    Event: IASTED International Conference on Artificial Intelligence and Applications, AIA 2006 - Innsbruck
    Duration: 13 Feb 2006 to 16 Feb 2006

    Keywords

    • Information retrieval
    • Malay language
    • Retrieval performance
    • Similarity measure
    • Vector space

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Computer Science Applications
    • Software

    Cite this

    Hamzah, M. P., & Sembok, T. M. T. (2006). On retrieval performance of Malay textual documents. In Proceedings of the IASTED International Conference on Artificial Intelligence and Applications, AIA 2006 (pp. 156-161).


    TY - GEN
    T1 - On retrieval performance of Malay textual documents
    AU - Hamzah, Mohd Pouzi
    AU - Sembok, Tengku Mohd Tengku
    PY - 2006
    Y1 - 2006
    KW - Information retrieval
    KW - Malay language
    KW - Retrieval performance
    KW - Similarity measure
    KW - Vector space
    UR - http://www.scopus.com/inward/record.url?scp=38049166385&partnerID=8YFLogxK
    UR - http://www.scopus.com/inward/citedby.url?scp=38049166385&partnerID=8YFLogxK
    M3 - Conference contribution
    AN - SCOPUS:38049166385
    SP - 156
    EP - 161
    BT - Proceedings of the IASTED International Conference on Artificial Intelligence and Applications, AIA 2006
    ER -