An evaluation of retrieval effectiveness using spelling-correction and string-similarity matching methods on Malay texts

Zainab Abu Bakar, Tengku Mohd T Sembok, Mohammed Yusoff

    Research output: Contribution to journal > Article

    7 Citations (Scopus)

    Abstract

    This article evaluates the effectiveness of spelling-correction and string-similarity matching methods in retrieving similar words in a Malay dictionary associated with a set of query words. The spelling-correction techniques used are SPEEDCOP, Soundex, Davidson, Phonix, and Hartlib. Two dynamic-programming methods that measure longest common subsequence and edit-cost distance are used. Several search combinations of query and dictionary words are performed in the experiments, the best being one that stems both query and dictionary words using an existing Malay stemming algorithm. The retrieval effectiveness (E) and retrieved and relevant (R&R) mean measures are calculated from a weighted combination of recall and precision values. Results from these experiments are then compared with available digram, a string-similarity method. The best R&R and E results are given by using digram. Edit-cost distances produce the best E results, and both dynamic-programming methods rank second in finding R&R mean measures.
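
    The similarity measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give their formulas, so the sketch assumes unit edit costs, a Dice coefficient over character digrams, and van Rijsbergen's E measure (where lower values are better) as one common weighted combination of recall and precision. All function names are hypothetical.

```python
from collections import Counter


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def edit_distance(a: str, b: str) -> int:
    """Edit distance with unit costs (assumed; the paper's cost table is not given)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (ch_a != ch_b)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def digram_similarity(a: str, b: str) -> float:
    """Dice coefficient over character digrams (assumed form of the digram method)."""
    grams_a = Counter(a[i:i + 2] for i in range(len(a) - 1))
    grams_b = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    shared = sum((grams_a & grams_b).values())
    return 2 * shared / total


def e_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """van Rijsbergen's E = 1 - F_beta (assumed); beta weights recall against precision."""
    if precision == 0 or recall == 0:
        return 1.0
    b2 = beta * beta
    return 1 - (1 + b2) * precision * recall / (b2 * precision + recall)
```

    For the Malay pair "makan" and "makanan", this sketch gives a longest common subsequence of length 5, an edit distance of 2, and a digram similarity of 0.8.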

    Original language: English
    Pages (from-to): 691-706
    Number of pages: 16
    Journal: Journal of the American Society for Information Science and Technology
    Volume: 51
    Issue number: 8
    Publication status: Published - Jun 2000

    Fingerprint

    Glossaries
    Dictionary
    Evaluation
    Dynamic programming
    Experiment
    Programming
    Matching method
    Query
    Values

    ASJC Scopus subject areas

    • Engineering(all)

    Cite this

    An evaluation of retrieval effectiveness using spelling-correction and string-similarity matching methods on Malay texts. / Abu Bakar, Zainab; Sembok, Tengku Mohd T; Yusoff, Mohammed.

    In: Journal of the American Society for Information Science and Technology, Vol. 51, No. 8, 06.2000, p. 691-706.

    @article{1973ec97975045e292cb6ec4e0ba465c,
    title = "An evaluation of retrieval effectiveness using spelling-correction and string-similarity matching methods on Malay texts",
    abstract = "This article evaluates the effectiveness of spelling-correction and string-similarity matching methods in retrieving similar words in a Malay dictionary associated with a set of query words. The spelling-correction techniques used are SPEEDCOP, Soundex, Davidson, Phonix, and Hartlib. Two dynamic-programming methods that measure longest common subsequence and edit-cost distance are used. Several search combinations of query and dictionary words are performed in the experiments, the best being one that stems both query and dictionary words using an existing Malay stemming algorithm. The retrieval effectiveness (E) and retrieved and relevant (R&R) mean measures are calculated from a weighted combination of recall and precision values. Results from these experiments are then compared with available digram, a string-similarity method. The best R&R and E results are given by using digram. Edit-cost distances produce the best E results, and both dynamic-programming methods rank second in finding R&R mean measures.",
    author = "{Abu Bakar}, Zainab and Sembok, {Tengku Mohd T} and Mohammed Yusoff",
    year = "2000",
    month = "6",
    language = "English",
    volume = "51",
    pages = "691--706",
    journal = "Journal of the Association for Information Science and Technology",
    issn = "2330-1635",
    publisher = "John Wiley and Sons Ltd",
    number = "8",

    }

    TY - JOUR

    T1 - An evaluation of retrieval effectiveness using spelling-correction and string-similarity matching methods on Malay texts

    AU - Abu Bakar, Zainab

    AU - Sembok, Tengku Mohd T

    AU - Yusoff, Mohammed

    PY - 2000/6

    Y1 - 2000/6

    N2 - This article evaluates the effectiveness of spelling-correction and string-similarity matching methods in retrieving similar words in a Malay dictionary associated with a set of query words. The spelling-correction techniques used are SPEEDCOP, Soundex, Davidson, Phonix, and Hartlib. Two dynamic-programming methods that measure longest common subsequence and edit-cost distance are used. Several search combinations of query and dictionary words are performed in the experiments, the best being one that stems both query and dictionary words using an existing Malay stemming algorithm. The retrieval effectiveness (E) and retrieved and relevant (R&R) mean measures are calculated from a weighted combination of recall and precision values. Results from these experiments are then compared with available digram, a string-similarity method. The best R&R and E results are given by using digram. Edit-cost distances produce the best E results, and both dynamic-programming methods rank second in finding R&R mean measures.

    UR - http://www.scopus.com/inward/record.url?scp=0033721946&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=0033721946&partnerID=8YFLogxK

    M3 - Article

    AN - SCOPUS:0033721946

    VL - 51

    SP - 691

    EP - 706

    JO - Journal of the Association for Information Science and Technology

    JF - Journal of the Association for Information Science and Technology

    SN - 2330-1635

    IS - 8

    ER -