Item analysis, reliability statistics and standard error of measurement to improve the quality and impact of multiple choice questions in undergraduate medical education in Faculty of Medicine at UniSZA

Shahid Hassan, Rahmah Mohd Amin, Husbani Bt. Mohd Amin Rebuan, Myat Moe Thwe Aung

Research output: Contribution to journal › Article

Abstract

One-best-answer (OBA) multiple-choice questions are considered a more effective tool than objective-test (multiple true/false) items for testing higher-order thinking, owing to their reliability and validity. Determining the quality of OBA questions, however, requires item analysis of the difficulty index (PI) and discrimination index (DI), as well as distractor efficiency (DE) based on functional distractors (FD) and non-functional distractors (NFD). Flaws in item construction should also not be allowed to affect students' results through measurement error; the standard error of measurement (SEM) can be used to calculate a score band that reduces the impact of such error in assessment. The present study evaluates the quality of 30 OBA items administered in the Professional II examination, in order to apply corrective measures and produce quality items for the question bank. The mean (SD) score on the 30 OBA items was 61.11 (7.495), and the reliability (internal consistency), measured as Cronbach's alpha, was 0.447. Of the 30 OBA items, 11 (36.67%) had a PI of 0.31-0.60; 12 items (40.00%) with DI ≥ 0.19 were placed in the category to be retained in the question bank, 6 items (20.00%) with DI ≤ 0.19 in the category to be revised, and the remaining 12 items (40.00%), with either poor or negative DI, in the category to be discarded. Of the 120 distractors in total, 63 (52.5%) were non-functional and 57 (47.5%) were functional. Twenty-eight items (93.33%) contained 1-4 NFDs and only 2 items (6.67%) contained none. Among the 28 items with NFDs, 7 items had 1 NFD (75% DE), 7 had 4 NFDs (0% DE), 10 had 2 NFDs (50% DE) and 4 had 3 NFDs (25% DE), while the 2 items without any NFD had 100% DE. The SEM calculated for the OBA paper was ±5.51; with the borderline cut-off set at ≥45%, a score band of ±1 SEM (about 68% confidence) was generated around the cut-off.
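The item statistics named above can be sketched in Python. This is a minimal illustration under assumed conventions, not the authors' procedure: the abstract does not say how comparison groups were formed or where the functional/non-functional cut-off lies, so the code assumes the common upper/lower 27% groups for PI and DI and treats a distractor as functional when at least 5% of examinees choose it. With four distractors per item, DE then takes the values 100%, 75%, 50%, 25% and 0% for 0 to 4 NFDs, matching the levels reported above. The function names and the ten-examinee data set are hypothetical.

```python
"""Classical item analysis for a single OBA item (illustrative sketch).

Assumed conventions, not taken from the study:
- upper/lower groups are the top and bottom 27% of examinees by total score;
- PI is the proportion correct across both groups; DI = (H - L) / group size;
- a distractor is functional (FD) if chosen by at least 5% of all examinees.
"""
from collections import Counter


def upper_lower_groups(totals, fraction=0.27):
    """Return examinee indices for the upper and lower groups by total score."""
    n_group = max(1, round(len(totals) * fraction))
    order = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    return order[:n_group], order[-n_group:]


def difficulty_and_discrimination(choices, key, totals, fraction=0.27):
    """Return (PI, DI) for one item, given each examinee's chosen option."""
    upper, lower = upper_lower_groups(totals, fraction)
    upper_correct = sum(choices[i] == key for i in upper)
    lower_correct = sum(choices[i] == key for i in lower)
    n = len(upper)
    return (upper_correct + lower_correct) / (2 * n), (upper_correct - lower_correct) / n


def distractor_efficiency(choices, key, options="ABCDE", threshold=0.05):
    """Return (FD count, NFD count, DE in percent) for one item."""
    counts = Counter(choices)
    distractors = [opt for opt in options if opt != key]
    # A distractor chosen by fewer than `threshold` of examinees is non-functional.
    nfd = sum(counts[opt] / len(choices) < threshold for opt in distractors)
    fd = len(distractors) - nfd
    return fd, nfd, 100 * fd / len(distractors)


# Hypothetical data: 10 examinees, correct key "B", options A-E.
totals = [28, 25, 24, 22, 20, 18, 15, 12, 10, 8]
choices = ["B", "B", "B", "C", "B", "A", "B", "C", "D", "C"]

print(difficulty_and_discrimination(choices, "B", totals))  # (0.5, 1.0)
print(distractor_efficiency(choices, "B"))  # (3, 1, 75.0)
```

Here option E is never chosen, so it is the single NFD and DE is 75%. An alternative convention computes PI as the proportion of all examinees answering correctly, which can shift values near the 0.31-0.60 band.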
The high frequency of overly difficult or easy items, together with moderate to poor discrimination, indicates that corrective measures are needed for these items. The large number of NFDs and the low DE in this study reflect the difficulty teaching faculty have in developing plausible distractors for OBA questions. The SEM should be used to calculate a score band so that pass/fail decisions for borderline students are made on a logical basis.
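The SEM-based band described above can be sketched the same way. The code assumes the usual classical-test-theory formula SEM = SD * sqrt(1 - alpha) and a ±1 SEM band (about 68% confidence) around the cut-off; the three-way pass/borderline/fail rule and all function names are hypothetical, since the abstract does not spell out its decision rule. Plugging the reported SD (7.495) and alpha (0.447) into this formula gives about 5.57 rather than the reported ±5.51, and the abstract does not state exactly how its value was obtained, so the band below uses the reported 5.51 directly.

```python
"""SEM and a borderline score band (illustrative sketch).

Assumptions, not taken from the study: SEM = SD * sqrt(1 - reliability), the
usual classical-test-theory formula; a band of +/- z SEM around the cut-off
(z = 1 gives about 68% confidence); and a hypothetical three-way decision rule.
"""
import math
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha for rows of per-item scores (one row per examinee)."""
    k = len(scores[0])
    item_vars = [statistics.variance([row[j] for row in scores]) for j in range(k)]
    total_var = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(sd, reliability):
    """SEM from the score SD and a reliability coefficient such as alpha."""
    return sd * math.sqrt(1 - reliability)


def score_band(cut_off, sem, z=1.0):
    """Lower and upper limits of the band cut_off +/- z * SEM."""
    return cut_off - z * sem, cut_off + z * sem


def decide(score, cut_off, sem, z=1.0):
    """Hypothetical rule: clear pass above the band, clear fail below it."""
    low, high = score_band(cut_off, sem, z)
    if score >= high:
        return "pass"
    if score < low:
        return "fail"
    return "borderline"


# Toy 4-examinee x 3-item matrix (hypothetical) to exercise the alpha formula.
print(round(cronbach_alpha([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]), 2))  # 0.75

# Reported values from the abstract: SD 7.495, alpha 0.447, SEM 5.51, cut-off 45%.
print(round(standard_error_of_measurement(7.495, 0.447), 2))  # 5.57
low, high = score_band(45, 5.51)
print(round(low, 2), round(high, 2))  # 39.49 50.51
print([decide(s, 45, 5.51) for s in (38, 47, 52)])  # ['fail', 'borderline', 'pass']
```

With the reported SEM the band runs from 39.49% to 50.51%, so under this hypothetical rule a score of 47% would be flagged as borderline rather than decided on the cut-off alone.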

Original language: English
Pages (from-to): 7-15
Number of pages: 9
Journal: Malaysian Journal of Public Health Medicine
Volume: 16
Issue number: 3
Publication status: Published - 2016
Externally published: Yes

Keywords

  • Difficulty index
  • Discrimination index
  • Distraction efficiency
  • Functional and non-functional distractors
  • MCQ
  • Reliability coefficient
  • Standard error of measurement

ASJC Scopus subject areas

  • Public Health, Environmental and Occupational Health

Cite this

@article{b75251ca81c34aa59650c8767744672d,
title = "Item analysis, reliability statistics and standard error of measurement to improve the quality and impact of multiple choice questions in undergraduate medical education in Faculty of Medicine at UniSZA",
keywords = "Difficulty index, Discrimination index, Distraction efficiency, Functional and non-functional distractors, MCQ, Reliability coefficient, Standard error of measurement",
author = "Shahid Hassan and {Mohd Amin}, Rahmah and {Bt. Mohd Amin Rebuan}, Husbani and {Thwe Aung}, {Myat Moe}",
year = "2016",
language = "English",
volume = "16",
pages = "7--15",
journal = "Malaysian Journal of Public Health Medicine",
issn = "1675-0306",
publisher = "Malaysian Public Health Physicians' Association",
number = "3",

}

TY - JOUR

T1 - Item analysis, reliability statistics and standard error of measurement to improve the quality and impact of multiple choice questions in undergraduate medical education in Faculty of Medicine at UniSZA

AU - Hassan, Shahid

AU - Mohd Amin, Rahmah

AU - Bt. Mohd Amin Rebuan, Husbani

AU - Thwe Aung, Myat Moe

PY - 2016

Y1 - 2016

KW - Difficulty index

KW - Discrimination index

KW - Distraction efficiency

KW - Functional and non-functional distractors

KW - MCQ

KW - Reliability coefficient

KW - Standard error of measurement

UR - http://www.scopus.com/inward/record.url?scp=84966429507&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84966429507&partnerID=8YFLogxK

M3 - Article

VL - 16

SP - 7

EP - 15

JO - Malaysian Journal of Public Health Medicine

JF - Malaysian Journal of Public Health Medicine

SN - 1675-0306

IS - 3

ER -