Constructing optimal educational tests using GMDH-based item ranking and selection
Radwan E. Abdel-Aal
El-Sayed M. El-Alfy
Item ranking and selection play a key role in constructing concise and informative educational tests. Traditional techniques based on item response theory (IRT) have been used to automate this task, but they require model parameters to be determined a priori for each item, and their application becomes more tedious with larger item banks. Machine-learning techniques can be used to build data-based models that relate the test result as output to the examinees’ responses to various test items as inputs. With this approach, test item selection can benefit from the vast literature on feature selection in the many areas of machine learning and artificial intelligence that are characterized by high data dimensionality. This paper describes a novel technique for item ranking and selection using abductive network pass/fail classifiers based on the group method of data handling (GMDH). Experiments were carried out on a dataset consisting of the responses of 2000 examinees to 45 test items, together with each examinee’s true ability level. The approach utilizes the ability of GMDH-based learning algorithms to automatically select optimum input features from a set of available inputs. Rankings obtained by iteratively applying this procedure are similar to those based on the average item information function (IIF) at the pass-fail ability threshold, IIF (θ=0), and the average information gain (IG). An optimum item subset derived from the GMDH-based ranking contains only one-third of the test items and performs pass/fail classification with 91.2% accuracy on a 500-case evaluation subset, compared to 86.8% for a randomly selected item subset of the same size and 92% for a subset of the 15 items having the largest values for IIF (θ=0).
Item rankings obtained with the proposed approach compare favorably with those obtained using neural network modeling and popular filter-type feature selection methods, and the proposed approach is much faster than wrapper methods employing genetic search.
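
As an illustration of the kind of filter-type baseline mentioned above, the following minimal sketch ranks dichotomously scored items by their information gain with respect to the pass/fail outcome. It is not the authors' implementation and does not reproduce the GMDH-based procedure; the function names (`entropy`, `information_gain`, `rank_items`) and the tiny response matrix are hypothetical and chosen only for illustration, and the exact definition of the average information gain used in the paper may differ.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (in bits) of a binary 0/1 label vector."""
    p = np.mean(labels)
    if p == 0.0 or p == 1.0:
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))


def information_gain(item, passed):
    """Reduction in pass/fail entropy after splitting examinees on one 0/1 item."""
    gain = entropy(passed)
    for value in (0, 1):
        mask = item == value
        if mask.any():
            # Weight each branch by the fraction of examinees that fall in it.
            gain -= mask.mean() * entropy(passed[mask])
    return gain


def rank_items(responses, passed):
    """Return item indices sorted from most to least informative, plus the gains."""
    gains = np.array(
        [information_gain(responses[:, j], passed) for j in range(responses.shape[1])]
    )
    return np.argsort(-gains), gains


# Hypothetical data: 4 examinees (rows) by 3 items (columns), 1 = correct answer.
responses = np.array([
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
])
passed = np.array([1, 1, 0, 0])  # hypothetical pass/fail outcome per examinee

order, gains = rank_items(responses, passed)
print(order, gains)
```

In this toy example the first item separates passing from failing examinees perfectly and therefore ranks first, while an item that every examinee answers correctly carries no information about the outcome. A filter of this kind scores each item independently, so unlike a model-based selection it cannot account for redundancy between items.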