Information Technology Journal
ISSN: 1812-5638, 1812-5646
Publisher: Asian Network for Scientific Information
DOI: 10.3923/itj.2013.1125.1133
Authors: Panrasee Ritthipravat, Orrawan Kumdee, Thongchai Bhongmakapat
2013, Volume 12, Issue 6

This study investigates efficient missing data techniques for predicting the recurrence of nasopharyngeal carcinoma (NPC). Clinical data of patients with NPC who received treatment at Ramathibodi Hospital, Thailand, were collected, and 495 records in total were used for cancer recurrence prediction. Because these data contain various missing values, appropriate missing data techniques (MDTs) had to be examined. The study focuses on complete-case analysis, mean imputation, k-nearest neighbor imputation and Expectation Maximization (EM) imputation. The completed data were then used to develop three predictive models: a single-point model, a multiple-point model and a sequential neural network. The experimental results showed that EM imputation was superior to the other missing data techniques, providing the highest predictive performance in all models, with an average area under the receiver operating characteristic curve (AUC) of 0.72. The Hosmer and Lemeshow goodness-of-fit test was used to evaluate the fit of each model, and its results confirmed that EM imputation was the best missing data technique. The sequential neural network outperformed the other models, giving the best predictive performance in terms of the average AUC (0.73) and the Chi-square statistic (4.30). In addition, survival curves generated from these predictive models were compared with the Kaplan-Meier survival curve. The curves based on EM imputation were closest to the Kaplan-Meier curve. According to the log-rank test, however, these curves were significantly different (p-value < 0.05).
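
The three simpler missing data techniques named in the abstract (complete-case analysis, mean imputation and k-nearest neighbor imputation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy `records` table, the distance measure (root mean squared difference over the columns both records have observed) and the choice of `k` are all assumptions made for illustration. The sketch also assumes every column has at least one observed value. EM imputation is not shown.

```python
import math

NAN = math.nan

def complete_case(rows):
    """Complete-case analysis: keep only records with no missing value."""
    return [row for row in rows if not any(math.isnan(v) for v in row)]

def mean_impute(rows):
    """Replace each missing value with the mean of its column's observed values."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        means.append(sum(observed) / len(observed))
    return [[means[j] if math.isnan(v) else v for j, v in enumerate(row)]
            for row in rows]

def _distance(a, b):
    """Root mean squared difference over columns observed in both records."""
    shared = [(x - y) ** 2 for x, y in zip(a, b)
              if not (math.isnan(x) or math.isnan(y))]
    if not shared:
        return math.inf  # no common observed column: treat as infinitely far
    return math.sqrt(sum(shared) / len(shared))

def knn_impute(rows, k=2):
    """Fill each missing value with the column mean over the k nearest records
    that have that column observed."""
    completed = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, value in enumerate(row):
            if math.isnan(value):
                donors = sorted(
                    (_distance(row, other), other[j])
                    for m, other in enumerate(rows)
                    if m != i and not math.isnan(other[j])
                )[:k]
                new_row[j] = sum(v for _, v in donors) / len(donors)
        completed.append(new_row)
    return completed

# Hypothetical toy table (not data from the study); NAN marks a missing value.
records = [
    [1.0, 2.0, 3.0],
    [1.0, NAN, 3.0],
    [5.0, 6.0, 7.0],
    [5.0, 6.0, NAN],
]

print(complete_case(records))
print(mean_impute(records))
print(knn_impute(records, k=1))
```

On this toy table, complete-case analysis discards half of the records, mean imputation fills both gaps with a column average that ignores the record's other values, and k-nearest neighbor imputation borrows the value from the most similar record.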