
Efficacious End User Measures—Part 1: Relative Class Size and End User Problem Domains

DOI: 10.1155/2013/427958



Biological and medical endeavors are beginning to realize the benefits of artificial intelligence and machine learning. However, classification, prediction, and diagnostic (CPD) errors can cause significant losses, even loss of life. Hence, end users are best served when they have performance information relevant to their needs; providing such information is this paper's focus. Relative class size (rCS) is commonly recognized as a confounding factor in CPD evaluation. Unfortunately, rCS-invariant measures are not easily mapped to end user conditions. We identify a cause of rCS invariance: joint probability table (JPT) normalization. JPT normalization means that measures more efficacious for end users can be used without sacrificing invariance. An important revelation is that, without data normalization, the Matthews correlation coefficient (MCC) and information coefficient (IC) are not rCS invariants. This is a potential source of confusion, as we found that not all reports using MCC or IC normalize their data. We derive an rCS-invariant expression for MCC. JPT normalization can be extended to allow the JPT rCS to be set to any desired value (JPT tuning). This makes sensitivity analysis feasible, a benefit to both applied researchers and practitioners (end users). We apply our findings to two published CPD studies to illustrate how end users benefit.

1. Introduction

Biological compounds and systems can be complex, making them difficult to analyze and challenging to understand. This has slowed the application of biological and medical advances in the field. Recently, artificial intelligence and machine learning, being particularly effective classification, prediction, and diagnostic (CPD) tools, have sped up applied research and product development. CPD can be described as the act of comparing observations to models, then deciding whether or not the observations fit the model: based on some predetermined criterion or criteria, a decision is made regarding class membership.
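The abstract's central technical claim, that MCC depends on relative class size unless the joint probability table is normalized, can be checked numerically. The sketch below is a minimal illustration under our own assumptions, not the paper's method. We take the JPT to be a 2x2 table with actual classes as rows and predicted classes as columns. We read "JPT normalization" as rescaling each actual-class row so that the two classes carry equal probability, and "JPT tuning" as rescaling to an arbitrary positive-class probability `p_pos`. The names `mcc`, `tune`, `mcc_equal_classes`, `p_pos`, and the example counts are all hypothetical.

```python
import math
from typing import Tuple

# Hypothetical 2x2 joint table (counts or probabilities):
# rows = actual class (positive, negative); columns = predicted (positive, negative).
Table = Tuple[Tuple[float, float], Tuple[float, float]]


def mcc(jpt: Table) -> float:
    """Matthews correlation coefficient; unchanged if every cell is scaled by the same factor."""
    (tp, fn), (fp, tn) = jpt
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def tune(jpt: Table, p_pos: float = 0.5) -> Table:
    """Rescale each actual-class row so the positive class has probability p_pos.

    Assumption: p_pos = 0.5 stands in for 'JPT normalization' and other
    values for 'JPT tuning'; the paper's exact procedure may differ.
    """
    (tp, fn), (fp, tn) = jpt
    pos, neg = tp + fn, fp + tn
    if pos == 0 or neg == 0:
        raise ValueError("both actual classes must be present")
    return ((p_pos * tp / pos, p_pos * fn / pos),
            ((1 - p_pos) * fp / neg, (1 - p_pos) * tn / neg))


def mcc_equal_classes(sens: float, spec: float) -> float:
    """MCC of a table tuned to p_pos = 0.5, written in sensitivity and specificity (our algebra)."""
    denom = math.sqrt((1 + sens - spec) * (1 - sens + spec))
    return 0.0 if denom == 0 else (sens + spec - 1) / denom


# Same classifier (sensitivity 0.9, specificity 0.8) on two hypothetical test sets.
balanced = ((90, 10), (20, 80))       # 100 positives, 100 negatives
imbalanced = ((90, 10), (200, 800))   # 100 positives, 1000 negatives

print(round(mcc(balanced), 3), round(mcc(imbalanced), 3))              # 0.704 0.457
print(round(mcc(tune(balanced)), 3), round(mcc(tune(imbalanced)), 3))  # 0.704 0.704

# Sensitivity analysis: MCC of the same classifier across assumed class ratios.
for p in (0.5, 0.1, 0.01):
    print(p, round(mcc(tune(imbalanced, p)), 3))
```

On the raw counts the two test sets give different MCC values even though the classifier's sensitivity and specificity are identical; after row rescaling they agree. Sweeping `p_pos` over the class ratios an end user expects to encounter is one way to carry out the sensitivity analysis the abstract mentions. The closed form in `mcc_equal_classes` follows from the assumed normalization by direct algebra and is not necessarily the expression derived in the paper.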
In many domains, class affiliation is not the end result; rather, it is used to determine subsequent activities. Examples include medical diagnoses, bioinformatics, intrusion detection, information retrieval, and patent classification; the list is virtually endless. Incorrect CPD output can lead to frustration, financial loss, and even death, so correct CPD output is important. Hence, a number of CPD algorithms have been developed, and the field continues to be active. Characterizing CPD effectiveness, then, is necessary. For example, CPD tool developers need to know how their particular modification affects CPD performance, and



