There is consensus that serious games have significant potential as instructional tools. However, their effectiveness in terms of learning outcomes is still understudied, mainly because of the complexity involved in assessing intangible measures, and many studies lack rigorous assessment. A systematic approach, based on established principles and guidelines, is necessary to improve the design of serious games. An important aspect in the evaluation of serious games, as with other educational tools, is user performance assessment. This is an important area of exploration because serious games are intended to evaluate learning progress as well as learning outcomes, which also stresses the importance of providing appropriate feedback to the player. Moreover, performance assessment enables adaptivity and personalization to meet individual needs in various respects, such as learning styles, information provision rates, and feedback. This paper first reviews the literature on the educational effectiveness of serious games. It then discusses how to assess the learning impact of serious games and surveys methods for competence and skill assessment. Finally, it suggests two major directions for future research: characterization of the player's activity and better integration of assessment in games.

1. Introduction

Serious games are designed to have an impact on the target audience that goes beyond pure entertainment [1, 2]. One of their most important application domains is education, given the acknowledged potential of serious games to meet the current need for educational enhancement [3, 4]. In this field, the purpose of a serious game is twofold: (i) to be fun and entertaining and (ii) to be educational. A serious game is thus designed both to be attractive and appealing to a broad target audience, as commercial games are, and to meet specific educational goals.
Assessment of a serious game must therefore consider both aspects: fun/enjoyment and educational impact. Beyond fun and engagement, however, the assessment of serious games presents unique challenges, because learning is the primary goal. There is thus also a need to explore how to evaluate learning outcomes, both to identify which serious games are best suited for a given goal or domain and to design more effective serious games (e.g., which mechanics are best suited for a given pedagogical goal). In this sense, the evaluation of serious games should also cover player performance assessment. Performance assessment
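To make the adaptivity argument concrete, the following is a minimal, illustrative sketch, not drawn from any cited work and with all class and method names hypothetical, of how embedded performance assessment can drive simple personalization: logged in-game task outcomes feed a crude competence estimate, which in turn selects the difficulty band of the next task.

```python
from dataclasses import dataclass, field


@dataclass
class PerformanceTracker:
    """Hypothetical embedded-assessment component: records in-game task
    outcomes and derives a coarse competence estimate from them."""
    outcomes: list = field(default_factory=list)  # (task_id, success, seconds)

    def log_task(self, task_id: str, success: bool, seconds: float) -> None:
        """Record the outcome of one in-game task."""
        self.outcomes.append((task_id, success, seconds))

    def competence_estimate(self) -> float:
        """Fraction of the last 10 tasks solved; a crude proxy for skill."""
        recent = self.outcomes[-10:]
        if not recent:
            return 0.5  # neutral prior before any evidence is available
        return sum(1 for _, ok, _ in recent if ok) / len(recent)

    def next_difficulty(self) -> str:
        """Map the competence estimate onto coarse difficulty bands,
        so the game can adapt the next challenge to the player."""
        c = self.competence_estimate()
        if c < 0.4:
            return "easy"
        if c < 0.75:
            return "medium"
        return "hard"
```

Real systems replace the moving-average estimate with richer learner models (see the embedded-assessment literature cited below), but the control loop is the same: unobtrusive in-game measurement feeding game adaptation.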
References

F. L. Greitzer, O. A. Kuchar, and K. Huston, “Cognitive science implications for enhancing training effectiveness in a serious gaming context,” ACM Journal on Educational Resources in Computing, vol. 7, no. 3, article 2, 2007.
F. De Grove, P. Mechant, and J. Van Looy, “Uncharted waters? Exploring experts' opinions on the opportunities and limitations of serious games for foreign language learning,” in Proceedings of the 3rd International Conference on Fun and Games, pp. 107–115, Leuven, Belgium, September 2010.
V. Shute, M. Ventura, M. Bauer, and D. Zapata-Rivera, “Melding the power of serious games and embedded assessment to monitor and foster learning: flow and grow,” in Serious Games: Mechanisms and Effects, U. Ritterfeld, M. Cody, and P. Vorderer, Eds., pp. 295–321, Routledge, Taylor and Francis, Mahwah, NJ, USA, 2009.
A. A. Kulik, “School mathematics and science programs benefit from instructional technology,” United States National Science Foundation (NSF), National Center for Science and Engineering Statistics (NCSES), InfoBrief NSF-03-301, November 2002, http://www.nsf.gov/statistics/infbrief/nsf03301/.
S. Livingston, G. Fennessey, J. Coleman, K. Edwards, and S. Kidder, “The Hopkins games program: final report on seven years of research,” Report No. 155, Johns Hopkins University, Center for Social Organization of Schools, Baltimore, Md, USA, 1973.
T. M. Connolly, E. A. Boyle, E. MacArthur, T. Hainey, and J. M. Boyle, “A systematic literature review of the empirical evidence on computer games and serious games,” Computers and Education, vol. 59, no. 2, pp. 661–686, 2012.
P. M. Kato, S. W. Cole, A. S. Bradlyn, and B. H. Pollock, “A video game improves behavioral outcomes in adolescents and young adults with cancer: a randomized trial,” Pediatrics, vol. 122, no. 2, pp. e305–e317, 2008.
S. D. Dandeneau and M. W. Baldwin, “The inhibition of socially rejecting information among people with high versus low self-esteem: the role of attentional bias and the effects of bias reduction training,” Journal of Social and Clinical Psychology, vol. 23, no. 4, pp. 584–602, 2004.
F. Bellotti, R. Berta, and A. De Gloria, “Designing effective serious games: opportunities and challenges for research,” International Journal of Emerging Technologies in Learning, vol. 5, pp. 22–35, 2010.
G. Bente and J. Breuer, “Making the implicit explicit: embedded measurement in serious games,” in Serious Games: Mechanisms and Effects, U. Ritterfeld, M. J. Cody, and P. Vorderer, Eds., pp. 322–343, Routledge, New York, NY, USA, 2009.
C. S. Loh, A. Anantachai, J. H. Byun, and J. Lenox, “Assessing what players learned in serious games: in-situ data collection, information trails, and quantitative analysis,” in Proceedings of the 10th International Conference on Computer Games: AI, Animation, Mobile, Educational and Serious Games, pp. 10–19, 2007.
L. Allen, M. Seeney, L. Boyle, and F. Hancock, “The implementation of team based assessment in serious games,” in Proceedings of the 1st Conference in Games and Virtual Worlds for Serious Applications (VS-GAMES '09), pp. 28–35, Coventry, UK, March 2009.
J. H. Brockmyer, C. M. Fox, K. A. Curtiss, E. McBroom, K. M. Burkhart, and J. N. Pidruzny, “The development of the Game Engagement Questionnaire: a measure of engagement in video game-playing,” Journal of Experimental Social Psychology, vol. 45, no. 4, pp. 624–634, 2009.
S. H. Janicke and A. Ellis, “Psychological and physiological differences between the 3D and 2D gaming experience,” in Proceedings of the 3D Entertainment Summit, Hollywood, Calif, USA, September, 2011.
H. F. Jelinek, K. August, H. Imam, A. H. Khandoker, A. Koenig, and R. Riener, “Heart rate asymmetry and emotional response to robot assist task challenges in stroke patients,” in Proceedings of the Computing in Cardiology Conference, Hangzhou, China, September 2011.
A. Plotnikov, N. Stakheika, A. De Gloria et al., “Exploiting real-time EEG analysis for assessing flow in games,” in Workshop “Game Based Learning for 21st Century Transferable Skills,” at ICALT 2012, Rome, Italy, June 2012.
L. E. Nacke, “Physiological game interaction and psychophysiological evaluation in research and industry,” Gamasutra Article, June 2011, http://www.gamasutra.com/blogs/LennartNacke/20110628/7867/Physiological_Game_Interaction_and_Psychophysiological_Evaluation_in_Research_and_Industry.php.
C. Loh, “Designing online games assessment as information trails,” in Games and Simulations in Online Learning: Research and Development Frameworks, D. Gibson, C. Aldrich, and M. Prensky, Eds., pp. 323–348, Information Science Publishing, Hershey, Pa, USA, 2007.
G. N. Yannakakis and J. Hallam, “Evolving opponents for interesting interactive computer games,” in Proceedings of the International Conference on Computer Games: Artificial Intelligence, Design and Education, 2004.
H. Iida, N. Takeshita, and J. Yoshimura, “A metric for entertainment of boardgames: its implication for evolution of chess variants,” in Entertainment Computing: Technologies and Applications, IFIP First International Workshop on Entertainment Computing (IWEC '02), R. Nakatsu and J. Hoshino, Eds., pp. 65–72, Kluwer Academic, Boston, Mass, USA, 2003.
M. Hassenzahl and R. Wessler, “Capturing design space from a user perspective: the repertory grid technique revisited,” International Journal of Human-Computer Interaction, vol. 12, no. 3-4, pp. 441–459, 2000.
C. Hewson, “Can online course-based assessment methods be fair and equitable? Relationships between students' preferences and performance within online and offline assessments,” Journal of Computer Assisted Learning, vol. 28, no. 5, pp. 488–498, 2012.
J. Hattie, G. Brown, P. Keegan et al., “Validation evidence of asTTle reading assessment results: norms and criteria,” asTTle Tech. Rep. 22, University of Auckland/Ministry of Education, November 2003.
J. Hattie, “Large-scale assessment of student competencies,” in Symposium: Working in Today's World of Testing and Measurement: Required Knowledge and Skills (Joint ITC/CPTA Symposium); the 26th International Congress of Applied Psychology, Athens, Greece, July 2006.
J. Bull and D. Stephens, “The use of Question Mark software for formative and summative assessment in two universities,” Innovations in Education and Teaching International, vol. 36, no. 2, pp. 128–135, 1999.
HEFCE JISC, “Case study 5: making the most of a computer-assisted assessment system University of Manchester,” 2010, http://www.jisc.ac.uk/media/documents/programmes/elearning/digiassess_makingthemost.pdf.
S. Jordan and T. Mitchell, “e-Assessment for learning? The potential of short-answer free-text questions with tailored feedback,” British Journal of Educational Technology, vol. 40, no. 2, pp. 371–385, 2009.
I. D. Beatty and W. J. Gerace, “Technology-enhanced formative assessment: a research-based pedagogy for teaching science with classroom response technology,” Journal of Science Education and Technology, vol. 18, no. 2, pp. 146–162, 2009.
L. B. Resnick and D. P. Resnick, Assessing the Thinking Curriculum: New Tools for Educational Reform, Learning Research and Development Center: University of Pittsburgh and Carnegie Mellon University, Pittsburgh, Pa, USA, 1989.
M. Lipman, “Some thoughts on the formation of reflective education,” in Teaching-Thinking Skills: Theory and Practice, J. B. Baron and R. J. Sternberg, Eds., pp. 151–161, W. H. Freeman, New York, NY, USA, 1987.
Chicago Tribune, “Standardized testing will limit students' future,” April 2010, http://articles.chicagotribune.com/2010-04-21/news/chi-100421shafer_briefs_1_standardized-test-scores-teacher-and-principal-evaluations.
E. J. Short, M. Noeder, S. Gorovoy, M. J. Manos, and B. Lewis, “The importance of play in both the assessment and treatment of young children,” in An Evidence-Based Approach to Play in Intervention and Prevention: Integrating Developmental and Clinical Science, S. Russ and L. Niec, Eds., Guilford, London, UK.
A. S. Kaugars and S. W. Russ, “Assessing preschool children's pretend play: preliminary validation of the affect in play scale-preschool version,” Early Education and Development, vol. 20, no. 5, pp. 733–755, 2009.
F. Bellotti, R. Berta, A. De Gloria, and L. Primavera, “Adaptive experience engine for serious games,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 1, no. 4, pp. 264–280, 2009.
L. Doucet and V. Srinivasan, “Designing entertaining educational games using procedural rhetoric: a case study,” in Proceedings of the 5th ACM SIGGRAPH Symposium on Video Games, pp. 5–10, Los Angeles, Calif, USA, July 2010.
J. Froschauer, I. Seidel, M. Gärtner, H. Berger, and D. Merkl, “Design and evaluation of a serious game for immersive cultural training,” in Proceedings of the 16th International Conference on Virtual Systems and Multimedia (VSMM '10), pp. 253–260, IEEE CS Press, Seoul, Republic of Korea, October 2010.
F. Bellotti, R. Berta, A. De Gloria, and L. Primavera, “A task annotation model for SandBox Serious Games,” in Proceedings of IEEE Symposium on Computational Intelligence and Games (CIG '09), pp. 233–240, Milano, Italy, September 2009.