Innovative testing refers to the use of novel methods to test students in richer ways than can be accomplished using traditional testing approaches. This generally means the use of technology like computers to deliver test questions that require students to watch or listen to multimedia stimuli, manipulate virtual objects in interactive situations, and/or construct rather than select their responses. The goal of innovative testing is to measure students’ knowledge and skills at deeper levels and to capture constructs not easily assessed, such as problem solving, critical analysis, and collaboration. This will help us better understand what students have and haven’t learned, and what misconceptions they might hold—and thus support decisions ranging from accountability to which instructional interventions might be appropriate for individual students.
Educational testing has always involved innovative approaches. As hidden properties of students, knowledge and skill are generally impossible to measure directly and very difficult to measure indirectly, often requiring the use of complex validity arguments. However, to the extent that newer technologies may allow us to more directly assess students’ knowledge and skills—by asking students to accomplish tasks that more faithfully represent the underlying constructs they’re designed to measure—innovative testing holds the promise of more authentic methods of testing based upon simpler validity arguments. And as such, measurement of constructs that are “higher up” on taxonomies of depth of understanding, such as Bloom’s and Webb’s, should become more attainable.
Consider assessing a high school student’s ability to design an experimental study. Is this the same as his or her ability to identify one written description of a well-designed experiment amongst three written descriptions of poorly designed experiments? Certainly there will be a correlation between the two; the question is how strong a correlation, or more bluntly, how artificial the context. And further, to what extent is such a correlation a self-fulfilling prophecy, in which students who are good at thinking and doing science but not at narrowly defined assessment tasks do poorly in school as a result of poor test scores and the compounding impact of negative feedback?
Many will recall the promise of performance assessment in the 1990s to test students more authentically. Performance testing didn’t live up to its potential, in large part because of the challenges of standardized administration and accurate scoring. Enter innovative questions—performance testing riding the back of digital technologies and new media. Richer assessment scenarios and opportunities for response can be administered equitably and at scale. Comprehensive student interaction data can be collected and scored by humans in efficient, distributed settings, automatically by computer, or both. In short, the opportunity for both large-scale and small-scale testing of students using tasks that more closely resemble real-world application of learning standards is now available.
Without question, creating innovative test questions presents additional challenges over those of simpler, traditional ones. As with any performance task, validity arguments become more complex and reliability of scoring becomes a larger concern. Fortunately, there has been some excellent initial work in understanding how to address these challenges, including the development of taxonomies and rich descriptions for understanding innovative questions (e.g., Scalise; Zenisky). Most notable are two approaches that directly address validity. The first is evidence-centered design, an approach to creating educational assessments in terms of evidentiary arguments built upon intended constructs. The second is a preliminary set of guidelines for the appropriate use of technology in developing innovative questions through application of universal design principles that take into account how students interact with those questions as a function of their perceptual, linguistic, cognitive, motoric, executive, and affective skills and challenges. Approaches such as these are essential if we are to ensure that the needs of students with disabilities and English language learners are considered from the beginning in designing our tests.
Do we know that innovative questions will indeed allow us to test students to greater depths of knowledge and skill than traditional ones, and whether they will do so in a valid, reliable, and fair manner? And will the purported cost effectiveness be realized? These are all questions that need ongoing research.
As we solve the challenges of implementing innovative questions in technically sound ways, perhaps the most exciting aspect of innovative testing is the opportunity for integration with evolving innovative instructional approaches. Is it putting the cart before the horse to focus so much on innovation in assessment before we figure it out in instruction? I believe not. Improvements to instructional and assessment technologies must co-evolve. Our tests must be designed to pick up the types of learning gains our students will be making, especially when we consider 21st century skills, which will increasingly rely on innovative, technology-based learning tools. Plus, our tests have a direct opportunity to impact instruction: despite all our efforts, “teaching to the test” will occur, so why not have those tests become models of good learning? And even if an emphasis on assessment is the cart, at least the whole jalopy is headed the right way down the road. Speaking of roads, consider the co-evolution of automobiles and improved paving technologies: improvement in each couldn’t have progressed without improvement in the other.
Bob Dolan, Ph.D.
Senior Research Scientist