Saturday, January 26, 2013

Using Assessments in Teaching



Formative Assessment Validity
A trendy and popular movement in education, the professional learning community (PLC) may revolutionize teaching practices. In a nutshell, the movement professionalizes teaching by pushing teachers to analyze their practice rigorously with a powerful tool: collaborative examination of assessment data. The data, properly collected and presented, show how much teaching and learning value the teacher adds to a student and, more importantly, the strengths and weaknesses of the teacher's practice against specific state standards. The teacher therefore knows what she must do to improve and where to find assistance: by collaborating with another member of the department who has more success teaching that particular standard. After reading Nassim Taleb's Fooled by Randomness, however, I am less confident in the power of comparing small samples of assessment data; differences between two class sections may be nothing more than chance. When we compare larger, more nearly random student samples, I feel more confident that the differences between teacher outcomes are statistically meaningful. Certainly, teaching technique must be analyzed somehow, and I am more comfortable examining the teacher's method itself (which is what this writing attempts to do) and less comfortable relying only on outcomes.
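The small-sample worry can be made concrete with a quick simulation. In this sketch, every number is a hypothetical assumption rather than real data: two "teachers" draw students from exactly the same score distribution, yet with a single class section of 30 students their averages still differ by five or more points a noticeable fraction of the time.

```python
import random
import statistics

random.seed(42)

def class_average(n_students, mean=70, sd=12):
    # Simulate one class's test average; every "teacher" here is identical,
    # so any gap between two averages is pure chance.
    return statistics.mean(random.gauss(mean, sd) for _ in range(n_students))

def spurious_gap_rate(n_students, trials=2000, threshold=5):
    # Fraction of trials in which two identical teachers still differ
    # by at least `threshold` points on their class averages.
    hits = sum(
        abs(class_average(n_students) - class_average(n_students)) >= threshold
        for _ in range(trials)
    )
    return hits / trials

small = spurious_gap_rate(n_students=30)    # one class section
large = spurious_gap_rate(n_students=300)   # pooled across many sections
print(f"chance gap of 5+ points, n=30:  {small:.1%}")
print(f"chance gap of 5+ points, n=300: {large:.1%}")
```

With one section, identical teachers "differ" by a letter-grade-sized margin roughly a tenth of the time; with ten sections pooled, almost never. That is the statistical reason to pool data before comparing colleagues.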

Looking closer at assessments, the linchpin of the PLC process, teachers have many different types of formative and summative assessments at their disposal. If a department chooses to improve assessments for PLC work, the assessments must be common to the department at minimum, and later common to the entire district. In order from easiest to hardest to grade, teachers may use multiple choice, fill-in, short answer, or short essay questions. Longer essays force students to recall and not just identify the answer, and also compel students to critically analyze and synthesize data. Thus long essays may enjoy the highest validity of all assessments, but they are also the most difficult and time consuming to grade. By developing a detailed rubric and training teachers in its use, a department can avoid the reliability issues that usually mar grading across different assessors. Multiple choice tests require no rubric, and reliability between teachers is perfect: all teachers use the same key. However, since multiple choice tests often show the ability to identify concepts rather than to work with them in any useful practical way, I have doubts about their validity in judging the "whole" student. I believe the College Board assesses AP students using both multiple choice and essays for good reason. As a one-time AP exam reader, I can vouch for the excellent rubrics and the training readers received in using them when grading essays. (See the figure below for the tradeoff between assessment validity and cross-teacher reliability.)



Reliability Versus Validity of Assessments

Assessment                    Validity    Cross-teacher reliability
A: multiple choice            low         highest (shared key)
B: fill-in                    low         high
C: short answer               medium      medium
D: essays without rubric      high        low
E: essays with rubric         high        moderately high

Point E, essays with rubric, combines the highest level of validity with a moderately high level of cross-teacher reliability. Unfortunately, essays take the longest to grade and, since the department must create a rubric, demand much preparation time. Teachers weigh time, a precious resource, alongside validity and reliability. (See figure below.) Multiple choice tests are popular because they are graded more quickly than any other assessment: you just run them through the Scantron machine, at 20 per minute if you are as quick as I am.

Time Needed for Preparation and Grading of Assessments

Assessment                    Preparation time     Grading time
A: multiple choice            high                 lowest
B: fill-in                    moderately high      low
C: short answer               moderate             moderate
D: essays without rubric      low                  highest
E: essays with rubric         highest              high

However, multiple choice questions take a lot of time to compose; teachers are not trained to create plausible wrong answers! The short answer test may offer a reasonable compromise, taking comparatively little total time once both test composition and grading are taken into account. Teachers may prefer one formative assessment over another based on both time constraints and the level of validity needed. After administering the assessment, teachers may choose to re-teach a lesson or make other adjustments to future lessons to increase student learning. Successful teachers, in the spirit of the PLC movement, work with their colleagues to find successful strategies for re-teaching.
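The tradeoff is simple arithmetic: total teacher time is preparation time plus per-student grading time multiplied by the class load. The minute values below are hypothetical placeholders, not measurements; the point of the sketch is that a department can plug in its own estimates and see how the ranking shifts.

```python
# Hypothetical time model: total = prep + students * grading time per student.
# All minute values are illustrative assumptions, not measured data.
ASSESSMENTS = {
    "multiple choice":   {"prep": 350, "grade_each": 0.05},  # distractors are slow to write
    "fill-in":           {"prep": 120, "grade_each": 1.0},
    "short answer":      {"prep": 45,  "grade_each": 2.0},
    "essay with rubric": {"prep": 150, "grade_each": 8.0},
}

def total_minutes(name, students=150):
    a = ASSESSMENTS[name]
    return a["prep"] + students * a["grade_each"]

for name in sorted(ASSESSMENTS, key=total_minutes):
    print(f"{name:18s} {total_minutes(name):6.0f} minutes for 150 students")
```

Under these assumed numbers the essay is by far the most expensive in total time, while the cheaper options cluster together; small changes to the assumed prep minutes reorder them, which is exactly why each department should estimate its own costs.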

Informal Formative Assessments
Teachers who include a formative assessment, and therefore check for understanding at some point in every lesson plan, will ensure that more students learn. These assessments can be created by the seat of one's pants, without preparation, and are usually labeled "informal" assessments. They may include a "ticket out of class"; asking a specific student the answer to a key concept, and then (if that fails) having the student partner up or asking everyone in the class; splitting the class arbitrarily in half and debating a controversial point; and playing games. My three favorite games, baseball, jeopardy, and flyswatter (matamoscas), can be created quickly when needed and are so enjoyable that students ask to play them if there are a few minutes left in the period.

  • Baseball (also mentioned above in my [students'] favorite things): allows students to pick a question that is easy or difficult. Students ask for a "single" (easiest), "double" (medium difficulty), "triple" (hard), or "home run" (very difficult). When a student gives a correct response, she goes to one of the four "bases" I have set up around the room. If she gets the answer wrong, she, the batter, is "out" and her team continues with a new batter. On the third out, the other team bats. Teams score runs when runners are batted in or when a batter hits a home run. Rules for baseball are only loosely enforced, always in a way that allows me to emphasize material students ought to know. Unprepared or lazy students may ask for a home run question to avoid showing the class how little they actually know. When that happens, reserve home run questions for high-achieving students or make a missed home run question worth two outs. I recycle home run questions, using them as "double" or "triple" questions later, since the question has lost its novelty and another student may have looked up the answer. I can play a few innings of baseball in fifteen minutes.
  • Jeopardy: demands more teacher preparation than baseball. In my advanced classes I often assign students to come up with the answers needed to run the game. The answers must be categorized by topic and by level of difficulty, with harder answers worth more points. Run the game much like the television show. The winning group can win a nominal prize, such as a picture taken with a digital camera.
  • Flyswatter (matamoscas): answers must be set up on the board before class, either by the teacher or by advanced students. Typically the composer creates 20 one- or two-word answers that match questions the teacher will give to the contestants. Two contestants compete, each beginning with his back to the board (and the answers). When the question is read, both turn toward the board, rolled-up newspaper (or flyswatter) in hand; the first contestant who swats the correct answer wins that round and faces a new contestant. Both jeopardy and flyswatter require mere identification of the answer; for baseball, teachers can compose questions that require higher order thinking, a la Bloom's Taxonomy.
  • Interactive lecture: resembles a conversation on two levels. First, the teacher converses with herself, paraphrasing actual history. What might FDR have said to officials in the State Department right before meeting Stalin at Yalta? (Play it out.) What was Reagan’s conversation with Gorbachev as they tried to hammer out an arms control agreement? (Play it out.) Second, the teacher asks questions to students. Why did FDR need Soviet help at the end of WWII? How did Reagan get Gorbachev to go along on arms control? An interactive lecture presents material in a more entertaining fashion and checks immediately for understanding.

The informal formative assessments above and formal formative assessments fulfill two objectives: they show whether the students learned the material and, since it is usually necessary, give an opportunity to review the material in an enjoyable way. Occasionally, I let students know that questions they encountered on an informal assessment will reappear on a formal assessment. Since the formal assessment counts for a grade, I have now increased interest in learning the material. What if only a few kids have not learned the material? Should the class be held back for the benefit of a few? Perhaps the top students need to work with the laggards. I prefer that slower learners get the benefit of a remedial opportunity built into the school structure, such as a study hall or mandatory tutoring. PLC writers have demonstrated how this can be done. (See figure.)

Remedial Opportunities when Students Don't Learn

Informal formative assessment
    If students learn: continue with the planned lessons
    If students do not learn: teacher reviews material, then a remedial opportunity
For this model to work, teachers need enough slack in their curriculum mapping for a small amount of re-teaching, but the greatest responsibility for re-teaching falls upon a system already in place, such as a school-day study hall.


Homework as Formative Assessment
I use homework primarily (though not exclusively) as a formative assessment—to find out what the kids do and do not know. Since I use homework as an instrument to determine where teaching should go instead of an opportunity to engage in new learning, I keep the assignments short and infrequent. I prefer the bulk of new learning to take place in class where I am available to help with understanding of concepts, vocabulary and writing technique. The education literature does not show a strong correlation between the amount of homework assigned and student performance, especially with at-risk students. Therefore, when I do assign homework I use it primarily to give me feedback and not as an opportunity to grade.

Summative Assessments
Summative assessments do not enable much learning to take place, since after the summative there is no second chance, but, ironically, these are the assessments that give both teachers and students headaches and ulcers. Final exams, standardized tests such as the California High School Exit Exam (CAHSEE), the California Standards Test (CST, AKA the STAR test), the ACT, SAT, and AP exams, and music auditions put both teacher and student under a microscope. These high stakes tests determine who graduates, who goes to a four-year school, and, under NCLB, whether a school that receives federal funding can continue under its current leadership and teaching staff. Since the feds keep raising the bar, increasing numbers of schools have been labeled low performing. If the government makes no changes to the process, in a few years all schools will be low performing and, if they receive federal funding, subject to corrective action.

Summative assessments can be gut-wrenching experiences for both teacher and student, but we can also make these assessments interesting learning experiences. A final exam can contain an essay prompt that asks for synthesis—asking the student to retell or put facts together in a creative way, using critical thinking. Examples could include questions like the following.
  • US History: Show parallels between the 1968 and 2008 Democratic Conventions.
  • Psychology: Evaluate this clinical vignette and write up this patient under Axes I and II.
    • Describe how your membership in groups and your beliefs about the meaning of life have made you who you are.
  • World History: Explain why the French and American Revolutions progressed differently.
  • Economics: Show, both theoretically and practically, why Keynesian economic policies will (or won't) work, and give historical examples from the crash of 2008 that back up your opinion.

How Teachers Benefit from Summative Assessments
By analyzing state testing results through number-crunching software such as Data Director, or by hand, teachers can examine the correlation of semester grades with state test scores. When a teacher discovers a gross discrepancy, such as top test scores paired with a class grade of C, he needs to explore the reasons why. (See figure.)

Standardized Testing Results Versus Classroom Performance for the Individual Student

Quadrant I (high test score, high class grade): Ideal.
Quadrant II (high test score, low class grade): Unmotivated but bright student, or test results may have been altered. Classroom assessments may have little predictive validity.
Quadrant III (low test score, high class grade): Student may have "bubbled" the test, randomly picking answers, or may have cheated his way through the class. Schools must make standardized tests meaningful and important; teachers must defend against cheating.
Quadrant IV (low test score, low class grade): Expected valid score. Schools must intervene to help low-achieving students learn.

Teachers expect students to fall within quadrants I and IV. Gifted students usually do well on standardized tests and earn high marks in their classes. Poor students usually do poorly in both testing and grading. Problems arise when students fall into the other quadrants.
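As a sketch of how a department might sort a roster into these quadrants automatically, the snippet below applies two cutoffs. Both cutoff values and the score scale are assumptions chosen for illustration, not official thresholds.

```python
# Hypothetical cutoffs for illustration only; real departments would
# pick thresholds that match their own test scale and grading policy.
TEST_CUTOFF = 350    # assumed scaled state-test score counting as "high"
GRADE_CUTOFF = 3.0   # assumed grade points (a B average) counting as "high"

def quadrant(test_score, grade_points):
    # Place one student in the quadrant scheme described above.
    high_test = test_score >= TEST_CUTOFF
    high_class = grade_points >= GRADE_CUTOFF
    if high_test and high_class:
        return "I: ideal"
    if high_test:
        return "II: bright but unmotivated, or altered test results"
    if high_class:
        return "III: possible bubbling or classroom cheating"
    return "IV: low-achieving; school must intervene"

# A top test score paired with a C average lands in quadrant II,
# the gross discrepancy worth investigating.
print(quadrant(420, 2.0))
```

Run over a whole gradebook export, a function like this flags the quadrant II and III students who need a closer look.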

What to do right now: work with your department and come up with a plan to improve the performance of students in quadrants II and III. Why does your quadrant II student refuse to work at his potential? How can you arrange your assessments to prevent the quadrant III student from cheating, or get her to take the standardized tests more seriously?

Teacher by Day, Drummer by Night