Formative Assessment Validity
A popular movement in education, the professional
learning community (PLC) may revolutionize teaching practices. In a nutshell,
the movement professionalizes the practice of teaching by pushing teachers to
analyze their practice rigorously with a powerful tool: collaborative
examination of assessment data. Properly collected and presented, the data
show how much teaching and learning value the teacher adds to a student and,
more importantly, the strengths and weaknesses of the teacher's practice
against specific state standards. The teacher therefore knows what she must do
to improve and knows where to find assistance: she can collaborate with another
member of the department who has more success teaching that particular
standard. After reading Nassim Taleb's Fooled
by Randomness, however, I am less confident in the power of comparing small samples
of assessment data. When we compare larger samples of students, I feel more
confident that the differences between teacher outcomes are statistically meaningful.
Certainly, teaching technique must be analyzed somehow, and I am more
comfortable examining the teacher's method itself (which is what this writing
attempts to do) and less comfortable relying only on outcomes.
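Taleb's point about small samples can be illustrated with a quick simulation. The sketch below assumes two teachers of identical true ability; the class sizes, mean score, and score spread are illustrative numbers, not data from any real school. Even so, it shows how much of an apparent "gap" between equal teachers a single small class can produce.

```python
import random
import statistics

random.seed(0)

def class_average(n_students, true_mean=70, spread=15):
    """Simulate one class's average score for a teacher of given true ability."""
    return statistics.mean(random.gauss(true_mean, spread) for _ in range(n_students))

# Two teachers with IDENTICAL true ability, each judged on one sample of students.
for n in (30, 1000):
    gaps = [abs(class_average(n) - class_average(n)) for _ in range(200)]
    print(f"n={n:4d}: typical observed gap between equal teachers = "
          f"{statistics.mean(gaps):.1f} points")
```

With classes of 30, purely random gaps of several points are routine; with a thousand students per sample, the noise largely washes out. A several-point gap between two single classes, then, says little by itself about the teachers.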
Looking closer at assessments, the linchpin of the PLC
process, teachers have many types of formative and summative assessments at
their disposal. If a department chooses to improve assessments for PLC work,
the assessments must be common to the department at a minimum, and later common
to the entire district. In order from easiest to hardest to grade, teachers may
use multiple choice, fill-in, short answer, or short essay
questions. Longer essays force students to recall and not just identify the answer,
and they also compel students to critically analyze and synthesize material. Thus long
essays may enjoy the highest validity of all assessments, but they are also the
most difficult and time-consuming to grade. By developing a detailed rubric and
training teachers to use it, departments can work together to avoid the
reliability problems that usually mar grading across different assessors.
Multiple choice tests require no rubric, and the reliability between teachers is
perfect: all teachers use the same key. However, since multiple choice tests
often show the ability to identify concepts rather than to work with them in any
practical way, I have doubts about their validity in judging the "whole"
student. I believe that the AP Board assesses students using both multiple
choice and essays for good reason. As a one-time AP exam reader, I can vouch
for the excellent rubrics and the training readers received in using those rubrics
when grading essays. (See the figure below for the tradeoff between
assessment validity and cross-teacher reliability.)
Figure: Reliability Versus Validity of Assessments. Vertical axis: validity of assessment (low to high); horizontal axis: cross-teacher reliability (low to high). Plotted points:
A: multiple choice assessments (low validity, high reliability)
B: fill-in assessments (low validity, lower reliability than A)
C: short answer assessments (intermediate on both axes)
D: essays without rubric (high validity, low reliability)
E: essays with rubric (high validity, moderately high reliability)
Point E, essays with a rubric, combines the highest level
of validity with a moderately high level of cross-teacher reliability.
Unfortunately, essays take the longest to grade and, since the department must
create a rubric, demand much preparation time. Teachers consider time a precious
resource, alongside validity and reliability. (See figure below.) Multiple
choice tests are popular since they are graded faster than any other
assessment: just run them through the Scantron machine, at twenty per minute if
you are as quick as I am.
Figure: Time Needed for Grading. Vertical axis: time needed for preparation of the assessment (low to high); horizontal axis: time needed for grading the assessment (low to high). Preparation time is highest for E and lowest for D; grading time is lowest for A and highest for the essay assessments. Plotted points:
A: multiple choice assessments
B: fill-in assessments
C: short answer assessments
D: essays without rubric
E: essays with rubric
However, multiple choice questions take a lot of time to
compose; teachers are not trained to create plausible wrong answers. The short
answer test may offer a reasonable compromise between test creation and grading,
taking the least total time when both composition and grading
are taken into account. Teachers may prefer one formative assessment over
another based on both time constraints and the level of validity needed. After
administering the assessment, teachers may choose to re-teach a lesson or make
other adjustments to future lessons to increase student learning. Successful
teachers, in the spirit of the PLC movement, work with their colleagues to find
successful strategies for re-teaching.
Informal Formative Assessments
Teachers who include a formative assessment in every lesson plan, and therefore
check for understanding at some point in every lesson, will ensure that
more students learn. These assessments can be created by the seat of one's
pants, without preparation, and are usually labeled "informal" assessments.
They may include a "ticket out of class"; asking a specific student the answer
to a question on a key concept, and then (if that fails) having the student partner up or
asking everyone in the class; splitting the class arbitrarily in half to
debate a controversial point; and playing games. My three favorite games,
baseball, Jeopardy, and flyswatter (matamoscas), can be created quickly when
needed and are so enjoyable that the students ask to play them when there are a
few minutes left in the period.
- Baseball (also mentioned above in my [students'] favorite things): allows students to pick a question that is easy or difficult. Students ask for a "single" (easiest), "double" (medium difficulty), "triple" (hard), or "home run" (very difficult). When a student answers her question correctly, she advances to one of the four "bases" I have set up around the room. If she gets the answer wrong, she, the batter, is "out," and her team continues with a new batter; on the third out, the other team bats. A team scores when runners are batted in or when a batter hits a home run. Rules for baseball are only loosely enforced, always in a way that allows me to emphasize material the students ought to know. Unprepared or lazy students may ask for a home run question to avoid showing the class how little they actually know; when that happens, reserve home run questions for high-achieving students or make a missed home run question worth two outs. I recycle home run questions as later "double" or "triple" questions, since the question has lost its novelty and another student may have looked up the answer. I can play a few innings of baseball in fifteen minutes.
- Jeopardy: demands more teacher preparation than baseball. In my advanced classes I often assign students to come up with answers needed to run the game. These answers must be categorized by topic and by level of difficulty. Harder answers are worth more points. Try to run the game similar to the television show. The winning group can win a nominal prize such as a picture taken with a digital camera.
- Flyswatter (matamoscas): answers must be set up on the board before class, either by the teacher or by advanced students. Typically the composer creates twenty one- or two-word answers that match questions the teacher will pose to the contestants. Two contestants compete, each starting with his back to the board (and the answers); on the teacher's question, each turns toward the board, rolled-up newspaper (or flyswatter) in hand, and the first contestant to swat the correct answer on the board wins that round and faces a new contestant. Both Jeopardy and flyswatter require mere identification of the answer; for baseball, teachers can compose questions that require higher-order thinking, a la Bloom's Taxonomy.
- Interactive lecture: resembles a conversation on two levels. First, the teacher converses with herself, paraphrasing actual history. What might FDR have said to officials in the State Department right before meeting Stalin at Yalta? (Play it out.) What was Reagan’s conversation with Gorbachev as they tried to hammer out an arms control agreement? (Play it out.) Second, the teacher asks questions to students. Why did FDR need Soviet help at the end of WWII? How did Reagan get Gorbachev to go along on arms control? An interactive lecture presents material in a more entertaining fashion and checks immediately for understanding.
The informal formative assessments above, like formal formative
assessments, fulfill two objectives: they show whether the students learned the
material and, since review is usually necessary, they give an opportunity to review the
material in an enjoyable way. Occasionally, I let students know that questions
they encountered on an informal assessment will reappear on a formal
assessment. Since the formal assessment counts for a grade, interest in
learning the material rises. What if only a few kids have not
learned the material? Should the class be held back for the benefit of a few?
Perhaps the top students need to work with the laggards. I prefer that slower
learners get the benefit of a remedial opportunity built into the school
structure, such as a study hall or mandatory tutoring. PLC writers
have demonstrated how this can be done. (See figure.)
Figure: Remedial Opportunities when Students Don't Learn. Flow: informal formative assessment -> teacher reviews material -> students who learn move on; students who do not learn receive a remedial opportunity.
For this model to work, teachers need enough slack
in their curriculum mapping for a small amount of re-teaching time, but
the greatest responsibility for re-teaching falls upon a system already in
place, such as a school-day study hall.
Homework as Formative Assessment
I use homework primarily (though not exclusively) as a
formative assessment—to find out what the kids do and do not know. Since I use
homework as an instrument to determine where teaching should go instead of an
opportunity to engage in new learning, I keep the assignments short and
infrequent. I prefer the bulk of new learning to take place in class where I am
available to help with understanding of concepts, vocabulary and writing technique.
The education literature does not show a strong correlation between the amount
of homework assigned and student performance, especially with at-risk students.
Therefore, when I do assign homework I use it primarily to give me feedback and
not as an opportunity to grade.
Summative Assessments
Summative assessments do not enable much learning to take
place—since after the summative there is no second chance—but, ironically,
these assessments are the ones that give both teachers and students headaches
and ulcers. Final exams, standardized tests such as the California High School
Exit Exam (CAHSEE), CST in California
(AKA the STAR test), ACT, SAT, and AP exams, and music auditions put both
teacher and student under a microscope. These high stakes tests determine who
graduates, who goes to a four-year school, and under NCLB, whether a school
that receives federal funding can continue under its current leadership and
teaching staff. Since the feds keep raising the bar, increasing numbers of
schools have become low performing schools. If the government makes no changes
to the process, in a few years all schools will be low performing schools, and,
if they receive federal funding, will be subject to corrective action.
Summative assessments can be gut-wrenching experiences for
both teacher and student, but we can also make these assessments interesting
learning experiences. A final exam can contain an essay prompt that asks for
synthesis—asking the student to retell or put facts together in a creative way,
using critical thinking. Examples could include questions like the following.
- US History: Show parallels between the 1968 and 2008 Democratic Conventions.
- Psychology: Evaluate this clinical vignette and write up this patient under Axes I and II.
- Describe how your membership in groups and your beliefs about the meaning of life have made you who you are.
- World History: Explain why the French and American Revolutions progressed differently.
- Economics: Show, both theoretically and practically, why Keynesian economic policies will (or won't) work, and give historical examples from the crash of 2008 that back up your opinion.
How Teachers Benefit
from Summative Assessments
By analyzing state testing results, whether through
number-crunching software such as Data Director or by hand, teachers can examine the
correlation of semester grades with state test scores. When a teacher
discovers a gross discrepancy, such as top test scores paired with a class grade of C,
he needs to explore the reasons why. (See figure.)
Figure: Standardized Testing Results Versus Classroom Performance for the Individual Student. Quadrants are defined by high or low standardized testing results (vertical axis) and high or low classroom performance (horizontal axis):
Quadrant I (high test results, high classroom performance): Ideal.
Quadrant II (high test results, low classroom performance): Unmotivated but bright student, or test results may have been altered. Classroom assessments may have little predictive validity.
Quadrant III (low test results, high classroom performance): Student may have "bubbled" the test, randomly picking answers, or may have cheated his way through the class. Schools must make standardized tests meaningful and important, and teachers must defend against cheating.
Quadrant IV (low test results, low classroom performance): Expected valid score. Schools must intervene to help low-achieving students learn.
Teachers expect students to fall within quadrants I and IV.
Gifted students usually do well on standardized tests and earn high marks in
their classes; weak students usually do poorly in both testing and grading.
Problems arise when students fall into the other quadrants.
What to do right now: work with your department and come
up with a plan to improve the performance of students in quadrants II and III.
Why does your quadrant II student refuse to work to his potential? How can you
arrange your assessments to prevent the quadrant III student from cheating, and how
can you get her to take the standardized tests more seriously?
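The quadrant sort described above can even be automated when a department has a roster of scores. The sketch below is a minimal illustration, not a tool from Data Director or any other product; the cutoff values (350 on the test scale, a 2.0 class grade) and the student names are illustrative assumptions.

```python
def quadrant(test_score, class_grade, test_cut=350, grade_cut=2.0):
    """Place a student in one of the four quadrants from the figure.

    Cutoffs are illustrative assumptions, not official thresholds.
    """
    high_test = test_score >= test_cut
    high_class = class_grade >= grade_cut
    if high_test and high_class:
        return "I"    # ideal: valid high performance
    if high_test:
        return "II"   # bright but unmotivated, or grades lack predictive validity
    if high_class:
        return "III"  # possible random bubbling, or cheating in class
    return "IV"       # expected valid low score: school should intervene

# Scan a (hypothetical) roster and flag the problem quadrants, II and III:
roster = [("Ana", 410, 3.8), ("Ben", 405, 1.2), ("Cruz", 240, 3.5), ("Dee", 250, 1.0)]
flagged = [(name, quadrant(t, g)) for name, t, g in roster
           if quadrant(t, g) in ("II", "III")]
print(flagged)  # Ben lands in quadrant II, Cruz in quadrant III
```

A department could run something like this each semester and spend its PLC time on the flagged students rather than re-deriving the list by hand.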