Testing thought experiment

Posted on March 23, 2015 by debmeier

Dear Readers,

Let’s do a “what if” experiment. Suppose that all the poor, Black, and Hispanic children surprised us all and got scores more or less equal to (or even better than!) those of their richer and whiter peers on the spring tests.

If you imagine there would be celebrations galore, think again about why this could not happen: not just why poverty is a handicap, but why no test could ever prove that it is not.

Because every test-maker in the world would know there was something wrong with that test’s pool of items long before scores were reported (during its field-testing period) and would do whatever was necessary to make the test “harder” (or, rather, more “accurate”). It doesn’t even require changing the items; a few tweaks to the answer choices will usually do. This is not a guess on my part; it is what some folks who explored the ETS pool of SAT questions discovered long ago. If an item is “favored” by Black students (or any other group that does not normally do well on the test), it is removed as unreliable.

We are simply more sophisticated at doing what the original IQ designers did a century ago, when they judged how “rigorous” an item was by seeing who got it right and who got it wrong, sorted by occupational status.

I hate to tell you, but we Jews didn’t do too well at first. We weren’t doctors, lawyers, and business owners in the early 1900s. And, I suspect, they may never have later selected items on the basis of whether or not the test-taker was Jewish (Jewish was probably not one of the boxes to check), or we would still be scoring in the bottom half.

We are getting crasser at this, with less cover-up. I note that the latest improved model does not promise a normal curve or any particular predetermined percentiles. It just waits until the results are in and then figures out how to score the test so that it sends the right message. Literally.

We even did this with the National Board’s professional teaching test. It seemed appropriate. We decided ahead of time that it had to pass enough people not to seem impossible, yet not so many that it could be accused of being too easy. And it ought to correspond, more or less, with what people who knew the candidates would say if asked. Sampling did the trick, and the test did what we wanted it to do, although it needed some revising because it passed too few teachers of color, just as the SAT had been revised years earlier to make sure that females were getting scores comparable to males, at least on the Language Arts tests. Problem identified. Problem fixed.

Hmmm. So, imagine the scenario I started with. Why not? Imagine how it would mess up the real estate ads that like to tell prospective buyers the average SAT or Regents scores for the school in their zone. They know, because it matters to them, what those scores are really measuring: social and economic status, which includes race.

No matter how fast the kids line up after recess, exactly the same number will be first, second, third… and last. And most of us who have watched these kids at recess a lot soon know who will be where in the line. It doesn’t usually correlate with SAT scores, however. And if it did, would we focus on giving running lessons to the slowpokes?
