Dear friends and readers,
Why is it that these experts on collecting data seem so ignorant about the science of data collection? If we want “objective” data, then the cheapest and most honest results can be gathered by sampling the population rather than gathering data on everyone. Most data in this world has been collected that way for a very long time. So I was delighted to read in the latest Ed Week (April 2, p 30) that James Pellegrino, who is an advisor to the development of the new Smarter Balanced CCSS tests and a distinguished expert in the field, had this to say: “If we want to monitor the system as a whole we could use more effective strategies, like student sampling and matrix sampling of task. There are ways to do it that are more efficient to answer the kinds of questions that we want to have answers to…without requiring every kid…”
Yes, so why has this simple truth still not been tried when it comes to large-scale student testing? There must be a reason. Obviously some test makers make money on this, as do textbook publishers, coaches, advisors, etc. But, as with the NY State agents who go into classrooms to monitor teachers, the main purpose may be to control teaching and schooling. I was amazed to discover in the early 1980s, when I was planning a new secondary school, that the NY Regents exams had a mythic importance. No college cared or even asked students for their Regents scores. A few got scholarships for their high scores. But…even teachers at the elite Stuyvesant High School based their teaching on the test. One teacher, who taught earth science, told me that, alas, my son had not been in his class the year before, when he taught a better course. I was completely dumbfounded. Why not teach it this year, I asked? Because, he told me, the NY State Earth Science Regents test covers different material this year, and students score best by focusing on different and, in his view, less important things.
P.S. Of course, test companies use sampling in developing their tests, ensuring that the selection of items will fall on a normal curve or rank order consistent with past tests and predictive of future tests, where the “measurement error” is small enough to be useful. (“Measurement error” refers to what appear to be random mistakes that do not seriously modify the expected ranking of scores.)