Trail of Clues Preceded Regents Fiasco
New York Times
October 15, 2003
By MICHAEL WINERIP
In recent years, as New York introduced its new exams required for high school
graduation, there were many signs of trouble in the state testing program.
In 2001, teachers complained that the scoring on the new Regents biology and
earth science tests was too easy. In 2002, they complained that the scoring on
the new physics test was too hard. Physics students tend to be the brightest in
the state, and typically 15 percent had failed the old state test; 40 percent
failed the new physics test.
Superintendents complained that there was no consistency. In Rye, a wealthy
suburb, 79 percent of students scored at mastery level (85 percent or higher) on
the state English test, 75 percent on the state history test, and 3 percent on
physics. Even when many districts refused to give the new physics test and the
state superintendents' association wrote colleges, recommending that they ignore
the physics results, the state education commissioner, Dr. Richard P. Mills,
would not budge.
Dr. Mills, one of the leading advocates of such testing in the nation, kept
issuing upbeat news releases, saying his testing program was "statistically
sound" and "in accordance with nationally accepted standards."
People who should have known better were fooled. New York's testing program was
one of the first approved by the federal government under No Child Left Behind.
In the spring, a survey of testing programs by Princeton Review ranked New York
first of the 50 states. "Mills was hard core" about testing, said Steven Hodas
of Princeton Review. "He had a take-no-prisoners attitude," he said.
No more. In June, it all came crashing down when the scores on the state Math A
exam required for graduation were released. Two-thirds of students had failed
and the outcry was so great that Dr. Mills had to dump the results and rescale
the test. Then came news that a record 47 percent had failed the June 2003
physics test and that it, too, had to be rescaled.
Now the experts are looking at New York differently. "I'm stunned," said Mr.
Hodas. "Frankly, I thought they were professionals." He said Princeton Review
was doing a major overhaul of its rating standards. "We're going to have to come
up with a fiasco index for a state like New York that messes up a lot of
people's lives for no reason," he said.
What went wrong? The answer came last week from a special panel of math and
testing experts appointed by Dr. Mills to investigate. The panel's short answer
is that New York's testing program did not meet national standards and was not
statistically sound.
To develop a test with a dependable pass/fail ratio first requires extensive
field testing of sample exams to gauge the difficulty of each question, plus the
exam as a whole. And yet, the panel found, the state ignored the most basic
testing guidelines. To prepare the math exam properly, 1,500 students should
have been sampled in field tests, the panel said; only 250 were.
For field testing to be reliable, the panel noted, it must be conducted under
conditions that simulate the stress of actual testing; the state field testing
was done with students who knew it did not count and often did not bother to
answer hard questions. The result? "New York State can't accurately predict
performance on Math A," the report said.
The panel found the test was developed "on the cheap," with inadequate staff.
The state has one testing expert with a 30-member staff to oversee 70 tests,
said Dr. William J. Brosnan, the panel chairman and Northport schools
superintendent. "The staff had to generate one document per person per day," Dr.
Brosnan said. "You can't expect people to perform under those conditions."
The panel found so much staff turnover that no one remained from 1998, when the
Math A test was created. "No one knew the most basic information about the
development of the test," Dr. Brosnan said.
To compensate, the state hired consultants, but the panel found their work
questionable. The consultants' report on the field tests should have been
submitted three months before the exam to allow for adjustments, Dr. Brosnan
said. "To this day we haven't seen the consultants' report," he said. "When we
called to ask, we were told by the consultant that the individuals no longer
worked there. It raised confidence issues."
Before items were included on the test, they were supposed to be approved by a
state panel of teachers, but four items in the June exam were added at the end,
without consulting those teachers. "Several of those items were poorly worded,
confusing and answered incorrectly by large numbers of students," Dr. Brosnan
said.
When the high failure rate became public, state officials suggested that it was
because many weaker students who had previously failed the math test had taken
it again in June 2003. But when the panel compared results from the strongest
students taking the test, ninth graders, they found that this group scored 18
points lower than the ninth graders who took the June 2002 Math A test. "We were
surprised the fluctuation was that bad," Dr. Brosnan said.
The state also did a poor job informing teachers how to prepare for the test.
The panel found that state standards were too vague to be useful. With 103
subjects to be mastered for a 35-item test, Dr. Brosnan said, "teachers felt
pressed to go a mile wide and an inch deep." Seeing a trigonometry standard,
teachers spent weeks on the subject, Dr. Brosnan said. "And then there was not a
single trig question. Teachers felt betrayed and kids were let down." The panel
recommended completely rewriting the math standards.
At the Regents board meeting last week, Dr. Mills said most recommendations
would be quickly carried out. For the first time, he acknowledged that the
physics test needed to be rescaled, but wanted to limit it to future tests. The
board disagreed, voting to make the rescaling retroactive to the June 2002 test.
"If there's an error," said Lorraine Cortes-Vazquez, a regent, "it's important
for children to know, that we, as adults, will admit it and correct it."