In this 2015 photo, a student works on math problems as part of a trial run of the new PARCC test at Annapolis Middle School in Annapolis, Md. (AP Photo/Patrick Semansky)

Five million students took the new Common Core exam known as PARCC last year, most of them by logging onto a computer. But about one in five took the exam with paper and pencil, and those students — who tested the old-fashioned way — tended to score higher than students who took the tests online, according to Education Week.

It’s not clear whether the score differences were due to the format of the testing, or due to differences in the backgrounds of the students who took the two different types of test, according to the Feb. 3 Education Week report. But the publication reported that in some cases the differences were substantial enough to raise concerns about whether scores on the exam — the Partnership for Assessment of Readiness for College and Careers test — are valid and reliable enough to be used for teacher evaluations or school accountability decisions.

As Ed Week reporter Benjamin Herold wrote:

In December, the Illinois state board of education found that 43 percent of students there who took the PARCC English/language arts exam on paper scored proficient or above, compared with 36 percent of students who took the exam online. The state board has not sought to determine the cause of those score differences.

Meanwhile, in Maryland’s 111,000-student Baltimore County schools, district officials found similar differences, then used statistical techniques to isolate the impact of the test format.

They found a strong “mode effect” in numerous grade-subject combinations: Baltimore County middle-grades students who took the paper-based version of the PARCC English/language arts exam, for example, scored almost 14 points higher than students who had equivalent demographic and academic backgrounds but took the computer-based test.

“The differences are significant enough that it makes it hard to make meaningful comparisons between students and [schools] at some grade levels,” said Russell Brown, the district’s chief accountability and performance-management officer. “I think it draws into question the validity of the first year’s results for PARCC.”
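Ed Week’s account does not spell out how Baltimore County ran its analysis. A common way districts estimate a “mode effect,” though, is to regress scores on test format while controlling for students’ demographic and prior academic profiles, so that paper and online test-takers with similar backgrounds are compared. The sketch below is a minimal, hypothetical illustration of that general approach; the column names, data and model are assumptions for demonstration only, not the district’s actual method or results.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Illustrative student-level records (entirely made up): one row per tested student.
df = pd.DataFrame({
    "scale_score": [738, 752, 744, 760, 729, 747, 741, 756, 733, 750],
    "paper_mode":  [1, 1, 0, 1, 0, 0, 0, 1, 0, 1],    # 1 = paper/pencil, 0 = online
    "prior_score": [735, 749, 745, 758, 731, 750, 740, 752, 736, 748],
    "frl":         [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],    # free/reduced-price meals flag
    "ell":         [0, 0, 1, 0, 1, 0, 0, 0, 1, 0],    # English-language-learner flag
})

# OLS with background controls: the coefficient on paper_mode approximates the
# average score gap tied to test format for otherwise-similar students.
model = smf.ols("scale_score ~ paper_mode + prior_score + frl + ell", data=df).fit()
print(round(model.params["paper_mode"], 1))  # estimated "mode effect," in scale-score points
```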

Jeff Nellhaus, the chief of assessment for PARCC, acknowledged the difference in test scores in an interview with The Washington Post on Wednesday, and said that PARCC had tried to suss out whether the differences were due to the format of the test itself, or to differences in the abilities of students who took the two exam types.

It’s impossible to completely disentangle the two, but it appears that ability was the more important factor, Nellhaus said. “Student familiarity with the platform” also appears to account for some of the difference, he said.

“It wasn’t across the board in every test, but in some of our tests, it appears that this familiarity factor did come into play,” Nellhaus said, emphasizing that score differences varied according to grade level and subject, school, district and state. He said that questions about how to interpret the scores from the two test formats, and how to use them to make decisions about schools and teachers, are best answered by each state and district.

Nellhaus also said he expects the “familiarity factor” to disappear as students become accustomed to testing online, a format he said makes more sense as teachers increasingly deliver instruction online, and one that brings costs down and will eventually allow for quicker turnaround of student scores.

But that’s not much comfort for teachers whose evaluations depend, or will depend in the future, in part on their students’ test scores.

“Tests should be about whether students have learned and can apply subject matter that was taught. If students are not used to working with specific content on computers, no one would know whether a low score is due to lack of content knowledge, a lack of proficiency with computer technology or problems with the technology,” said Randi Weingarten, president of the American Federation of Teachers, in an e-mailed statement. “We should be certain what’s actually being measured. But in the end, parents should discuss with their children’s teacher any poor result, see what might be the problem and try to resolve it.”

It’s not entirely surprising that scores would be lower for children testing on computers, a new format for students used to filling in multiple-choice bubbles with No. 2 pencils. Many educators worried that the test would end up measuring students’ comfort with technology instead of their ability to read and do math — a particular problem in schools without enough computers to give students frequent practice.

Nina Lattimore, principal of Marley Elementary in Maryland’s Anne Arundel County, raised those concerns in 2014, when her school participated in PARCC field testing.

“You have to have the capacity to have kids constantly using this technology” to prepare them, Lattimore told The Washington Post at the time, pointing out that the paper-and-pencil strategies students have long used to answer questions now must be replaced with online strategies.

“You’re asking kids to be proficient in more than one tool,” she said. “You’re not just testing reading comprehension. You’re testing whether the child can use an online tool.”