Rewards and punishments embedded in the District’s controversial teacher evaluation program have shaped the school system’s workforce, affecting both retention and performance, according to a study scheduled for release Thursday.

Hundreds of teachers have been fired for poor performance since the evaluations were implemented four years ago. But low-scoring teachers who could have kept their jobs also have been more likely to leave than teachers who scored higher, according to the study, published as a working paper of the National Bureau of Economic Research.

The study found that imminent consequences inspired two groups of teachers to improve significantly more than others: low-scoring teachers who faced the prospect of being fired and high-scoring teachers within striking distance of a substantial merit raise.

Written by James Wyckoff of the University of Virginia and Thomas Dee of Stanford University, the study suggests that incentives such as those pioneered in the District “can substantially improve the measured performance of the teaching workforce.”

The study is among the first attempts to understand the effects of the District’s teacher evaluation system, known as IMPACT. Then-Chancellor Michelle Rhee introduced the system in 2009, and it was among the first in the nation to link teachers’ job security and compensation to student test scores.

Chancellor Kaya Henderson hailed the new research as evidence that IMPACT — which has stirred criticism and spurred similar initiatives in other jurisdictions — is having its intended effect. “We’re actually radically improving the caliber of our teaching force,” Henderson said.

But while average teacher evaluation scores rose during the first three years of IMPACT, the study is silent about whether the incentives have translated into improved student achievement.

“This is a very important first step in looking at how teacher evaluation programs rolling out all across the country are going to impact teachers and students on the ground,” said Jonah Rockoff, an economist at Columbia University who was not involved in the study. “But a lot more remains to be done.”

IMPACT combines observations of teachers in the classroom with their students’ test results. It scores teachers on a scale of 100 to 400, and until 2012, it sorted them into four categories: ineffective, minimally effective, effective and highly effective. (In 2012, the school system added a fifth category, developing.)

“Ineffective” teachers are immediately fired, as are teachers rated “minimally effective” twice in a row. Teachers rated “highly effective” get a bonus; the second time they earn that rating, they get a base-salary increase worth up to $27,000 per year.
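For readers who want the rules spelled out, the consequences described above can be sketched as a small lookup. This is only an illustration: the function name `consequence` and the outcome labels are hypothetical, and the sketch assumes that any repeat “highly effective” rating qualifies for the raise, which the rules as summarized here do not specify.

```python
def consequence(history):
    """Return the consequence of the most recent rating.

    `history` is a list of rating labels, oldest first. The function name
    and the outcome strings are illustrative, not official IMPACT terms.
    """
    current = history[-1]
    previous = history[-2] if len(history) > 1 else None

    if current == "ineffective":
        return "dismissed"
    if current == "minimally effective":
        # Two "minimally effective" ratings in a row lead to dismissal.
        if previous == "minimally effective":
            return "dismissed"
        return "at risk of dismissal"
    if current == "highly effective":
        # Assumption: any earlier "highly effective" rating counts toward
        # the base-salary increase; consecutiveness is not modeled.
        if "highly effective" in history[:-1]:
            return "bonus and base-salary increase"
        return "bonus"
    # No specific consequence is described for other ratings.
    return "none described"
```

Under these assumptions, `consequence(["minimally effective", "minimally effective"])` yields a dismissal, while a first `"highly effective"` rating yields only the bonus.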

Wyckoff and Dee tried to understand the effect of those incentives by examining the relationship between teachers’ IMPACT scores at the end of one year and their retention and performance the following year.

Effects were minimal after the first year of IMPACT, but they were statistically significant after the second year, perhaps, the authors reasoned, because teachers did not immediately believe that the incentives were real and permanent.

Teachers rated “minimally effective” for the first time, the researchers found, were more than twice as likely to leave their jobs voluntarily as teachers with higher ratings.

The researchers zeroed in on teachers who scored near 250, the threshold separating “minimally effective” and “effective.”

Teachers who scored just beneath that threshold — and faced a pay freeze and the threat of dismissal — made larger average gains the next year than teachers who scored just above that mark.
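The logic of that comparison can be sketched in a few lines. This is a simplified comparison of average gains, not the authors’ estimation method. The 250 threshold comes from the study as reported; the function name, the 10-point window and the treatment of a score of exactly 250 as “effective” are assumptions made for illustration.

```python
from statistics import mean

THRESHOLD = 250  # cutoff between "minimally effective" and "effective"


def gains_near_threshold(records, bandwidth=10):
    """Compare next-year score gains just below and just above the cutoff.

    `records` is a list of (score_year1, score_year2) pairs. `bandwidth`
    is a hypothetical window size, not a value taken from the study.
    Returns (mean gain below, mean gain above); a side with no teachers
    in the window is reported as None.
    """
    below = [y2 - y1 for y1, y2 in records
             if THRESHOLD - bandwidth <= y1 < THRESHOLD]
    above = [y2 - y1 for y1, y2 in records
             if THRESHOLD <= y1 <= THRESHOLD + bandwidth]
    return (mean(below) if below else None,
            mean(above) if above else None)
```

A larger average gain on the “below” side than on the “above” side is the pattern the researchers describe; teachers outside the window are ignored entirely.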

That suggests that the prospect of losing their jobs was a key factor that encouraged those teachers to improve, Wyckoff said. Teachers showed similarly outsized gains when they scored near the threshold between effective and highly effective, presumably because they faced the potential of a substantial raise.

Wyckoff cautioned that the results do not speak to the effect of IMPACT on the performance of teachers who do not score near the thresholds.

Stanford Professor Linda Darling-Hammond cautioned against assuming that IMPACT scores accurately reflect a teacher’s effectiveness, pointing to studies that have shown that test-score growth can be an unreliable measure, especially when a teacher has a lot of students who are working far below or above grade level.

Middle school teacher Angel Cintron, who was rated “highly effective” last school year, said IMPACT can motivate teachers to get better. But the emphasis on test scores can be demoralizing in a high-poverty school where many students are far behind, he said, adding that he has worked with excellent teachers who quit after low test scores pushed their rating down to “minimally effective.”

“The teachers I’ve seen leave, I know they’re high quality,” Cintron said.