Even as most of the nation’s 15,000 public school districts roll out new systems to evaluate teachers, many are still struggling with a central question: What’s the best way to identify an effective educator?

After a three-year, $45 million research project, the Bill and Melinda Gates Foundation believes it has some answers.

The most reliable way to evaluate teachers is to use a three-pronged approach built on student test scores, classroom observations by multiple reviewers and teacher evaluations from students themselves, the foundation found.

“We identified groups of teachers who caused students to learn more,” said Thomas J. Kane, a professor at the Harvard Graduate School of Education and principal investigator of the Gates study, also known as the Measures of Effective Teaching project.

The findings released Tuesday involved an analysis of about 3,000 teachers and their students in Charlotte; Dallas; Denver; Memphis; New York; Pittsburgh; and Hillsborough County, Fla., which includes Tampa. Researchers were drawn from the Educational Testing Service and several universities, including Harvard, Stanford and the University of Virginia.

The large-scale study is the first to demonstrate that it is possible to identify great teaching, the foundation said.

Researchers videotaped 3,000 participating teachers, and experts analyzed their classroom performance. They also ranked the teachers using a statistical model known as value-added modeling, which calculates how much an educator has helped students learn based on those students’ academic performance over time. And finally, the researchers surveyed the students, who turned out to be reliable judges of their teachers’ abilities, Kane said.

They used all that data to identify teachers who seemed effective. And then they randomly assigned students to those teachers for an academic year.

The teachers who seemed to be effective were, in fact, able to repeat those successes with different students in different years, the researchers found. Their students not only scored well on standardized exams but also were able to handle more complicated tests of their conceptual math knowledge and reading and writing abilities.

Researchers found that multiple classroom observations of teachers by several people, such as a principal, a peer and an outside expert, result in the most accurate assessments. Many school districts currently rely on observations by just one person, usually a principal.

The Gates Foundation hopes that states and school districts will use the research to create evaluation systems that help teachers improve, not just ones that inform hiring and firing decisions, said Vicki Phillips, who directs its college-ready education programs in the United States.

Denver is already doing so, said Tom Boasberg, superintendent of the Denver public schools. “There’s not some clear dividing line in the middle, with some folks on one side who are clearly not effective teachers and some on the other who are clearly effective,” he said. “You have a lot of folks in the middle who want to get better. The key is to use multiple measures and feedback to help them get better in this enormously complex job.”

For decades, teacher evaluations were essentially a formality in most school systems, with the majority of educators getting top ratings based on little more than a principal’s checklist. Tenure, rather than student achievement, largely determined whether a teacher was rehired at the end of a school year.

But reformers have been pressing for evaluations that judge teachers at least in part on how well their students perform on tests. The Obama administration has accelerated that change by requiring states to adopt such evaluation systems to compete for Race to the Top funds or to receive waivers from No Child Left Behind, the federal education law.

Critics have said that some of the new evaluation systems place an unhealthy emphasis on test scores.

Randi Weingarten, president of the American Federation of Teachers, said the findings from the Gates study “reinforce the importance of evaluating teachers based on a balance of multiple measures of teaching effectiveness, in contrast to the limitations of focusing on student test scores, value-added scores or any other single measure.”