First it happened in New York, then Illinois and now Georgia.

Professors from colleges and universities in Georgia have written an open letter to the governor, Nathan Deal, and other officials involved with education policy to express opposition to a new teacher evaluation system that depends largely on student standardized test scores.

“Value-added” assessment has become all the rage in teacher and principal evaluation across the country, even though assessment experts have repeatedly warned against using test scores for this purpose.

Earlier this year, researchers and professors in New York and Illinois sent their own letters to officials in their states protesting the value-added assessment systems there. The protests are part of a growing backlash against the high-stakes use of standardized tests, including using students' results to evaluate the effectiveness of teachers and administrators.

The following letter sent in Georgia clearly explains the problems with “value added”:

An Open Letter of Concern Regarding Georgia’s Implementation of its New Teacher/Leader Evaluation System to:

Governor Nathan Deal

Dr. John D. Barge, Georgia State School Superintendent

Brooks Coleman, Chair, GA House Education Committee

Fran Millar, Chair, Senate Education and Youth Committee

Jill C. Fike, Director, GA State Senate Research Staff

and Superintendents of the following Georgia school systems:

Atlanta, Ben Hill, Bibb, Burke, Carrollton, Chatham, Cherokee, Clayton, Dade, DeKalb, Dougherty, Gainesville, Gwinnett, Hall, Henry, Meriwether, Muscogee, Peach, Pulaski, Rabun, Richmond, Rockdale, Spalding, Treutlen, Valdosta and White

The state of Georgia plans to implement significant systemic changes in teacher evaluation in the 2012-2013 school year. GREATER is a consortium of Georgia university professors, researchers, and educational advocates. We study many disciplines directly connected to education, such as education policy, measurement, ethics, multiculturalism, curriculum, and evaluation, and we wish to express our deep concern about the initiation of this new evaluation system. GREATER joins our colleagues in Chicago, IL and New York State, where similar evaluation methods are being implemented [1], and, as professionals, we are reaching out to policymakers and legislators to caution against educational policy change in Georgia without high-quality evidentiary research and support.

As university scholars and professors who specialize in educational research, we recognize that change is an essential component of school improvement.

We support accountability and high standards, and we want what is best for all students in Georgia. We firmly believe, however, that Teacher Keys and Leader Keys are unproven evaluation systems that carry foreseeable negative consequences and are not the path to lasting school improvement.

The state’s new evaluation system, Teacher Keys and Leader Keys, centers on “value-added” measures of student growth. We believe the use of value-added measures in teacher and leader evaluation will likely lead to negative educational, social, and emotional outcomes for Georgia’s children. We believe it is our ethical, moral, and professional obligation to raise awareness about how the proposed evaluation changes not only lack a sound research basis but also, in some instances, have already proven to be detrimental.

Knowing that many in the legislature would benefit from the input of those who work full time in the field before making a final decision, we offer four concerns and two recommendations, substantiated by rigorous and relevant educational research. Research supports our primary recommendation: that the state return the federal monies related to this project and choose to “opt out,” as Idaho, Indiana, Kansas, Minnesota, Oregon, South Dakota, Virginia, West Virginia and Wyoming have done.

However, at a minimum, we encourage the state to (1) further pilot and evaluate the system before implementing it on a large scale and (2) drastically reduce or eliminate the weight given to student growth as a measure of teacher or leader effectiveness. The Value-added Model is predicated on the belief that tools and processes can be used to accurately determine an educator’s impact on student learning. However, Value-added Models do not address the fact that, even in the best of circumstances, a teacher’s efforts are only one of many intertwined conditions for student success.

GREATER does not propose to negate an educator’s responsibility to provide a high quality learning experience. However, it is clear that the tools used to measure educator value (Keys) and the tools used to measure student learning provide an incomplete picture. To base an educator’s evaluation and eventual livelihood on incomplete data does not advance the goals that we all have, which include increasing student academic knowledge, skills and habits and developing productive, informed citizens who are inquiring adults and lifelong learners.

Further, because the majority of decision makers and consultants in this process do not have a concentrated background in educational research or, in some cases, are not located in Georgia, we urge the State Board of Education and the Georgia Legislature to consult the professors and researchers among us, who are local and accessible, on the recommendations proposed in this letter as well as on other proposed educational reforms. We bring both scholarly and practical expertise of national renown to these issues.

Our forthcoming recommendations are based on the following concerns:

(1) Validity—Value-added Models are not proven;

(2) Feasibility—This model is not the most useful way to spend education funds;

(3) Unintended consequences—Students and teachers will be adversely affected; and

(4) Timing—Georgia is not prepared to implement this evaluation model.

Concern #1: Validity – Educational researchers strongly caution against teacher evaluation approaches that use Value-added Models (VAMs).

Georgia has already used a value-added statistical model to determine which schools were to be put on probation, closed, or turned around under No Child Left Behind (NCLB) and found this model wanting. For the new teacher evaluation system, “student academic growth” will be measured with VAMs or similar models. Numerous researchers have found that VAMs of teacher effectiveness do not produce stable ratings of teachers. For example, different statistical models (all based on reasonable assumptions) can yield different effectiveness scores [2]. Even when models try to control for prior achievement and student demographic variables, teachers are still unduly advantaged or disadvantaged by the students they teach. Researchers have found that teacher evaluation scores can fluctuate from class to class, from year to year, and from test to test [3]. In deciding whether to use VAMs, we encourage the state to consider that ten prominent researchers of assessment, teaching, and learning recently wrote an open letter that raised, among others, the following concerns about using student test scores to evaluate educators [4]:

a. No evidence exists that evaluation systems that incorporate student test scores produce gains in student achievement. To determine whether such a relationship exists, researchers recommend long-term, small-scale pilot testing of such systems. Furthermore, student test scores have not been found to be a strong predictor of the quality of teaching as measured by other instruments or approaches [5].

b. Testing companies themselves advise against using their instruments to evaluate educators or as supporting evidence for any type of teacher pay-for-performance model [6].

c. The validity of the testing instruments used to produce students’ value-added scores is a major concern in this case. Validity refers to the degree to which an interpretation of a test score is supported by evidence. For a measure of teacher effectiveness to be valid, evidence must show that the measure actually captures the teacher effectiveness it claims to measure. This is essential: an assessment instrument must be validated for a particular purpose before it can be used for that purpose [7]. Assessments designed to evaluate student learning are not necessarily valid for measuring teacher effectiveness or student learning growth [8]. Using them for these purposes is akin to using a meter stick to weigh a person: you might be able to develop a formula that links height and weight, but there will be plenty of error in your calculations.

Concern #2: Feasibility – This evaluation model is not the most responsible, realistic use of state funds and human resources.

At a time when class sizes in Georgia are being increased, teachers furloughed, staff cut, enrichment activities decreased, and school years shortened due to lack of school funding, spending so much taxpayer money on an untested, un-validated instrument is fiscally irresponsible. Furthermore, this assessment model places a heavy unsupported burden on local school leaders, teachers, and colleges of education.

a. This assessment model places a HUGE burden on school leaders. The induction phase of this system is a continuous (up to two-year) cycle in which evaluators will often be asked to evaluate content delivery and successful implementation of content-based research. Will evaluators, who are often principals or lead teachers, be trained to recognize effectiveness in subject matter not their own? When will this “training” actually take place and who will conduct it? Furthermore, in an effort to provide flexibility for the 26 partner school districts, many of the provisions require the districts to develop their own evaluations of teacher induction systems. In an already understaffed and overworked school system, who will design and implement these instruments and who will pay the salaries of those hired to do so?

b. This assessment model places a HUGE burden on teachers.

A primary (and problematic) presumption of a value-added model is that a teacher’s effectiveness can be identified independently through students’ standardized test scores. This evaluation system makes teachers solely responsible for student success when, in reality, quite the opposite is true.

Teachers do not work in isolation, because schools are learning communities where all parts contribute to student development. An evaluation system that even partially bases an individual teacher’s evaluation on his or her students’ scores ignores the reality that student success often depends on the work of many people in a school, including reading teachers, resource teachers, reading and English Language Learner specialists, guidance counselors, social workers, psychologists, and other personnel. Most importantly, out-of-school factors are actually more responsible for student success [9]. Factors outside the classroom teacher’s control, including parents’ income level and level of education, account for roughly 85-90% of the statistical variation in students’ test scores [10]. How could we possibly begin to disaggregate each individual’s effect? And why would we want to? Schools operate best when there is cooperation among all caretakers, faculty, and staff members [11] and when all are accountable for each student’s learning [12]. Furthermore, teachers will be implementing the new Partnership for Assessment of Readiness for College and Careers (PARCC) assessments just as this evaluation system is introduced. Most teachers are unfamiliar with the PARCC format, items, and system, which adds to an already stressful testing situation.

c. This assessment model places a HUGE burden on colleges of education.

The effective preparation of teachers requires practice time spent with mentor teachers in actual classrooms. The model of preparing new teachers, called clinical practice or student teaching, is similar to the one used to prepare medical professionals. The student-teaching portion of an educational program has been determined by some research, such as that done by the Blue Ribbon Commission of NCATE, to be one of the most important and influential parts of a teacher education program [13]. This requires districts, schools, and mentor teachers to willingly allow colleges of education to place student teachers in their classrooms. Yet, even now, many of us who spend time in schools working with students and their mentor teachers have found it increasingly difficult to find placements because of teacher and leader concerns over student test scores. Given that the new teacher effectiveness measure places so much emphasis on test scores, will student teachers still be welcomed? We fear the answer will be a resounding no.

Concern #3: Unintended consequences – Students will be adversely affected by the implementation of this new teacher evaluation system.

Our undue focus on testing affects teacher-student relationships and makes it more difficult to establish a classroom community of academically successful learners and critical thinkers.

a. Since the initiation of NCLB, a focus on test preparation to the exclusion of other content has led to what is known as “narrowing the curriculum” [14]. Enrichment activities in the arts, music, civics, and other non-tested areas have diminished. Using student test scores as a measure of teacher “value” will further restrict what is taught. Over the past decade, educators have described their lived experience with the unintended consequences of NCLB’s narrowed curriculum. Children arrive in middle school without fundamental abilities in non-tested areas. Rather than learning that ideas and issues are multifaceted, children encounter them as artificially confined to language arts, math, and science.

b. There has been and, with the implementation of these evaluations, inevitably will be a further narrowing of the curriculum as teachers focus more on test preparation and skill-and-drill teaching, particularly in low-scoring schools, which are largely attended by students of low socioeconomic status and students of color [15]. By focusing on testing to the exclusion of true teaching, we further catalyze a pending civil rights battle for equal educational opportunities.

c. Teachers will subtly but surely be incentivized to avoid students with health issues, students with disabilities, students who are English Language Learners, or students suffering from emotional issues. Research has shown that no evaluation model yet developed can adequately account for all of these ongoing factors [16].

d. When student test scores take a front and center position and the livelihoods of teachers, leaders, and schools are dependent on these scores, we must stop acting surprised that cheating scandals emerge. Georgia is one of many states already marred by allegations of cheating on standardized tests. Are we assuming that linking standardized test scores to teacher and leader evaluations will make things better? If so, how?

Concern #4: Timing – Georgia is not ready to implement a teacher evaluation model that is based on the use of “student growth” as a significant determinant of teacher effectiveness.

A pilot of Georgia’s proposed evaluation system began in January 2012 and ended in May 2012. The state has acknowledged that it intends to adjust the model based on the pilot’s findings and implement a finalized version in Fall 2012. Unfortunately, the state has not allowed itself enough time to analyze the data and evaluate the outcomes of the semester-long pilot with validity and reliability. The current plan leaves only two months to analyze pilot data and make appropriate adjustments, and it assumes that the pilot’s outcomes will be valid, reliable, or even desirable. These are serious assumptions to make about an instrument that will have such a powerful effect on the lives of teachers, principals, students, and families.

For the student growth and academic achievement portion of the Teacher/Leader Keys evaluations, the state and local school systems must take into consideration:

a. The influence of certain student characteristics such as placement in special education, limited English-language proficiency, and residence in low-income households.

b. How they will accurately match teachers to their actual students (e.g., who gets the “credit” for student outcomes in a co-taught class, a class with a paraprofessional, a class that changes teachers, a class with a student who arrives mid-year, or a class with a support teacher).

c. That teachers, principals, “coaches” and other school leaders must be trained in the use of student assessments for teacher evaluation, and that this training will come on top of the training already planned for the new Common Core State Standards (CCSS), placing a further burden on already overworked individuals.

d. That there is little point in providing value-added teacher evaluations unless they will trigger continuous goal-setting for areas teachers want to work on and provide coaching, remediation, and support through high-quality professional development. At a time when school systems are furloughing teachers and cutting school days, the possibility of any system having appropriate and sufficient funding to support high-quality professional development is seriously in doubt.

Further, for teachers of “non-tested subjects” (e.g., art or music), a standardized value-added student assessment does not exist. While some work is being done nationally to develop assessments for “non-tested” subjects, this work is in its infancy. Despite this fact, participating Georgia school districts must identify at least one “non-tested” assessment for every grade and every subject; determine how student growth will be measured on these assessments; and translate the student growth from these different assessments into teacher evaluation ratings in a fair manner [17].

Our Recommendations

1. Further pilot and adjust the evaluation system before large-scale implementation.

Any annual evaluation system should be piloted and adjusted on a small scale for a length of time that provides sufficient feedback before being implemented statewide. Delaware spent years piloting and fine-tuning its system before formally putting it in place statewide. Conversely, Florida’s and New York City’s teacher evaluation systems made headlines when their rapid implementation led to unintended negative consequences [18].

2. Reduce or eliminate the percentage that “student growth and academic achievement” counts toward teacher or leader effectiveness and look for more valid and reliable ways to measure effectiveness.

We are aware of the complexity of developing teacher evaluation models. However, until standardized student-growth measures are found to be valid and reliable sources of information on teacher or principal performance, they should play no role, or at most a very limited role, in summative ratings. Other types of instruments and evaluative programs provide a better, more precise picture of teacher effectiveness. These approaches focus on what a teacher does and how practice can be strengthened through non-value-added measures, such as paid peer mentoring of first- and second-year teachers, seminars, personal portfolios and reflections, and ongoing holistic analysis of teacher performance. One clear example is the TEAM (Teacher Education and Mentoring) program currently implemented in Connecticut, which provides differentiated professional learning for beginning teachers as they reflect on instructional strategies and analyze student data and outcomes [19].

Students benefit when objective feedback is part of their teachers’ experience [20]. Similar frameworks for principals can serve the same purpose.

The GREATER consortium concludes that hurried implementation of teacher evaluation using student value-added growth models will result in inaccurate assessments of our teachers, a demoralized profession, decreased learning, and harm to the children in our care. Further, it is a waste of our state’s increasingly limited resources to widely implement a program that has not yet been thoroughly field-tested or fully strategized. Our students are more than the sum of their test scores, and research clearly shows that an overemphasis on test scores will not result in increased learning, increased well-being, or greater success.

According to a nine-year study by the National Research Council [21], the past decade’s emphasis on high-stakes standardized testing has yielded little learning progress. This is particularly troubling when we consider the cost of this emphasis to taxpayers.

None of us can afford to lose sight of what matters most: the academic, social, and emotional growth and well-being of Georgia’s children. Our students, teachers, and communities deserve better. They deserve thoughtful, reliable, valid reforms that will improve teaching and learning for all students. It is in this spirit that we write this letter.

* *

Signed [originally] by 38 educational researchers across the State of Georgia, as of June 25, 2012. University affiliations are listed for identification purposes only and do not imply affiliated consent.

1. (Primary Contact) Mari Ann Roberts, Ph.D., Clayton State University, 404-374-9154

2. JoBeth Allen, Ph.D., University of Georgia

3. Sohyun An, Ph.D., Kennesaw State University

4. Dennis Attick, Ph.D., Clayton State University

5. Nadia Behizadeh, Ph.D., Emory University

6. Eric Bridges, Ph.D., Clayton State University

7. Joseph Cadray, Ph.D., Emory University

8. Leslie Coia, Ph.D., Agnes Scott College

9. Caitlin Dooley, Ph.D., Georgia State University

10. Erica Dotson, Ph.D., Clayton State University

11. Aiden Downey, Ph.D., Emory University

12. Samantha R. Fowler, Ph.D., Clayton State University

13. Mary Hollowell, Ph.D., Clayton State University

14. Karen Falkenberg, Ph.D., Emory University

15. Bob Fecho, Ph.D., University of Georgia

16. Jillian Carter Ford, Ph.D., Kennesaw State University

17. Jack Hassard, Ph.D., Georgia State University

18. Rebecca Hill, Ph.D., Kennesaw State University

19. Marquita Jackson-Minot, Ph.D., Georgia Gwinnett College

20. Tracey Laird, Ph.D., Agnes Scott College

21. Amy Lovell, Agnes Scott College

22. Cindy Lutenbacher, Ph.D., Morehouse College

23. Samuel Maddox, Ph.D., Clayton State University

24. Regina Meeler, Ph.D., Gainesville State College

25. David Messer, Ph.D., Clayton State University

26. Dashaunda Patterson, Ph.D., Georgia State University

27. Jenny Penney Oliver, Ph.D., University of Georgia

28. Tina Pippin, Ph.D., Agnes Scott College

29. Amanda Richey, Ph.D., Kennesaw State University

30. Scott Ritchie, Ph.D., Kennesaw State University

31. Peter Smagorinsky, Ph.D., University of Georgia

32. Patricia Smith, Ed.D., Clayton State University

33. Vera Stenhouse, Ph.D., President, GA NAME

34. Anthony B. Stinson, Ph.D., Clayton State University

35. Peggy Thompson, Ph.D., Agnes Scott College

36. Rachel Trousdale, Ph.D., Agnes Scott College

37. Clemmie Whatley, Ph.D., eddynamix

38. Brian A. Williams, Ph.D., Georgia State University

1. Note: This letter was adapted from the letter written by Dr. Kevin Kumashiro, University of Illinois at Chicago, which was signed by more than 80 university professors and researchers in the Chicago area.

It was also inspired by the letter written by Sean C. Feeney, Ph.D. and Carol C. Burris, Ed.D., which was signed by more than 1400 New York State principals in opposition to New York’s evaluation plan.

2. Papay, J. (2011). Different tests, different answers: The stability of teacher value-added estimates across outcome measures. American Educational Research Journal, 48(1), 163-193.

3. Darling-Hammond, L. (2012). Creating a Comprehensive System for Evaluating and Supporting Effective Teaching. Stanford, CA: Stanford Center for Opportunity Policy in Education.

4. Baker, E., et al. (2011). Correspondence to the New York State Board of Regents. Retrieved October 16, 2011 from

5. See Burris, C., & Welner, K. (2011). Conversations with Arne Duncan: Offering advice on educator evaluations. Phi Delta Kappan, 93(2), 38-41.

6. FairTest (2007). Organizations and Experts Opposed to High-Stakes Testing.

7. Goe, L., Bell, C., & Little, O. (2008). Approaches to Evaluating Teacher Effectiveness: A Research Synthesis. National Comprehensive Center for Teacher Quality.

8. Goe, L., & Holdheide, L. (2011). Measuring teachers’ contributions to student learning growth for nontested grades and subjects. Retrieved February 2, 2012 from

9. Goldhaber, D., Brewer, D., & Anderson, D. (1999). A three-way error components analysis of educational productivity. Education Economics, 7(3), 199-208.

10. See Hanushek, E., Kain, J., & Rivkin, S. (1998). Teachers, schools and academic achievement. Retrieved October 16, 2011 from

11. DuFour, R., & Eaker, R. (1998). Professional learning communities at work: Best practices for enhancing student achievement. Bloomington, IN: National Educational Service.

12. See

13. NCATE (2010). Transforming Teacher Education Through Clinical Practice: A National Strategy to Prepare Effective Teachers. Retrieved June 1, 2012 from

14. Crocco, M. S., & Costigan, A. T. (2007). The narrowing of curriculum and pedagogy in the age of accountability: Urban educators speak out. Urban Education, 42(6), 512-535.

15. Committee on Incentives and Test-Based Accountability in Education of the National Research Council. (2011). Incentives and Test-Based Accountability in Education. Washington, DC: National Academies Press.

16. Baker, E., et al. (2010). Problems with the use of test scores to evaluate teachers. Washington, DC: Economic Policy Institute. Retrieved October 16, 2011 from

17. Note: RT3 information procured from Strickland Design (2012), RT3 Newsletter Archives. Retrieved April 7, 2012 from and the GA Department of Education

18. See

19. TEAM Retrieved from and

20. National Board for Professional Teaching Standards (2012). Impact of National Board Certification on Teacher Practice & Schools. Retrieved April 7, 2012 from

21. Committee on Incentives and Test-Based Accountability in Education of the National Research Council. (2011). Incentives and Test-Based Accountability in Education. Washington, DC: National Academies Press.