A Critical Response to the East Baton Rouge Parish
"Committee for Excellence in Education Report"
Once we acquired a copy of this report, we were disappointed to find that the work of the committees does not appear to be informed by research, but instead appears to have been driven, in large part, by ideology. This lack of research to support the committee's recommendations is exhibited starkly in numerous areas of the report. We have taken two examples from the document to illustrate these limitations.
In the first example, we point out an obvious disconnect between reality and the recommendation of the committee. In the second, there is no evidence that any of the research on the topic was examined at all, and the advice of the leading groups of experts in this country was certainly ignored, if the committee members were even aware of it. The seriousness of these errors calls the process itself into question, and invites a very critical response from the community when the limitations of this report are brought into the light of public discourse.
In the section of the document titled "Culture and Safety/School Climate and Human Capital" we find the two examples we use to illustrate the problems we found in the report. Strategy No. 5, Tactic No. 1 states:
Use the Instructional Culture Index (ICI) to measure school climate and culture and drive school-based decisions on culture. Create and publish a semi-annual school climate score card (sic) and prioritize physical plant improvements based on score card results.
The problems with this tactic become clear with just a bit of research. First, the Instructional Culture Index is not something that many in the education community would recognize, for a very good reason. The ICI was "developed" (and we use this term loosely) by The New Teacher Project (TNTP) between June and November of 2010. According to a presentation by TNTP, the "study" consisted of partnering with 37 charter school campuses in Washington, D.C., with a goal of creating "an index of Instructional Culture that would help us uncover what sets top performing charter schools apart."
In a number of other areas of the presentation the full title of this index is given as the "Charter School Instructional Culture Index" (CSICI). TNTP asked teachers and administrators to fill out forms concerning their perceptions and attitudes about instruction and expectations concerning instruction at their respective schools. Some were also interviewed for the study, and "performance data" was collected on each teacher through "individual interviews with school leaders." It is not clear what this latter "data" consisted of.
TNTP, in a period of less than five months, studied, interviewed, analyzed and "created" this Instructional Culture Index. This is not the protocol normally followed when formulating the basis for a pilot study of any new proposed tool. What we see here is an instance where the exploratory work necessary to devise a tool has been misused as the initial pilot. This is usually considered bad practice simply because it encourages researchers to build a tool that "explains" the data they already possess rather than devising a more rigorous tool that would have explanatory power in situations where the outcome is not yet known. According to TNTP, the ICI is "a statistical composite of teacher agreement with three leading indicators of strong talent management." The three indicators are 1) Teachers at my school share a common vision of what effective teaching looks like; 2) At my school, the expectations for effective teaching are clearly defined; and 3) My school is committed to improving my instructional practice.
The ICI is a measure of the strength of these three indicators, and it was derived from a "study" taking less than five months and involving 37 schools, only 26 of which had data from the D.C. Comprehensive Accountability System (CAS). A chart showing the correlation between the ICI and student achievement as measured by the CAS reports two years of data, yet obviously no ICI could have been computed for the 2009 data set, since the index did not yet exist that year. For 2010, only 23 schools appear on the chart. It is not at all clear why one would show correlations between achievement and the ICI for a year in which no ICI was available, nor why three of the 26 schools with data do not appear on the chart for 2010.
The methodological problems with this "ICI" are many, though some might have been cleared up if the "index" had been properly piloted, peer-reviewed and replicated in a broader setting. Alas, we have seen no detailed data to suggest that it is valid at all. We also have no data to suggest that the schools used in the study were comparable, or whether student achievement in any of the schools was in any way outside the norms for their respective student bodies. Any attempt to claim causality would likely require more schools, more time, and significantly more data and data analysis. Yet TNTP's own literature clearly indicates a belief that these schools are scoring higher because they have higher scores on the index. Unfortunately, this is a classic case where the limited methodology used to generate the findings invalidates such a conclusion. It is almost always possible to construct a tool that finds a correspondence between a policy initiative and data that has already been gathered. This is why the type of research associated with this study is best understood as exploratory, and is best used to construct a tool whose value is then demonstrated in a different context, against data that played no part in devising the tool.
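The underlying statistical point can be illustrated with a minimal sketch in Python, using entirely invented numbers (none of these figures come from TNTP or the CAS): a composite index whose weights are chosen to fit the very schools it is meant to explain will tend to show a stronger correlation on those schools than on schools held out of its construction.

import numpy as np

rng = np.random.default_rng(0)

n_schools = 60
# Three hypothetical survey indicators per school: the share of teachers
# agreeing with each statement (values between 0 and 1).
indicators = rng.uniform(0.3, 0.9, size=(n_schools, 3))

# Hypothetical achievement scores: weakly related to the indicators, mostly noise.
achievement = indicators @ np.array([0.2, 0.1, 0.1]) + rng.normal(0, 0.15, n_schools)

# Split the schools into a set used to construct the index and a held-out set.
construct, holdout = np.arange(0, 30), np.arange(30, 60)

# "Construct" the index by choosing the weights (plus an intercept) that best
# explain achievement in the construction sample (ordinary least squares).
X = np.column_stack([np.ones(n_schools), indicators])
weights, *_ = np.linalg.lstsq(X[construct], achievement[construct], rcond=None)
index = X @ weights

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print("Correlation on the schools used to build the index:",
      round(corr(index[construct], achievement[construct]), 2))
print("Correlation on the held-out schools:               ",
      round(corr(index[holdout], achievement[holdout]), 2))
# The in-sample correlation will typically be the larger of the two, which is
# exactly why an index should be validated against data that played no part
# in its construction.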
For the sake of our response to the appearance of the ICI in the report we are currently examining, however, we can set aside all of these methodological and theoretical challenges to the ICI. Instead, we can simply ask one question. Is anyone looking at the language of this tactic not now aware that the authors clearly have no idea what the ICI is? How in the world could the ICI, even if it were valid, tested, and reliable, be used to "prioritize physical plant improvements" in our school system? The ICI has nothing at all to do with the physical plant of a school. At least one researcher has suggested that the misapplication of this "index" stems from a reference to "culture and climate" within the presentation that included the ICI. Perhaps one or more members of the committee thought this meant the physical "climate" of the classroom. We also note that this is not the only time acronyms and terminology from The New Teacher Project are highlighted in the report.
The second of our examples suggesting a total lack of research orientation from those writing and developing the recommendations in this report comes from the same section of the report. Strategy No. 4 states:
Reward teachers who rank in the top 25% statewide of performance in terms of improving student achievement (i.e., using the statewide value-added assessment model) and remove those teachers who rank in the bottom 25% statewide of performance.
Putting aside the tortured language of this "tactic," we examined the research on the validity of using current "value-added" assessments of student achievement to determine teacher effectiveness. Value-added assessments are used because researchers long ago determined that the actual scores of students on a test give no reliable information about the effectiveness of the teacher. This is due to a number of limitations. For example, long-accepted and validated research has clearly pointed out that teachers are responsible for only a small portion of student achievement. Charles Lussier recently highlighted research published in 1998 by "economists Eric Hanushek, John Kain and Steven Rivkin" who "estimated that at least 7.5 percent of the variation in student achievement resulted directly from teacher quality and added that the actual number could be as high as 20 percent. Out-of-school factors such as poverty, however, remain much better predictors of student achievement." For this reason, the scores of students in a particular teacher's classroom would be more affected by factors outside of school than by the teacher's instruction.
Therefore, researchers and "reformers" who wish to assess the quality of teachers have resorted to "value-added" measures. In its simplest form, "value-added" is the process of statistically predicting where a student should be achieving academically at a certain point in time, based on a variety of factors including the student's prior scores and the average growth of similar students, and then comparing the actual score with the predicted score. If the student achieves a higher score than predicted, that difference is treated, in this scenario, as extra value "added" by the teacher. In practice, the gains of a class of students are compared to the gains predicted for, and achieved by, the "average" teacher with those students. It is complex in theory, and in practice. (A simple numerical sketch of this prediction-and-comparison logic follows the quoted passage below.) In its letter commenting on the U.S. Department of Education's Race to the Top guidelines, the National Research Council raised many concerns about the validity of value-added measures. It is worth quoting extensively from the letter:
Prominent testing expert Robert Linn concluded in his workshop paper: “As with any effort to isolate causal effects from observational data when random assignment is not feasible, there are reasons to question the ability of value-added methods [VAM] to achieve the goal of determining the value added by a particular teacher, school, or educational program” (Linn, 2008, p. 3). Teachers are not assigned randomly to schools, and students are not assigned randomly to teachers. Without a way to account for important unobservable differences across students, VAM techniques fail to control fully for those differences and are therefore unable to provide objective comparisons between teachers who work with different populations. As a result, value-added scores that are attributed to a teacher or principal may be affected by other factors, such as student motivation and parental support.
VAM also raises important technical issues about test scores that are not raised by other uses of those scores. In particular, the statistical procedures assume that a one-unit difference in a test score means the same amount of learning—and the same amount of teaching—for low-performing, average, and high-performing students. If this is not the case, then the value-added scores for teachers who work with different types of students will not be comparable. One common version of this problem occurs for students whose achievement levels are too high or too low to be measured by the available tests. For such students, the tests show “ceiling” or “floor” effects and cannot be used to provide a valid measure of growth. It is not possible to calculate valid value-added measures for teachers with students who have achievement levels that are too high or too low to be measured by the available tests.
In addition to these unresolved issues, there are a number of important practical difficulties in using value-added measures in an operational, high-stakes program to evaluate teachers and principals in a way that is fair, reliable, and valid. Those difficulties include the following:
1. Estimates of value added by a teacher can vary greatly from year to year, with many teachers moving between high and low performance categories in successive years (McCaffrey, Sass, and Lockwood, 2008).
2. Estimates of value added by a teacher may vary depending on the method used to calculate the value added, which may make it difficult to defend the choice of a particular method (e.g., Briggs, Weeks, and Wiley, 2008).
3. VAM cannot be used to evaluate educators for untested grades and subjects.
4. Most databases used to support value-added analyses still face fundamental challenges related to their ability to correctly link students with teachers by subject.
5. Students often receive instruction from multiple teachers, making it difficult to attribute learning gains to a specific teacher, even if the databases were to correctly record the contributions of all teachers.
6. There are considerable limitations to the transparency of VAM approaches for educators, parents and policy makers, among others, given the sophisticated statistical methods they employ.
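To make the mechanism concrete, the following is a minimal sketch in Python, with entirely invented numbers, of the prediction-and-comparison logic described above: each student's current score is predicted from a prior score, and a teacher's "value added" is taken as the average gap between actual and predicted scores for that teacher's students. Real value-added models are far more elaborate than this; the sketch is meant only to show where student-level noise and unmeasured differences enter the calculation.

import numpy as np

rng = np.random.default_rng(1)

n_students, n_teachers = 200, 10
prior = rng.normal(300, 25, n_students)            # last year's scores (invented)
teacher = rng.integers(0, n_teachers, n_students)  # teacher assignment (invented)

# Invented "true" current scores: mostly prior achievement plus noise,
# with only a small teacher contribution.
teacher_effect = rng.normal(0, 3, n_teachers)
current = 0.9 * prior + 35 + teacher_effect[teacher] + rng.normal(0, 15, n_students)

# Step 1: predict each student's current score from the prior score
# (a simple least-squares line; real models use many more variables).
slope, intercept = np.polyfit(prior, current, 1)
predicted = slope * prior + intercept

# Step 2: a teacher's "value added" is the mean of (actual minus predicted)
# over that teacher's students.
residual = current - predicted
value_added = np.array([residual[teacher == t].mean() for t in range(n_teachers)])

for t in range(n_teachers):
    print(f"Teacher {t}: estimated value-added {value_added[t]:+.1f}, "
          f"true effect {teacher_effect[t]:+.1f}")
# With only about 20 students per teacher and large student-level noise,
# the estimates can differ noticeably from the "true" effects, which is one
# source of the instability the NRC letter describes.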
In addition, prominent statisticians and mathematicians, as well as other researchers, have clearly cautioned against using "value-added" for more than a small portion of teacher or administrator evaluations. The confidence of those who have examined value-added in practice is not high. We note here that we are puzzled to see an evaluation system for teachers promoted on the basis of "value-added" methodology while the State of Louisiana still uses raw student scores as the largest component of School Performance Scores (SPS), in no way accounting for the very differences among students that led to the adoption of "value-added" for evaluating individual teachers. To point out just one obvious problem with using raw student scores for SPS purposes: the top twenty schools in the state use selective enrollment, giving those schools an obvious advantage. Note also that the top district in the state has the lowest percentage of students qualifying for free meals, while the lowest performing district in the state has the highest level of low-income students. A value-added system would not evaluate schools this way.
A second layer of complexity and problematic decisions appears not to have been anticipated by the authors of the report. This layer concerns the practical limitations of, and unanswered questions about, the details of this tactic. Is the intent to use only student achievement? The language of the tactic suggests this. The instability of year-to-year measurements highlighted by the National Research Council suggests that a two- to three-year measure might be useful to avoid single-year errors, yet many new teachers, including Teach For America teachers, are not in place long enough to receive such scores. In addition, EBR could likely not replace 25% of its teachers and administrators in any given year, especially in hard-to-fill areas.
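The point about single-year instability can be illustrated with a small simulation sketch, again in Python with invented numbers: when the same underlying teacher effect is measured with substantial year-to-year noise, which teachers fall in the "bottom 25%" depends heavily on which year is used, and a three-year average only partly repairs this.

import numpy as np

rng = np.random.default_rng(2)

n_teachers, n_years = 1000, 3
true_effect = rng.normal(0, 1, n_teachers)
# Yearly estimates: the true effect plus substantial year-to-year noise.
yearly = true_effect[:, None] + rng.normal(0, 1.5, (n_teachers, n_years))

def in_bottom_quartile(scores):
    return scores <= np.quantile(scores, 0.25)

truly_bottom = in_bottom_quartile(true_effect)
single_year = in_bottom_quartile(yearly[:, 0])
three_year = in_bottom_quartile(yearly.mean(axis=1))

print("Agreement with 'true' bottom 25%, one year of data:   ",
      round(np.mean(single_year == truly_bottom), 2))
print("Agreement with 'true' bottom 25%, three-year average: ",
      round(np.mean(three_year == truly_bottom), 2))
# Averaging over years reduces, but does not eliminate, misclassification;
# and, as noted above, many teachers are not in place long enough to have
# multiple years of scores at all.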
The authors of this tactic apparently have not read any of the recent research outlined by Dan Pink that is shaking up historical understandings of motivation. Using "carrots and sticks," as this tactic does, flies in the face of more contemporary understandings of the serious problems with contingent rewards. There is, at best, limited evidence of success in using bonuses in teaching. For example, one of the most often cited "pay-for-performance" programs, an incentive program for teachers in New York City, was dropped in the summer of 2011 after research on the program made clear that it was not working. Here is how it was reported:
Weighing surveys, interviews and statistics, the study found that the bonus program had no effect on students’ test scores, on grades on the city’s controversial A to F school report cards, or on the way teachers did their jobs.
“We did not find improvements in student achievement at any of the grade levels,” said Julie A. Marsh, the report’s lead researcher and a visiting professor at the University of Southern California. “A lot of the principals and teachers saw the bonuses as a recognition and reward, as icing on the cake. But it’s not necessarily something that motivated them to change.”
The results add to a growing body of evidence nationally that so-called pay-for-performance bonuses for teachers that consist only of financial incentives have no effect on student achievement, the researchers wrote. Even so, federal education policy champions the concept, and spending on performance-based pay for teachers grew to $439 million nationally last year from $99 million in 2006, the study said.
According to a research report from the London School of Economics highlighted in Dan Pink's TED Talk on the subject, "financial incentives…can result in a negative impact on overall performance." The title of a research paper by David Marsden that the London School of Economics has published says it all: "The paradox of performance related pay systems: ‘why do we keep adopting them in the face of evidence that they fail to motivate?'"
Finally, the financial implications of combining monetary rewards for top performing teachers (as measured by a questionable value-added system) with the costs of recruiting and training a stream of new teachers to replace those the same value-added system determines to be low performing could easily overwhelm the budget of the East Baton Rouge Parish School System, which is already operating with a rapidly declining budget balance.
While we recognize that there are some strategies and tactics which, if implemented with fidelity, will likely improve our schools and the experiences of those attending them, we, the undersigned, are very concerned about the lack of quality apparent in far too many of these tactics, and believe there needs to be more transparency in the conversations about these recommended actions before the Board takes any action to set them into policy. We are more than willing to assist, and offer as a starting point that all recommendations include references to their origins and to the research behind them.
The undersigned urge the East Baton Rouge Parish School Board to carefully examine the recommendations of this report, and to examine, and provide to the public, the research base supporting or calling into question each of them.
(Note: There are others involved in this response, but only the three of us have approved this version so far.)
Noel Hammatt, Independent Education Researcher, Baton Rouge
John St. Julien, Ph.D., Retired Professor of Education, Lafayette
Donald Whittinghill, Education Consultant, Baton Rouge