How might protocols and program fidelity of implementation improve instruction? (Updated July 2019)

Dr Kerry Hempenstall, Senior Industry Fellow, School of Education, RMIT University, Melbourne, Australia.
This article can be downloaded as a PDF file at https://tinyurl.com/y6vat4ut

New Addition - March 2025

OK, let us attempt to make clear what the terms protocols and program fidelity might mean, and how they may be of use in our education settings.
I’ve drawn this first section from the original document (found below this new section), as it provides a helpful start:
_______________________________________________________________________________________________ 
“Here, then, is our situation at the start of the twenty-first century: We have accumulated stupendous know-how. We have put it in the hands of some of the most highly trained, highly skilled, and hardworking people in our society. And with it, they have indeed accomplished extraordinary things. Nonetheless, that know-how is often unmanageable. Avoidable failures are common and persistent, not to mention demoralizing and frustrating, across many fields—from medicine to finance, business to government. And the reason is increasingly evident: the volume and complexity of what we know has exceeded our individual ability to deliver its benefits correctly, safely, or reliably. Knowledge has both saved us and burdened us.
That means we need a different strategy for overcoming failure, one that builds on experience and takes advantage of the knowledge people have but somehow also makes up for our inevitable human inadequacies. And there is such a strategy—though it will seem almost ridiculous in its simplicity, maybe even crazy to those of us who have spent years carefully developing ever more advanced skills and technologies. It is a checklist.” (Gawande, 2010).
What are protocols? There are several different meanings to be found in a dictionary. In this context, however, a protocol refers to an accepted way of doing something. A procedure’s acceptance should be based upon the best available evidence that the specified approach is the best choice in a given situation. Airline flight operations manuals, for example, specify the appropriate crew actions for almost any situation. They were developed over time as it became evident that even highly skilled crews make errors that can be reduced by the use of, for example, a pre-flight checklist.
Protocols have been enthusiastically adopted by a variety of professions and industries because they reduce the inefficiency caused by variation in effective practice among members. In industry, this has had positive economic outcomes, and protocols have become standard practice in agriculture, transport, inventory management, and production processes. Health and safety protocols have been introduced in aircraft cockpits, fire and other emergency services, and hospitals and other health care facilities, because they also save lives. In medicine and clinical psychology, evidence-based practice is increasingly the norm, and students-in-training are taught the value of protocols for improving decision making and practice.
This provides part of the definition; here is another explanation.
“In teaching, protocols and program fidelity, which refers to implementing programs as intended, can help by ensuring consistent and effective instruction, leading to better student outcomes and facilitating the replication of successful interventions.”
Implementation Fidelity (2019)
“Poor program implementation constitutes one explanation for null results in trials of educational interventions. For this reason, researchers often collect data about implementation fidelity when conducting such trials. In this article, we document whether and how researchers report and measure program fidelity in recent cluster-randomized trials. We then create two measures—one describing the level of fidelity reported by authors and another describing whether the study reports null results—and examine the correspondence between the two. We also explore whether fidelity is influenced by study size, type of fidelity measured and reported, and features of the intervention. We find that as expected, fidelity level relates to student outcomes; we also find that the presence of new curriculum materials positively predicts fidelity level.”
Hill, H.C., & Erikson, A. (2019). Using Implementation Fidelity to Aid in Interpreting Program Impacts: A Brief Review. Educational Researcher, 48(9), 590-598.
DOI:10.3102/0013189X19891436
https://www.researchgate.net/publication/337988202_Using_Implementation_Fidelity_to_Aid_in_Interpreting_Program_Impacts_A_Brief_Review
_______________________________________________________________________________________________ 

Fidelity to a Program Requires Teachers Who Can Adapt (2024)

“When planning any lesson—yes, teachers still must plan lessons even with a program—it's critical to keep in mind what students know and are able to do, as well as to consider how much support (in terms of skills, background knowledge, and vocabulary) they will need to meet the lesson objectives. Teaching lessons exactly as written without considering students' strengths and needs may cause confusion, frustration, boredom, or disengagement.
Utilize small group instruction. Identify underlying skills or knowledge that students need to actively participate in the lesson, group students according to need(s), and provide a short, focused lesson to pre-teach. For students ready for a challenge, group them and teach them a strategy to help them think about the lesson content with greater depth or complexity. While you're teaching small groups, the rest of the class could be engaged in independent activities. 
Vary feedback, coaching, and prompting. If constraints require you to teach a lesson to the whole class as recommended, you can vary the prompts, demonstrations, and tasks to either provide more scaffolding or increase challenge. For example, if the lesson script suggests that students summarize, you could challenge some students to summarize focused on an angle or idea rather than just the sequence of information. If others need support with their summary, you might be ready with prompts such as, "What was the most important information from this paragraph? Can you use the subheading to help you?"
Slow down (or speed up). Many programs contain more materials than you could ever realistically use in a school day, while others are missing critical evidence-aligned components. You can vary the pacing within a lesson—skipping over repetition if it seems that students are getting it, asking students to quickly read a section silently rather than your reading it aloud to them (or vice versa if they need more support), deleting some of the tasks that you know would be too easy, and so on. You might also need to vary the pacing across lessons—for example, adding in an extra whole class lesson to provide extra background information or practice with an easier text, or skipping a lesson that you know all of your students will find too easy.   
VARY LESSON STRUCTURES TO PROVIDE DIFFERENT LEVELS OF SUPPORT
I think one of the best things teachers can do is to have a small repertoire of lesson types to choose from, like tools in a toolbox. Each lesson type offers different opportunities for engagement, has different levels of support, and follows a different structure. You can select a type based upon student needs, lesson purpose, text type, etc.
Choose a lesson structure that offers greater scaffolding. Imagine that your program suggests you assign a text that you think your students will find challenging or confusing. You agree that it's an important text to teach, but you're concerned that your students won't understand the text if they read in pairs while you circulate, as the program suggests. In this case, perhaps you teach a read-aloud, shared-reading, or close-reading lesson.
Choose a lesson structure to boost engagement and independence. As you look at your program's lessons, do you notice a lot of read-aloud lessons and assigned independent reading? Maybe some independent reading could be recast as readers' theater lessons to offer students an engaging opportunity to improve their fluency. Or maybe a read-aloud lesson could be extended to include explicit teaching into discussion, through conversation lessons.
SWAP TEXTS
Many literacy programs exclusively use texts that are written at grade level, and if your students aren't yet able to read those texts with independence, you'll need strategies to support them. Further, you'll need to consider texts' content in relation to your students' interests, identities, and knowledge, and plan to support or swap texts accordingly.
(Note: All students should have regular opportunities to read grade-level (and above) texts with appropriate scaffolding and support.)
Create conceptually coherent text sets. If you know the provided text is currently too difficult for your students, look for more easily accessible texts to introduce the content and key vocabulary, then move them into the program-provided text. Consider starting with picture books, video clips, and audio, even. You'll accomplish the program objectives and will likely find that your students have a much deeper understanding of the content, too.
Swap out texts to ensure cultural relevance. Texts you use should be inclusive, relevant, identity affirming, and culturally responsive and sustaining. Be aware of texts that misrepresent or leave out important cultural perspectives, or are culturally destructive, and make changes. Choose texts that will help you cover the same content and meet the program's learning objectives, but which include accurate history, multiple perspectives, and full characters, and offer opportunities for critical thinking.
Trying for strict fidelity to a program isn't only unrealistic—it can actually hinder effective teaching and student progress. You can maintain fidelity to a program's objectives and scope-and-sequence while making necessary adjustments to be effective, meet your students' needs and interests, and support their engagement. And leaders should trust teachers—practitioners who have insights and knowledge about their students, as well as pedagogical expertise—to do so.    By making informed instructional decisions, teachers can maintain fidelity to their curriculum while supporting and engaging their students.”
Serravallo, J. (2024). Fidelity to a Program Requires Teachers Who Can Adapt. Edutopia, George Lucas Educational Foundation. https://www.edutopia.org/article/fidelity-to-a-program-requires-teachers-who-can-adapt/
_______________________________________________________________________________________________ 

Supporting students significantly behind in literacy and numeracy (2023)
“Who should implement interventions? There were no explicit and consistent findings regarding the most effective implementer of intervention. No review included the question of who should implement interventions investigated directly. However, the relative impact of teachers implementing interventions compared to researchers implementing interventions was considered in some of the systematic reviews.
There was also evidence regarding the fidelity of intervention implementation. The fidelity of implementation is closely related to the question of who should implement interventions. The concept of implementation fidelity refers to the degree to which an intervention is consistently delivered and requires adherence to the protocols of the program.
The protocols must incorporate processes of delivery that include implementer and participants, as well as the evaluation of the program, to ensure that the intervention has been delivered as intended. The findings from the systematic reviews regarding the implementers of intervention and the fidelity of interventions are outlined below.
There was mixed evidence regarding whether literacy interventions delivered by researchers were more effective than those delivered by teachers. Some studies indicated that this was the case. For example, in one systematic review (Kim et al. 2012), it was found that the effects of interventions were consistently stronger when delivered by a researcher (g = 0.86 to 3.14), relative to instruction delivered by a teacher (g = 0.07 to 1.14).
Similarly, another group of reviewers of a large set of reading studies (Scammacca et al. 2007) reported that for all outcome measures, researcher-led implementation of interventions resulted in higher effect sizes (g = 1.70) than teacher-led implementation (g = 0.63). Jitendra et al. (2018) also found that researcher-led implementation of mathematics interventions yielded a larger effect size (g = 0.70) than interventions delivered by school personnel (g = 0.35).
Conversely, other reviewers reported minimal differences in literacy intervention effects delivered by researchers or teachers. For example, in one systematic review, it was found that the differential mean of the teacher effect was not significantly different from the researcher effect (d = 0.37 and d = 0.29 respectively; Goodwin and Ahn 2013). Similarly, other reviewers (Berkeley et al. 2010) reported no significant difference between reading intervention effects delivered by researchers (g = 0.83) or teachers (g = 0.56). Other reviewers (Kim et al. 2004) found that it did not matter who implemented the intervention using graphic organisers, the intervention itself generated large effect sizes. Inconsistent results were found for the effects of mathematics intervention implementers.
However, in one systematic review, greater intervention effectiveness was noted when the intervention was jointly implemented by teachers and researchers (Xin and Jitendra 1999). In general, it was found that for studies in which there was adherence to quality indicators, such as fidelity of implementation, instruction and assessment, as well as professional development and maintenance, the effect sizes of the interventions were robust (Scammacca et al. 2007).
Reference to fidelity was made in the study designs of 50 systematic reviews of interventions (72%), while in only 44 systematic reviews (64%) were fidelity data provided. In 15 systematic reviews of interventions, neither fidelity nor fidelity data were reported. In 5 of these 15 reviews reference was made to relevant fidelity or quality indicator guidelines, such as What Works Clearinghouse (Institute of Education Sciences).
However, most of the authors of the systematic reviews who cited fidelity in the intervention process emphasised the need for future researchers to ensure fidelity of implementation at the forefront of their study designs. While there were no clear findings on who is in the best position to implement interventions, there were indication; it was noted that both daily and monthly feedback sessions were impactful in this way.
Conclusion
Despite the gaps identified in the research base, and the dated systematic reviews in some areas, there is sound evidence to support the introduction of MTSS in Australian school systems.
The evidence suggests that with consistent use of effective instruction at Tier 1, a team-based problem-solving approach to selecting evidence-based intervention and implementing these with fidelity at Tier 2, 95% of students could meet academic benchmarks. This would reserve intensive Tier 3 support only for a small proportion of students in need and reduce the number of special education placements, as was found in the US.
Our findings point to the importance of consistency across jurisdictions to ensure system-wide scaling of MTSS is successful. They also emphasise the importance of ensuring the impact of instruction and intervention for underachieving students is maximised through robust CBM and professional learning.
Our findings further highlight the effectiveness of developing and resourcing technical assistance centres, as well as using materials developed by those exemplary states in the US that are regarded as the benchmark for quality MTSS.
Clear evidence is presented to indicate how secondary schools can best intervene to close achievement gaps in reading, writing and mathematics. The effective practices that were identified can be considered an excellent starting point for use in secondary schools seeking to support these students. In addition to upskilling secondary school staff in using screening data well, we recommend professional learning for teachers in effective practices for reading, writing and mathematics interventions, such as:
• explicit instruction
• strategy instruction
• using graphic organisers.
We recommend that this investment be made schoolwide, rather than being reserved for the implementation of interventions. Instruction at Tiers 2 and 3 of support should not be fundamentally different from instruction at Tier 1.
The effective practices we identified for reading, writing and mathematics interventions are not special educational practices; they are quality teaching practices and offer benefit in all classrooms for all students, as well as benefiting those needing more support.
The MTSS framework is a general education initiative. It is reliant on strong, regular classroom instruction at Tier 1 for all students as the most important foundation, and this should be sufficient for most students to succeed at school. If MTSS were implemented at scale across schools, our research suggests this would minimise the number of students reaching secondary school needing support. At the present time, too many students arrive in secondary school substantially behind their peers, and too many of these are students who are already disadvantaged.
Our review offers clear advice about how this can be addressed and shows that it is never too late to teach these students. To achieve this, effective intervention selected by collaborative teams and implemented by appropriately skilled staff is needed. Students should be identified as quickly as possible, using data from effective screening measures.
These data should be examined by multidisciplinary teams and a problem-solving approach adopted to select evidence-based interventions aligned to the identified skills needing improvement. This will help the most vulnerable and disadvantaged students to be as well-prepared as possible to flourish in life beyond school, and make our schools places where every student can thrive.”
de Bruin K, Kestel E, Francis M, Forgasz H and Fries R (2023). Supporting students significantly behind in literacy and numeracy: A review of evidence-based approaches, edresearch.edu.au. https://www.edresearch.edu.au/sites/default/files/2023-05/aero-supporting-students-significantly-behind-literacy-numeracy.pdf
_______________________________________________________________________________________________ 
A bit of a summary to finish: AI Overview
“Protocols and program fidelity of implementation improve instruction by ensuring consistent and effective delivery of evidence-based practices, leading to more predictable and positive student outcomes. 
Here's a more detailed explanation:

What is Fidelity of Implementation? 

Fidelity of implementation refers to the degree to which a program or intervention is delivered as intended by its developers. 

It's about ensuring that key components and strategies are implemented consistently and accurately. 

High fidelity is associated with improved student outcomes. 

Why is Fidelity Important? 

Predictability and Consistency: When programs are implemented with high fidelity, educators can expect more consistent and predictable results. 

Attributing Outcomes: It allows researchers and practitioners to confidently attribute observed outcomes to the intervention itself, rather than variations in implementation. 

Preventing False Conclusions: Fidelity assessment helps prevent potentially false conclusions about an intervention's effectiveness. 

How Protocols and Fidelity Improve Instruction: 

Clear Guidelines: Protocols provide clear guidelines and procedures for implementing programs, ensuring that educators follow the intended approach. 

Training and Support: Fidelity often involves providing educators with adequate training and ongoing support to ensure they can implement programs effectively. 

Monitoring and Feedback: Fidelity monitoring involves regularly assessing how programs are being implemented and providing feedback to educators to improve their practice. 

Focus on Critical Components: Protocols and fidelity efforts can help educators focus on the critical components of a program that are most likely to drive positive outcomes. 

Examples of Protocols and Fidelity in Action: 

Curriculum Implementation: Schools can establish protocols for implementing a new curriculum, including training, resources, and ongoing support for teachers. 

Intervention Programs: Protocols can be used to ensure that evidence-based interventions are implemented with fidelity, such as in reading or math instruction. 

Coaching and Feedback: Coaching and feedback sessions can help educators refine their implementation skills and ensure they are adhering to program protocols. 

Factors Influencing Fidelity: 

Program Design: The design of the program itself can influence fidelity, with well-designed programs being easier to implement consistently. 

Organizational Support: Schools and districts that provide strong support for implementation are more likely to achieve high fidelity. 

Teacher Training and Support: Adequate training and ongoing support are crucial for educators to implement programs with fidelity.”

Generative AI
Finished – hope it is understandable!
_______________________________________________________________________________________________ 

This next segment is the original, broader document and includes earlier periods.

“Here, then, is our situation at the start of the twenty-first century: We have accumulated stupendous know-how. We have put it in the hands of some of the most highly trained, highly skilled, and hardworking people in our society. And with it, they have indeed accomplished extraordinary things. Nonetheless, that know-how is often unmanageable. Avoidable failures are common and persistent, not to mention demoralizing and frustrating, across many fields—from medicine to finance, business to government. And the reason is increasingly evident: the volume and complexity of what we know has exceeded our individual ability to deliver its benefits correctly, safely, or reliably. Knowledge has both saved us and burdened us.
That means we need a different strategy for overcoming failure, one that builds on experience and takes advantage of the knowledge people have but somehow also makes up for our inevitable human inadequacies. And there is such a strategy—though it will seem almost ridiculous in its simplicity, maybe even crazy to those of us who have spent years carefully developing ever more advanced skills and technologies. It is a checklist.” (Gawande, 2010).
What are protocols? There are several different meanings to be found in a dictionary. In this context, however, a protocol refers to an accepted way of doing something. A procedure’s acceptance should be based upon the best available evidence that the specified approach is the best choice in a given situation. Airline flight operations manuals, for example, specify the appropriate crew actions for almost any situation. They were developed over time as it became evident that even highly skilled crews make errors that can be reduced by the use of, for example, a pre-flight checklist.
Protocols have been enthusiastically adopted by a variety of professions and industries because they reduce the inefficiency caused by variation in effective practice among members. In industry, this has had positive economic outcomes, and protocols have become standard practice in agriculture, transport, inventory management, and production processes. Health and safety protocols have been introduced in aircraft cockpits, fire and other emergency services, and hospitals and other health care facilities, because they also save lives. In medicine and clinical psychology, evidence-based practice is increasingly the norm, and students-in-training are taught the value of protocols for improving decision making and practice.

Teaching is more than showing, and effectiveness requires more than simply content knowledge
An educational protocol is a set of predetermined curriculum content and teaching methods that define evidence-based practices known to be effective across a range of educational settings and learner characteristics. Protocols have not been widely adopted in education as yet for a variety of reasons. The primary reason is that evidence has not played a major role in educational decision making either at the policy level or at the classroom level, and this has led to great variation in teacher practices and instructional quality across our classrooms.
Some have argued that science has little to offer education, and that teacher initiative, creativity, and intuition provide the best means of meeting the needs of students. For example, Weaver considers that scientific research offers little of value to education (Weaver et al., 1997). “It seems futile to try to demonstrate superiority of one teaching method over another by empirical research” (Weaver, 1988, p.220). These writers often emphasise the uniqueness of every child as an argument against instructional designs that presume there is sufficient commonality among children to enable group instruction with the same materials and techniques. Others have argued that teaching itself is ineffectual when compared with the impact of socioeconomic status and social disadvantage (Coleman et al., 1966; Jencks et al., 1972). Smith (1992) argued that it is the relationship between a teacher and a child, not instructional methods, that is the major determinant of learning. He thus downplayed instruction in favour of a naturalist perspective: “Learning is continuous, spontaneous, and effortless, requiring no particular attention, conscious motivation, or specific reinforcement” (p.432). Still others view educational research as reductionist, dissecting teaching into little segments and unable to encompass the necessarily holistic nature of the learning process (Cimbricz, 2002; Poplin, 1988).
Overall, these influential beliefs have led to a focus on student responsibility for learning outcomes, with the teacher’s role reduced to offering only minimally guided instruction: the guide-on-the-side. In the reading domain, this translated into the whole language model, in which beginning readers were encouraged to guess their way to competence, employing their own unique reading style.
In contrast to the focus upon student uniqueness, research has shown that young readers are remarkably similar in their response to effective instruction. Brain studies have indicated that no brain areas have evolved as reading centres, unlike speech and language, which do occupy specific brain locations. When readers become skilled, however, they employ the same specific areas, notably the left occipito-temporal area, a region that evolved for other purposes. This adaptation of a brain region that actually evolved for perceiving the junctions between straight lines and curves is known as neuronal recycling. Dehaene (2009) makes a vital point: “Every child is unique…but when it comes to reading, all have roughly the same brain that imposes the same constraints and the same learning sequence. Thus, we cannot avoid a careful examination of the conclusions – not prescriptions – that cognitive neuroscience can bring to the field of education” (p. 218).

Potential benefits of protocols and program fidelity

An example from clinical psychology:
During the 1990s, the American Psychological Association (Chambless & Ollendick, 2001) introduced the term empirically supported treatments (EST) as a means of highlighting differences in effectiveness among psychotherapies. Prior to that time, many psychologists saw themselves as developing a craft in which competence arises through a combination of initial training, personal qualities, intuition, and experience. The result was extreme variability in outcomes among practitioners.
The idea behind EST was to devise a means of rating therapies for various psychological problems, and for practitioners to use these ratings as a guide to their practice. The criteria for a treatment to be considered well established included demonstrated efficacy in two controlled clinical outcome studies or a large series of controlled single-case design studies, the availability of treatment manuals to ensure treatment fidelity, and clearly specified client characteristics. A second level set out criteria for probably efficacious treatments; these required fewer studies and/or a lesser standard of rigor. The third category comprised experimental treatments, those without sufficient evidence to achieve probably efficacious status. Treatments are manualized, and fidelity to the manual is assessed using videotaped sessions. Clients are seen for a fixed number of sessions, and the target outcomes are well operationalized. This shift from eclecticism to evidence-based practice brought a great improvement in therapy outcomes, and the improvement proved repeatable: new generations of therapists-in-training were able to produce outcomes similar to those of their experienced colleagues when both implemented the same intervention.
There are some obvious parallels with educational programs. In education programs, however, there is rarely the level of detail that is routinely specified in a psychotherapy treatment manual.
Education program designers usually assume that teachers know how to structure a lesson effectively, if they are provided with some worthwhile content. This assumption is far from universally justified. The content may be research-based, but its presentation may be competent, slipshod, or cursory; corrective feedback may or may not occur systematically; mastery by students may or may not be expected; and massed and spaced practice opportunities may or may not be adequate. Regular data-based monitoring may or may not occur. Teacher creativity may abound. For example, the notion of teaching according to every student’s learning style continues to be influential in schools. This loose coupling between content and delivery would horrify an empirically trained psychologist, as it would a surgeon, a pilot, or a paramedic. It also highlights why the crucial element in evaluation is not simply that a program is consistent with scientific findings, but also that it has been demonstrably successful with the target population.
“There’s too much ‘choose your own adventure’ in Australian education,” said John Fleming, deputy chairman of the Australian Institute for Teaching and School Leadership. “There’s a lot unsubstantiated by the research that people still are allowed to espouse about the best way to teach kids. There’s too many broad guidelines people can interpret in many different ways.” (Ferrari, 2014)
It is for this reason that some programs provide a high degree of specificity as to how the program is to be implemented. In some cases (e.g., Direct Instruction), this involves scripted lessons – a means of increasing the likelihood that the program that has been shown to be effective in empirical trials is replicated in the classroom. This is rather similar to protocols carefully followed by surgeons, pilots, and disaster management professionals. It is often described as program fidelity or treatment integrity.

So what is program fidelity?

Program fidelity has several synonyms, including treatment integrity and instructional fidelity.
“High fidelity implementation means that you get a program with an internal design and follow that design. That would include using the materials in a particular sequence, adhering to the amount of time and practice called for by the program and following the recommendations for grouping or re-teaching students. It would mean using all the essential components as they are designed, including differentiated instructional time and program assessments. … Teachers are concerned that high fidelity implementation will suck the life out of their teaching and that all classrooms will look exactly alike. And that’s a valid concern. But the creativity comes in by becoming an astute diagnostician and figuring out what your kids need more and less of, and how to adjust each lesson to meet each child’s needs. And teachers always do add their own style and flair to it. The important idea, however, is that we have a common metric. The system will run more effectively if we all have common research based teaching practices and can learn together about how to make them work for all students. Having a common curriculum allows a whole system to improve, not just selected classrooms. It also makes delivery of professional development more efficient.”
Diamond, L. (2004). High fidelity — It’s all about instructional materials. An interview with Linda Diamond of CORE. Retrieved from  https://www.corelearn.com/files/HighFidelity.pdf

“… the extent to which essential intervention components are delivered in a comprehensive and consistent manner by an interventionist trained to deliver the intervention” (Hagermoser Sanetti & Kratochwill, 2009, p. 448).
Hagermoser Sanetti, L. M., & Kratochwill, T. R. (2009). Toward developing a science of treatment integrity: Introduction to the special series. School Psychology Review, 38, 445–459.

Fidelity in research settings requires fidelity checks to be incorporated into the research design.
The need for such fidelity has been recognised in research only in recent years, partly owing to the rise of the Response to Intervention (RTI) approach to assessing and intervening with students who struggle in educational settings.
“Gresham and Kendall (1987) reviewed consultation studies published prior to 1987 and found that no study reviewed included treatment integrity data. Gresham (1989) reflected on his previous finding and concluded that most studies relied on the "consult and hope" (p. 48) approach, described as the act of consultation services occurring, but with no follow-up to ensure that teachers performed the treatment as prescribed.”
Solomon, B.G., Klein, S.A., & Politylo, B.C. (2012). The effect of performance feedback on teachers' treatment integrity: A meta-analysis of the single-case literature. School Psychology Review, 41(2), 160-176.

In research today, it is necessary to demonstrate that any changes in the dependent variable (outcomes) are caused by changes in the independent variable (the intervention) under investigation. This can only be established if the researchers can show that the intervention was implemented precisely as it was designed. Hence fidelity measures are a necessary part of such studies. In school implementations, however, less attention has been devoted to ensuring that interventions are conducted with fidelity, and the risk to outcomes is amplified further when progress monitoring and end-of-program evaluation are not conducted.
“In the medical sphere there are well-established protocols that need to be adhered to prior to the introduction of any new drug or treatment. No such protocols apply in education, an area in which lives are also at stake (Dinham, 2014b)” (p.14).
Dinham, S. (2015). The worst of both worlds: How the U.S. and U.K. are influencing education in Australia. Education Policy Analysis Archives, 23(49). Retrieved from http://dx.doi.org/10.14507/epaa.v23.1865

“Fidelity of implementation is traditionally defined as the extent to which the intervention is implemented as designed during an experimental study (e.g., Hord, Rutherford, Huling-Austin, & Hall, 1987; National Research Council, 2004). Dane and Schneider (1998) reported that there are five criteria for measuring fidelity of implementation. These criteria include (a) adherence, whether the components of the program are being delivered as designed; (b) exposure, the number, length, or frequency of sessions being implemented; (c) quality of delivery, the manner in which the implementer delivers the program using the prescribed methods and techniques; (d) responsiveness, the extent to which the participants are engaged by and involved in the activities and content of the program; and (e) program differentiation, whether critical features that distinguish the program from the comparison condition are present or absent during implementation (O’Donnell, 2008). More specifically, fidelity of implementation can be differentiated into two primary categories: (a) fidelity of structure (i.e., adherence and exposure) and (b) fidelity of process (i.e., program differentiation, quality of delivery, and responsiveness; Dane & Schneider, 1998; Mowbray, Holter, Teague, & Bybee, 2003)” (pp. 79-80).
“Our analyses revealed that overall fidelity of implementation accounted for 22% of the variance in the gains in basic reading skills and 18% of the passage comprehension gains of middle school students with reading difficulties” (p.85). “… The findings of the present study are consistent with previous research that has demonstrated that fidelity of implementation has statistically and educationally significant effects on student outcomes (Allinder et al., 2000; Hall & Loucks, 1977; Penuel & Means, 2004; Songer & Gotwals, 2005; Ysseldyke et al., 2003). … Our findings have implications for the challenge of moving effective approaches to practice, particularly those designed to close the reading achievement gap. Cook, Landrum, Tankersley, and Kauffman (2003) highlight that approaches may be rendered ineffective or counterproductive if not used with adequate dosage (amount of treatment) or when implemented without adequate fidelity. Placing this concern in the context of the present investigation, teachers implementing Corrective Reading Decoding (an evidence-based remedial reading intervention) with low fidelity did not experience large reading improvements commensurate with their colleagues implementing with high fidelity. Thus, these teachers may feel justified in concluding that Corrective Reading Decoding is not effective in building the reading skills of striving middle school readers. Limited or no consideration to fidelity of intervention is a large threat to internal validity; without consideration of level of fidelity, it is difficult to ascertain whether the intervention was responsible for enhanced or constrained treatment outcomes. We draw on and concur with the more than 30-year-old findings of Hall and Loucks (1977), who found that those implementing an innovation vary on adherence to the structure and quality of implementation of the innovation.
Our findings underscore that reading outcomes appear to significantly vary according to how well the intervention was delivered and the degree to which the structure of lessons was followed” (p.86).
Benner, G.J., Nelson, J.R., Stage, S.A., & Ralston, N.C. (2011). The influence of fidelity of implementation on the reading outcomes of middle school students experiencing reading difficulties. Remedial and Special Education, 32, 79–88.
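
The five Dane and Schneider criteria, and their grouping into fidelity of structure and fidelity of process, can be pictured as a simple observation record. The Python sketch below is illustrative only. The 0-to-1 rating scale, the equal weighting, and the names `FidelityRatings`, `structure` and `process` are assumptions made for this example, not a scoring method taken from Dane and Schneider (1998) or Benner et al. (2011).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class FidelityRatings:
    """Hypothetical observer ratings for one lesson, each on an assumed 0.0-1.0 scale."""
    adherence: float                # components delivered as designed
    exposure: float                 # number, length, or frequency of sessions delivered
    quality_of_delivery: float      # use of the prescribed methods and techniques
    responsiveness: float           # participant engagement and involvement
    program_differentiation: float  # critical distinguishing features present

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 0-1 scale.
        for name, value in vars(self).items():
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")

    def structure(self) -> float:
        """Fidelity of structure: adherence and exposure (equal weights assumed)."""
        return mean([self.adherence, self.exposure])

    def process(self) -> float:
        """Fidelity of process: differentiation, quality, responsiveness (equal weights assumed)."""
        return mean([self.program_differentiation,
                     self.quality_of_delivery,
                     self.responsiveness])


lesson = FidelityRatings(adherence=0.9, exposure=0.8, quality_of_delivery=0.6,
                         responsiveness=0.7, program_differentiation=0.8)
print(f"structure={lesson.structure():.2f} process={lesson.process():.2f}")
```

A school would still need to decide how each dimension is rated in practice (for example, the proportion of scripted components observed, for adherence). Keeping the two summary scores separate preserves the structure/process distinction rather than collapsing it into a single number.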

“In intervention research, treatment fidelity is defined as the strategies that monitor and enhance the accuracy and consistency of an intervention to ensure it is implemented as planned and that each component is delivered in a comparable manner to all study participants over time. Reviews of the literature in special education and other disciplines reveal that reports of treatment fidelity are limited” (p.121).
Smith, S.W., Daunic, A.P., & Taylor, G.G. (2007). Treatment fidelity in applied educational research: Expanding the adoption and application of measures to ensure evidence-based practice. Education & Treatment of Children, 30(4), 121-134.

“Implementation refers to the process by which an intervention is put into practice. Research studies across multiple disciplines, including education, have consistently demonstrated that interventions are rarely implemented as designed and, crucially, that variability in implementation is related to variability in the achievement of expected outcomes.” (p. 635)
Lendrum, A., & Humphrey, N. (2012). The importance of studying the implementation of interventions in school settings. Oxford Review of Education, 38(5), 635-652.

“A common approach to RTI is to provide multiple tiers of increasingly intensive interventions in which students are provided with standardized, research-based interventions (Fuchs, Mock, Morgan, & Young, 2003). Because students are provided with highly standardized interventions, only a subset of students with intractable reading difficulties or disabilities, who fail to respond to multiple tiers of research-based, standardized interventions, will require specialized instruction. Fidelity of implementation, or implementing a standardized, research-based intervention as designed and intended, is critical to ensure that students who have been identified as responding inadequately are true inadequate responders. Without high fidelity, it is unclear what the effects on the students will be (Hill, King, Lemons, & Partanen, 2012).” (p.192)
Austin, C.R., Vaughn, S., & McClelland, A.M. (2017). Intensive reading interventions for inadequate responders in Grades K–3: A synthesis. Learning Disability Quarterly, 40(4), 191-210. doi.org/10.1177/07319487177144

“In duplicating a research study, administrators need to read carefully the details of the program they are hoping to implement. Dynarski (2010) states that "knowledge drawn from science doesn't come with instructions on how to put it into practice" (p. 61). The administrator as researcher will need to ask many questions: How was the intervention implemented? Who administered it? Under what conditions? For what duration of time? Principals sometimes "tweak" programs to suit their particular school setting, but a lack of implementation fidelity or a failure to adhere to the study's implementation protocol may affect the outcomes. Significantly higher outcomes are achieved when programs are implemented as intended by the developer (O'Donnell, 2008)” (p. 124).
Bair, M. A., & Enomoto, E. K. (2013). Demystifying research: What's necessary and why administrators need to understand it. National Association of Secondary School Principals. NASSP Bulletin, 97(2), 124-138.

“Increasing the likelihood of teachers implementing research-based strategies in authentic school settings is a major goal of education leaders. Likewise, decreasing the variability of instruction practices and increasing fidelity of implementation to models of instruction and intervention is particularly difficult (Gersten, Chard, & Baker, 2000; Gresham, MacMillan, Beebe-Frankenberger, & Bocian, 2000). To address these issues in the context of ECRI, we developed highly specified lesson plans and teaching routines to support standard implementation of instruction and intervention materials. Our goal was to increase the level of specificity to ensure that teachers provided students with explicit and, when appropriate, intensive instructional supports (i.e., in the context of both Tier 1 and Tier 2). These routines provided clear expectations to teachers for what content to cover during instruction and intervention lessons and highly specified guidance for explicit and engaging teacher-student interactions. Akin to the Checklist Manifesto (Gawande, 2009), the goal of the specified routines was to increase the degree to which practitioners implement evidence-based practices with fidelity and integrity. The approach of using highly specified instruction and intervention routines can also be used as a tool for coaches and school leaders to define and measure implementation fidelity and to provide subsequent implementation goals for teachers. It is important to note that school based personnel (rather than researchers) delivered both the Tier 1 portion and the Tier 2 portions of the model. Having school personnel as implementers, notably a unique feature of this study, increases the external validity of the study’s results. The study findings also have potential implications for publishers and developers of core reading programs and tier 2 interventions.
First, in our opinion, the degree of specificity and guidance provided to teachers for delivering explicit instruction in current reading programs is lacking. Many programs do not provide enough explicit, scaffolded instruction or practice opportunities for learners at risk of reading difficulty (Gersten, 1999). Second, core program and intervention developers and publishers should strive to align instruction and intervention materials to ensure struggling students are delivered a robust and coherent tiered support plan.” (p.617)
Fien, H., Smith, J. L. M., Smolkowski, K., Baker, S. K., Nelson, N. J., & Chaparro, E. A. (2015). An examination of the efficacy of a multitiered intervention on early reading outcomes for first grade students at risk for reading difficulties. Journal of Learning Disabilities, 48(6), 602–621.

“Skills-based instruction here means instruction reflecting an intent to strengthen academic skills (e.g., letter-sound correspondence and math problem solving) and to enhance knowledge in areas such as social studies and science. We also use the term to signify an approach inspired by Direct Instruction (DI; e.g., Becker, Engelmann, Carnine, & Rhine, 1981). According to Gersten, Woodward, and Darch (1986), the key to DI is that "materials and teacher presentation of [these] materials must be clear and unambiguous" (p. 18), "much more detailed and precisely crafted" (p. 19) than the norm, for successful use with students with academic challenges. Moreover, wrote Gersten et al. (1986), this instruction "must contain clearly articulated [learning] strategies" (p. 19): a step-by-step process involving teaching to mastery, a procedure for error correction, a deliberate progression from teacher-directed to student-directed work, systematic practice, and cumulative review (cf. Gersten et al., 1986). A belief in the efficacy of skills-based instruction seems well founded. When implemented with fidelity, carefully scripted programs in reading, writing, and math - often involving learning strategies similar to DI - have been shown to benefit numerous at-risk students (e.g., Graham & Perin, 2007; Kroesbergen & Van Luit, 2003; Stuebing, Barth, Cirino, Francis, & Fletcher, 2008)” (p.263).
Kearns, D. M., & Fuchs, D. (2013). Does cognitively focused instruction improve the academic performance of low-achieving students? Exceptional Children, 79(3), 263-290.

“The evidence base reviewed above, of the characteristics of effective Wave 2 intervention programmes for early and more persisting word reading difficulties, suggests more research is needed to better understand the role of: (a) instructional intensity (length of intervention, hours of instruction, optimal ratios of teachers to students, reading time, etc.); (b) programme integrity/fidelity; (c) teacher ability/experience; (d) programme focus/explicitness/multidimensionality; and (e) individual student prior instructional experiences/exposure and reading abilities. The ways in which these factors, individually and together, affect treatment outcomes are just beginning to be addressed, particularly for treatment resisters (Shaywitz et al., 2008)” (p.106).
Griffiths, Y., & Stuart, M. (2013). Reviewing evidence-based practice for pupils with dyslexia and literacy difficulties. Journal of Research in Reading, 36(1), 96-116.

“Our findings from the present study have several important implications for serving students with low IQs in general and special education settings. First and foremost, students with low IQs, including those with ID and those with IQs in the borderline range (i.e., 70-80), should be provided with evidence-based reading instruction. Although it might seem unsurprising to some that these students made meaningful progress, our study provides strong empirical evidence of reading progress across several academic years with a relatively large sample of students with low IQs who participated in a randomized control trial in which the treatment was delivered by highly trained interventionists. Specifically, our data indicate what is possible for students with low IQs if they are given access to evidence-based reading instruction. The curriculum is very explicit and systematic and was delivered with fidelity, providing very consistent, explicit, and repetitive routines, focusing on key skills, and delivering clear and explicit modeling. Thus, students with low IQs do benefit from comprehensive reading programs that were designed for struggling readers and readers with LD, but progress is slower” (p. 302-3).
Allor, J. H., Mathes, P. G., Roberts, J. K., Cheatham, J. P., & Al Otaiba, S. (2014). Is scientifically based reading instruction effective for students with below-average IQs? Exceptional Children, 80(3), 287-306.

“A recent meta-review of five intervention studies reported in the United States identified seven cognitive–linguistic variables related to variation in RTI, listed from strongest to weakest predictor (see Duff, 2008 for further details): slow rapid naming (RAN), problem behaviour, poor PA, limited understanding of the alphabetic principle, weak verbal memory, IQ and demographics. Environmental factors influencing RTI potentially include quality of Wave 1 teaching, point of intervention (early or late, where ‘late’ is defined as after KS 1 in England or G2 in the United States) and programme fidelity. The careful training, implementation, supervision and monitoring which characterises research studies may not always be observed in other circumstances with detrimental effects on the outcome of the intervention (Byrne & Fielding-Barnsley, 1995; Byrne et al., 2010; see Carter & Wheldall, 2008 for further discussion of this issue). Programme content may also influence outcome when the evidence base for inclusion of that content is weak or the content and/or implementation is inappropriate for the individual’s profile of needs, due to insufficient assessment and monitoring” (p.105).
Griffiths, Y., & Stuart, M. (2013). Reviewing evidence-based practice for pupils with dyslexia and literacy difficulties. Journal of Research in Reading, 36(1), 96-116.

“In summary, drill and practice through high-quality CAI, implemented with fidelity, can be considered a useful tool in developing students’ automaticity, or fast, accurate, and effortless performance on computation, freeing working memory so that attention can be directed to the more complicated aspects of complex tasks” (6/142).
Gersten, R., Ferrini-Mundy, J., Benbow, C., Clements, D., Loveless, T., Williams, V., Arispe, I., & Banfield, M. (2008). Report of the task group on instructional practices (National Mathematics Advisory Panel). Retrieved from the U.S. Department of Education Web site: http://www.ed.gov/about/bdscomm/list/mathpanel/report/instructional-practices.pdf

 The What Works Clearinghouse (Institute of Education Sciences, 2003) identified several features of intervention research designs that improve confidence in findings from research. Three of the most significant criteria identified include (a) the use of random assignment, (b) evidence of the use of a fidelity of treatment check, and (c) the use of standardized measurements. … A fidelity of treatment check, often referred to as treatment integrity, can improve our confidence in the accuracy and consistency of an intervention’s implementation (Gresham, MacMillan, Beebe-Frankenberger, & Bocian, 2000). Data on intervention fidelity are necessary to determine whether the intervention was implemented as intended and, therefore, whether the intended intervention is responsible for the outcomes reported.
Institute of Education Sciences. (2003). What Works Clearinghouse study review standards. Retrieved from http://www.whatworks.ed.gov/.

 “Some students may also need treatment with additional treatment components (e.g., fluency training, behavioral training, or vocabulary instruction). The results also highlight the importance of conducting treatment with fidelity” (p.343).
Al Otaiba, S. (2001). Children who do not respond to early literacy instruction: A longitudinal study across kindergarten and first grade. Reading Research Quarterly, 36, 344-346.

Why wouldn’t a teacher follow protocols when they are written into a program?
One reason may be that the teacher doesn’t believe that instruction is a major driver of educational attainment. This teacher may view learning as a natural outcome of a student who engages with the curriculum, and hence may focus solely upon motivational strategies rather than instructional strategies: “If I can engage the student, then good things will follow”. Other teachers may believe that they have the skill to modify, adapt, and combine elements from different programs, thus enhancing their effect on student progress. Others may believe that all children learn differently, and hence that a single program cannot possibly meet the unique needs of each student; they may argue that protocols represent a one-size-fits-all approach. Some may regard such a level of prescription as demeaning to their professional standing, or as a means of disempowering teachers through control-oriented instructional policies.

Some believe that an eclectic approach is best
The eclectic approach implies that effective teachers use a range of techniques and activities drawn from various teaching curricula and methodologies. The teacher decides which to use depending on the aims of the lesson and their characterisation of the learners in the group. Many education policy documents endorse this approach as the optimal one for skilled teachers.
Heward (2003) expressed concern about eclecticism thus:
“Eclecticism—using a combination of principles and methods from a variety of theories or models—is based on the realization that no single theory or model of teaching and learning is complete and error-free. It is thought that incorporating components from a number of different models will cover the gaps or deficiencies found in any single model. The logic is reasonable and, superficially, much appears to be gained by eclecticism. The problems likely to arise from unbridled eclecticism, however, outweigh its logical appeal.
First, not all theories and models are equally trustworthy and valuable. The more models represented in the eclectic mix, the more likely it is that ineffective and possibly even harmful components will be included (Maurice, 1993, 2000).
Second, teachers might not choose the most important and effective parts of each model, and might select weaker, perhaps ineffective components instead.
Third, some strategies or components of a given model may not be effective when implemented in isolation, without other elements of the model.
Fourth, elements from different models may be incompatible with one another. For example, children in a phonics-based program should practice reading with decodable text composed of previously learned letter-sound relationships and a limited number of sight words that have been systematically taught (Grossen, 2000). Using the less decodable and often predictable text typical of some language models limits the beginning reader’s opportunity to integrate phonological skills with actual reading and encourages the use of prediction and context to comprehend a passage. Although prediction is a useful skill, children who must rely on the predictability of text will not become successful readers (Chard & Kame’enui, 2000).
Fifth, an eclectic mix might prevent any of the included models from being implemented continuously or intensely enough to obtain significant effects. A little bit of everything and a lot of nothing often reduces eclecticism to a recipe for failure (Kauffman, 1997).
Sixth, teachers who use elements of multiple models may not learn to implement any of the models with the fidelity and precision necessary for best results. The eclectic practitioner is likely to be an apprentice of many models but master of none” (p. 196).
Heward, W. L. (2003). Ten faulty notions about teaching and learning that hinder the effectiveness of special education. The Journal of Special Education, 36, 186–205. Retrieved from http://www.updc.org/assets/files/professional_development/uspin/RS-2012-Heward-Artlce.pdf.

Teacher resistance
The resistance of some teachers to prescribed evidence-based curricula has been characterised in various ways. For example,
“Research has typically reduced teacher resistance to a psychological deficit in the "resistor," who is characterized as being unwilling to change (Gitlin & Margonis, 1995; Moore, Goodson, & Hargreaves, in press) and resisting policies and programs that attempt to improve education by controlling their instructional practices (Berman & McLaughlin, 1977; Cohen, 1991; Cuban, 1993; Huberman, 1973)” (p. 310).

Defenders of this resistance consider it a principled stand:
“Professional principles are conceptions about teaching and professionalism in which teachers view themselves as professionals with specialized expertise, who have discretion to employ repertoires of instructional strategies to meet the individual needs of diverse students, hold high expectations for themselves and students, foster learning communities among students, and participate in self-critical communities of practice” (p.32).
A newly graduated teacher in a school using Open Court as its literacy curriculum was concerned about having “fidelity to the program, which means you follow it exactly and don't add in your creativity” (p.37). Another wanted “major ownership in teaching the lessons” (p.40). In the words of one of these new teachers: “I don't know if this is just a power issue . . . but I don't enjoy being told what to do every day. That is kind of how I felt when I was teaching Open Court. . . . [Prescriptive programs] just don't hold true with my philosophy” (p.43).
Achinstein, B., & Ogawa, R. T. (2006). (In)fidelity: What the resistance of new teachers reveals about professional principles and prescriptive educational policies. Harvard Educational Review, 76(1), 30-63, 130.

Are there other reasons why teachers may not implement programs faithfully?
Educational policymakers have not been at the forefront of evidence-based practice.
“It appears that, in making decisions about school programs, educators "do not often use scientific reasoning and proof to make sense of their world" (Berliner, 2008, p. 309). For example, Levin (2010) reports that administrators were found to rank "personal experience and colleagues as a more powerful influence on their beliefs than either professional development or research" (p. 309). They also tend to rely on war stories and anecdotes rather than drawing on the latest research studies (Hattie, 2009; Labaree, 2008; Stanovich & Stanovich, 2003). Like teachers who prefer research that is personal and experiential (Landrum, Cook, Tankersley, & Fitzgerald, 2007), school administrators may turn to their peers for information and advice (Levin, 2010; Miller, Drill, & Behrstock, 2010). The problem with this approach is that decisions based on individual experiences or anecdotal information can lead to biased conclusions. It is easy to select cases that support one's arguments and ignore those that do not (Davis, 2007). Moreover, popular practices might be flawed even if they are applied extensively” (p. 124).
Bair, M. A., & Enomoto, E. K. (2013). Demystifying research: What's necessary and why administrators need to understand it. National Association of Secondary School Principals. NASSP Bulletin, 97(2), 124-138.

Perhaps the need for fidelity is not made clear, or perhaps insufficient attention is given to monitoring fidelity in the classroom. The famous table of Joyce and Showers (2002) points to the need for fidelity to be emphasised both in initial professional development and in the monitoring of classroom practice.

“There is a long-standing acknowledgment in behavioral research with human implementers that procedural infidelity is possible and perhaps likely (e.g., Baer, Wolf, & Risley, 1968; Billingsley, White, & Munson, 1980; LeLaurin & Wolery, 1992). In early intervention research, infidelity may be even more likely, with two “levels” of fidelity measurement: (a) whether researchers implement training procedures correctly and (b) whether indigenous implementers (e.g., early childhood special education teachers, parents) can (and do) implement interventions successfully after training” (p.173-4).
Ledford, J.R., & Wolery, M. (2013). Procedural fidelity: An analysis of measurement and reporting practices. Journal of Early Intervention, 35(2), 173-193.

“Fidelity of Intervention. To address our concerns about variable implementation, research staff observed each tutor at least once a week. During these 15- to 30-minute observations, project staff (Vadasy or Pool) looked for the following actions: starting lessons on time, making error corrections, following lesson formats, managing student behavior, using positive encouragement strategies, and providing a full 30 minutes of instruction. A total percentage of these six behaviors was obtained for each tutor, averaging across behaviors (reported under Results). Both observers at times observed each tutor, and they frequently compared their notes. In conjunction with the observations, tutors were often given brief written or oral feedback (e.g., suggestions for another way to teach a child having difficulty, or praise for a tutor's instructional skills). At other times, project staff modeled a strategy or adjusted a student's placement in the program (e.g., directing the tutor to go back to review previous lessons or lesson components until skills were solidly mastered, or to skip lessons when students had clearly mastered a skill and needed more challenging material).
Finally, students were tested every 10 lessons on mastery of lesson content. Project staff administered these curriculum-based tests with items drawn directly from a recently completed lesson. The mastery tests were a check on the tutor's lesson pacing and the student's acquisition of skills.
Regarding fidelity of implementation, we found that providing more training in lesson components before tutors began working with children, along with increased supervision, resulted in more accurate implementation, relative to levels observed in prior field tests. Whereas in the previous field test only 30% of tutors were observed to implement the majority of the lesson activities consistent with program protocols (Vadasy et al., 1997b), in this field test 71% of tutors were observed to be high implementors. Moreover, anecdotal evidence (e.g., tutors who increasingly followed program elements and implemented them with greater skill) suggests that the frequent supervision and technical assistance contributed to improved implementation. Obtaining more accurate program implementation was important because a previous finding had indicated a relation between fidelity of implementation and reading outcomes (Vadasy et al., 1997b)”.
Vadasy, P.F., Jenkins, J.R., & Pool, K. (2000). Effects of tutoring in phonological and early reading skills on students at risk for reading disabilities. Journal of Learning Disabilities, 33, 579-590.
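
The six-behaviour observation described by Vadasy, Jenkins, and Pool (2000) reduces to simple arithmetic. The Python sketch below assumes one plausible reading of "averaging across behaviors": for each behaviour, compute the percentage of observations in which it was seen, then take the mean of those six percentages. The function name `tutor_fidelity` and the dictionary format for observations are hypothetical.

```python
# The six behaviours observers looked for (Vadasy, Jenkins, & Pool, 2000).
BEHAVIOURS = (
    "starts lesson on time",
    "makes error corrections",
    "follows lesson formats",
    "manages student behaviour",
    "uses positive encouragement",
    "provides full 30 minutes",
)


def tutor_fidelity(observations: list[dict[str, bool]]) -> float:
    """Return a tutor's fidelity score as a percentage (hypothetical helper).

    Each observation maps a behaviour to True (seen) or False (not seen);
    a behaviour missing from an observation is treated as not seen.
    """
    if not observations:
        raise ValueError("at least one observation is required")
    per_behaviour = []
    for behaviour in BEHAVIOURS:
        seen = sum(1 for obs in observations if obs.get(behaviour, False))
        per_behaviour.append(100 * seen / len(observations))
    # Average across the six behaviours, giving each equal weight.
    return sum(per_behaviour) / len(per_behaviour)


# Two weekly observations of one tutor: the full 30 minutes was missed once.
week1 = {b: True for b in BEHAVIOURS} | {"provides full 30 minutes": False}
week2 = {b: True for b in BEHAVIOURS}
print(f"{tutor_fidelity([week1, week2]):.1f}%")  # 91.7%
```

Because each behaviour carries equal weight, a tutor who never made error corrections but showed the other five behaviours every time would still score about 83%. A single summary percentage can therefore mask a consistently missing behaviour, so the per-behaviour figures are worth inspecting alongside it.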

How to enhance fidelity:
One approach is to monitor implementation. Monitoring may be carried out by an external consultant, a peer, or a school administrator, or by teachers themselves (self-monitoring) using audio/video recordings or checklists.
“Multi-tiered system of supports represents one of the most significant advancements in improving the outcomes of students for whom typical instruction is not effective. While many practices need to be in place to make multi-tiered systems of support effective, accurate implementation of evidence-based practices by individuals at all tiers is critical to obtain student outcomes. Effective strategies to achieve program fidelity are available; however, maintaining program fidelity at the individual level remains elusive. Lessons drawn from medicine indicate strategies to maintain program fidelity should address the implementer. Medical practitioners have used self-monitoring checklists to maintain fidelity with striking results. Research evaluating strategies to maintain program fidelity at the individual level represents an important next step in the field of education. Recommendations for a systematic research agenda focused on self-monitoring checklists are presented” (p.14).
Nelson, J.R., Oliver, R.M., Hebert, M.A., & Bohaty, J. (2015). Use of self-monitoring to maintain program fidelity of multi-tiered interventions. Remedial and Special Education, 36(1), 14-19.

 Methods of monitoring include:
“ …  self-reports, rating scales, interviews, checklist, Likert scales, lesson plan reviews, and permanent products (Gable et al., 2001; Gresham et al., 2000; Kovaleski et al., 2006; Lane et al., 2004; Telzrow & Beebe, 2002). The use of indirect assessment measures is less time consuming, more efficient, less likely to be influenced by social desirability, less reactive, and have the potential to be more accurate than other integrity assessment methods (Gresham, 1989; Gresham et al., 2000).”
Kovaleski, J.F., Marco-Fies, C.M., & Boneshefski, M.J. (no date). Treatment integrity: Ensuring the “I” in RtI. National Center for Learning Disabilities. Retrieved from http://www.rtinetwork.org/getstarted/evaluate/treatment-integrity-ensuring-the-i-in-rti

Educational Resources, Inc. produced an “app” entitled “Lesson Fidelity Checklists” for Apple’s iPad, iPod, and iPhone mobile devices. The app is available from the App Store section of Apple’s iTunes and comes preloaded with a Lesson Fidelity Checklist based on the extensive teacher-effectiveness research literature. Because it draws on this general research rather than on a single program, it can be applied to any program, subject area, or grade level.

An example of a fidelity checklist
(from Benner, G.J., Nelson, J.R., Stage, S.A., & Ralston, N.C. (2011). The influence of fidelity of implementation on the reading outcomes of middle school students experiencing reading difficulties. Remedial and Special Education, 32, 79–88.)

A range of other such protocols (about 60) can be found in:
Kovaleski, J.F. (no date). Treatment integrity protocols. National Center for Learning Disabilities. Retrieved from http://www.rtinetwork.org/getstarted/evaluate/treatment-integrity-protocols

For example,

“Young children with and at risk for emotional/behavioral disorders (EBD) present challenges for early childhood teachers. Evidence-based programs designed to address these young children’s behavior problems exist, but there are a number of barriers to implementing these programs in early childhood settings. Advancing the science of treatment integrity measurement can assist researchers and consumers interested in implementing evidence-based programs in early childhood classrooms. To provide guidance for researchers interested in assessing the integrity of implementation efforts, we describe a conceptual model of implementation of evidence-based programs designed to prevent EBD when applied in early childhood settings. Next, we describe steps that can be used to develop treatment integrity measures. Last, we discuss factors to consider when developing treatment integrity measures with specific emphasis on psychometrically strong measures that have maximum utility for implementation research in early childhood classrooms” (p. 181).
Sutherland, K.S., McLeod, B.D., Conroy, M.A., & Cox, J.R. (2013). Measuring implementation of evidence-based programs targeting young children at risk for emotional/behavioral disorders: Conceptual issues and recommendations. Journal of Early Intervention, 35(2), 129-149.

“The intent of all intervention efforts is to demonstrate that changes in dependent variables (e.g., student performance) is the direct result of systematic manipulation of a given independent variable (e.g., a particular intervention or treatment; Lane, Beebe-Frankenberger, Lambros, & Pierson, 2001; Wolf, 1978). In other words, is the intervention, rather than other factors, responsible for the observed changes in student performance? When designing and implementing school-based interventions, careful attention usually is given to the following: (a) designing the intervention, (b) training the appropriate personnel, (c) identifying the appropriate target audience, (d) selecting outcome variables, and (e) monitoring the accuracy with which the outcome data is collected. Yet, often a pivotal intervention component is forgotten, namely treatment integrity (Gresham, 1989, 1998; Yeaton & Sechrest, 1981)” (p.36) … “There are a number of ways school personnel can assess treatment integrity. Methods include: (a) direct observation, (b) feedback from consultants, (c) self-monitoring, self-reporting, and behavioral interview techniques, (d) permanent products, and (e) manualized treatments and intervention scripts (Elliott & Busse, 1993; Lane & Beebe-Frankenberger, in press)” (p.37).
“What Factors Influence Treatment Integrity? There are several factors that impact treatment integrity (Gresham, 1989, 1998) including: (a) intervention complexity, (b) implementation time required, (c) materials required, (d) number of personnel involved, (e) perceived and actual effectiveness, and (f) motivation of the treatment agents (teachers). In general, as the intervention increases in terms of complexity and time requirements, the level of treatment integrity decreases. Similarly, the more materials and resources required to implement the intervention, the lower the treatment integrity, particularly when the intervention requires materials that are not typically found in the classroom setting. Furthermore, interventions that require assistance from more than one person (e.g., teacher, paraprofessional, or support staff) are less likely to be implemented with integrity, relative to those interventions requiring support from only one individual. Perceptions of potential effectiveness also influence treatment integrity. Specifically, if the person implementing the intervention (e.g., teacher) views the intervention to be potentially effective, or socially valid, he or she may be more likely to implement the intervention as originally designed than if he or she perceives the intervention to be ineffective (see Lane et al., 2001 for a discussion of the relationship between social validity and treatment integrity). Finally, teacher motivation may also influence the extent to which treatments are implemented with integrity. If the teacher is attempting to find ways of better serving a child in the general education classroom, treatment integrity ratings will probably be higher than if the teacher's goal is to have the child assessed and moved to another setting for support services (Witt & Martens, 1988; Ysseldyke, Christenson, Pianta, & Algozzine, 1983)” (p.41).
Lane, K.L., Bocian, K.M., MacMillan, D.L., & Gresham, F.M. (2004). Treatment integrity: An essential--but often forgotten--component of school-based interventions. Preventing School Failure, 48(3), 36-43.

“This study examined whether direct, interval-by-interval measures of treatment integrity would make it possible to distinguish whether equivocal intervention results could be attributed to the intervention itself, or to poor implementation. Josh, an eight-year-old 3rd grader, performed at or slightly above his peers' academically, yet engaged in problem behaviors (yelling, throwing objects, slamming his desk into a peer's desk) on a daily basis. A functional behavioral assessment (FBA) identified these behaviors were maintained by gaining attention (positive reinforcement) and escaping from certain assignments (negative reinforcement). A function based intervention was then developed, tested, and implemented during ongoing activities in the classroom. On-task behavior occurred throughout more than 91% of the intervals when the intervention was implemented correctly, compared to only 9% when it was implemented incorrectly. Positive treatment acceptability ratings were obtained from both Josh and his teacher, even though she continued to implement inconsistently throughout the study. Implications for both research and practice are presented (p.105). … The data reported here are consistent with reviews that have emphasized the importance of reporting the degree to which interventions are implemented as intended (Gresham, Gansle, & Noell, 1993; Gresham, Gansle, Noell, Cohen, & Rosenblum, 1993; Perepletchikova & Kazdin, 2005; Peterson et al., 1982; Moncher & Prinz, 1991). Treatment integrity data make it possible to attribute observed effects to a particular intervention, rather than to extraneous variables. Without these data, one cannot distinguish weak interventions that are implemented perfectly from strong interventions that are implemented poorly. Unfortunately, most intervention researchers have a poor record of assessing and reporting implementation data.
… Past research has suggested that levels of treatment integrity can be greatly improved by providing teachers with performance feedback on the accuracy of treatment implementation (Noell, Witt, Gilbertson, Ranier, & Freeland, 1997; Noell, Witt, LaFleur, Mortenson, Ranier, & LeVelle, 2000). Moreover, programmed consequences for teachers, including both performance feedback and negative reinforcement (escape from a meeting with a behavior analyst) have produced higher levels of treatment integrity than a single programmed consequence or no programmed consequence conditions (DiGennaro, Martens, & Kleinmann, 2006)”. (p.114)
Wood, B.K., Umbreit, J., Liaupsin, C.J., & Gresham, F.M. (2007). A treatment integrity analysis of function-based intervention. Education and Treatment of Children, 30(4), 105-120.
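Wood et al. could separate a weak intervention from a poorly implemented one because they recorded integrity for every observation interval, not just overall. The tally they report (on-task behaviour in intervals with correct versus incorrect implementation) can be sketched as follows. The coding of each interval as a pair of yes/no values and the data below are invented for illustration; they are not the study's data.

```python
def on_task_by_integrity(intervals):
    """Percentage of intervals with on-task behaviour, split by integrity.

    `intervals` is a list of (implemented_correctly, on_task) pairs of
    booleans, one pair per observation interval.
    """
    result = {}
    for label, flag in (("correct", True), ("incorrect", False)):
        on_task = [task for implemented, task in intervals if implemented == flag]
        # None signals that no intervals of this kind were observed.
        result[label] = 100 * sum(on_task) / len(on_task) if on_task else None
    return result


# Invented data: 11 correctly implemented intervals, 10 incorrect ones.
intervals = (
    [(True, True)] * 10 + [(True, False)]
    + [(False, False)] * 9 + [(False, True)]
)
print(on_task_by_integrity(intervals))
```

With only a single overall integrity figure, the two groups of intervals would be merged and the contrast between them would be lost.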

Beyond monitoring: Actively coaching teachers
In schools, there is increasing acceptance that in-house coaching of teachers to obtain and maintain fidelity is very helpful, even if some teachers do not always appreciate it at first. The Joyce and Showers table shown earlier highlights the large difference coaching can make to the benefits of professional development.
“Teacher coaching has emerged as a promising alternative to traditional models of professional development. We review the empirical literature on teacher coaching and conduct meta-analyses to estimate the mean effect of coaching programs on teachers’ instructional practice and students’ academic achievement. Combining results across 60 studies that employ causal research designs, we find pooled effect sizes of 0.49 standard deviations (SD) on instruction and 0.18 SD on achievement. Much of this evidence comes from literacy coaching programs for prekindergarten and elementary school teachers.” (p. 547)
Kraft, M., Blazar, D., & Hogan, D. (2018, August). The effect of teacher coaching on instruction and achievement: A meta-analysis of the causal evidence. Review of Educational Research, 88(4), 547-588.
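Kraft et al. report pooled effect sizes across 60 studies. The quoted abstract does not describe their meta-analytic model, so the sketch below is not a reproduction of it; it shows the simplest textbook approach, a fixed-effect inverse-variance weighted mean, in which each study's effect size is weighted by 1/SE², so that more precise studies count for more. The study effects and standard errors in the example are invented.

```python
import math


def pooled_effect(effects, standard_errors):
    """Fixed-effect, inverse-variance pooled effect size.

    effects: standardised mean differences (in SD units), one per study.
    standard_errors: the standard error of each study's effect size.
    Returns (pooled effect, standard error of the pooled effect).
    """
    weights = [1 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total_weight
    return pooled, math.sqrt(1 / total_weight)


# Two invented studies: a precise one (SE 0.1) and a noisier one (SE 0.2).
d, se = pooled_effect([0.2, 0.4], [0.1, 0.2])
print(f"pooled d = {d:.2f}, SE = {se:.3f}")  # prints: pooled d = 0.24, SE = 0.089
```

Random-effects models, which allow the true effect to vary from study to study, add a between-study variance term to each study's weight; that refinement is omitted here for brevity.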

 “Evidence-based programs (EBPs) focused on prevention are increasingly used in schools to promote students’ academic, social, emotional, and behavioral functioning. Although the efficacy of these EBPs has been established in the literature, issues with adoption and implementation persist (Domitrovich et al., 2008; Domitrovich & Greenberg, 2000; Elias, Zins, Graczyk, & Weissberg, 2003; Spoth et al., 2013). Several questions remain regarding the best way to optimize implementation of EBPs. Implementation supports, often in the form of coaching, have been identified as a promising approach for increasing the implementation fidelity of EBPs, and thus promote stronger program effects (Bradshaw, Pas, Goldweber, Rosenberg, & Leaf, 2012; Kretlow & Bartholomew, 2010; Pas, Bradshaw, & Cash, 2014). Coaching is, however, a complex and dynamic process, which includes coach engagement in multiple activities as well as a social process between the coach and implementer (i.e., working relationship) that may prompt teacher change in the behavior or skill targeted by coaching. A coach and teacher have a social relationship through which social persuasion can occur (Taylor, 2007). Through this process, the coach empowers the teacher by indicating their own confidence in the teacher’s ability to use a strategy as well as the value of the performance for achieving the desired outcome. The coach also provides social and emotional support to the teacher during the teacher’s use of the new strategies (Taylor, 2007). Little is currently known, however, about how these discrete activities or the interpersonal nature of the working relationship relate to implementation fidelity, which can be measured in a variety of ways. 
Some research has demonstrated the association between specific coaching activities and teacher changes in implementation (Coles, Owens, Serrano, Slavec, & Evans, 2015; Reinke, Lewis-Palmer, & Martin, 2007; Sanetti, Collier-Meek, Long, Kim, & Kratochwill, 2014; Stormont & Reinke, 2014). Coaching includes many different activities such as needs assessment, modeling, technical assistance, and check-ins; coaches may vary their use of these activities in relation to a variety of implementer factors (e.g., teacher beliefs and perceptions regarding efficacy, burnout, and organizational factors; Pas et al., 2015). In turn, coaching activities may also influence implementation, these perceptions, and the teacher’s perceived working relationship with the coach. The teachers’ perception of their working relationship with a coach reflects collaboration, feelings of being supported by the coach, viewing the coaching process as competent, and overall satisfaction with the coaching (Johnson, Pas, & Bradshaw, 2016).” (p. 406)
Johnson, S.R., Pas, E.T., Bradshaw, C.P., & Ialongo, N.S. (2018). Promoting teachers’ implementation of classroom-based prevention programming through coaching: The mediating role of the coach–teacher relationship. Administration and Policy in Mental Health and Mental Health Services Research, 45(3), 404–416.

“In this study we measured the impact of a professional development model that included directive coaching on the instructional practices of Western Australian primary school teachers taking up explicit instruction. We developed and validated protocols that enabled us to measure teachers’ fidelity to the salient elements of explicit instruction and interviewed participants about the impact of the coaching program on student learning, their feelings of self efficacy and attitudes to being coached. Numerical scores to indicate teachers’ demonstration of explicit instruction lesson design and delivery components changed positively over the five observed lessons, and directive coaching had a positive impact on teachers’ competence and confidence. The elements of the coaching process that the teachers found valuable were the coach’s positive tone, the detailed written feedback, and the specificity, directness and limited number of the suggestions. Implications for schools with reform-based agendas wanting to change teachers’ instructional practices through instructional coaching are discussed.” (p. 110)
Hammond, L., & Moore, W.M. (2018). Teachers taking up explicit instruction: The impact of a professional development and directive instructional coaching model. Australian Journal of Teacher Education, 43(7), 110-133. Retrieved from http://ro.ecu.edu.au/cgi/viewcontent.cgi?article=3969&context=ajte

 “Teacher educators seek to create environments that will foster change. From a sociocultural prospective, instructional change requires not only awareness of content and practices, but more importantly, an understanding of the contexts involved in the construction and appropriation of knowledge. For teachers, as for their students, scaffolding in the context of use is necessary for effective learning to take place. Teachers benefit when they are supported in the process of changing their practices. Unfortunately, most professional development activities are separated from the classroom, and thus from the opportunity for teachers to put what they are learning into immediate use. The lack of such an activity setting is a problem that Tharpe and Gallimore called “the choke-point of change” (1988, p. 190).” (p. 27)
Collet, V.S. (2012). The gradual increase of responsibility model: Coaching for teacher change. Literacy Research and Instruction, 51(1), 27-47.

“This article presents the Classroom Strategies Coaching (CSC) Model, a data-driven coaching approach that uses teacher formative assessment data to drive improvements in universal practices. The classroom strategies assessment system (CSAS), a formative assessment of evidence-based instructional and behavioral management practices was used to facilitate the coaching process. Results from 32 elementary school teachers who received brief coaching after participating as waitlist controls in a randomized controlled trial are presented. Teachers’ practices remained stable across baseline periods. Following coaching, teachers displayed improvements toward their behavioral management goals (e.g., ds = .50–.83). Results also showed meaningful reductions in the overall need for change in instruction (d = .88) and in behavior management practices (d = .68) at postintervention. Findings illustrate the benefits of integrating teacher formative assessment in coaching to improve teaching practices.” (p.81)
Dudek, C. M., Reddy, L. A., Lekwa, A., Hua, A. N., & Fabiano, G. A. (2019). Improving universal classroom practices through teacher formative assessment and coaching. Assessment for Effective Intervention, 44(2), 81–94.
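Dudek et al. express their results as Cohen's d. The quoted abstract does not say which variant of d they computed (several formulas are in use for pre/post data collected from the same teachers), so the following is only a sketch of the common two-group form: the difference in means divided by the pooled standard deviation. The scores are invented.

```python
import math
from statistics import mean, variance


def cohens_d(before, after):
    """Cohen's d: (mean of `after` - mean of `before`) / pooled SD.

    The pooled SD combines the two sample variances (n - 1 denominators).
    A positive d means scores rose; for a measure where lower is better,
    such as a "need for change" rating, a reduction gives a negative d.
    """
    n1, n2 = len(before), len(after)
    pooled_var = ((n1 - 1) * variance(before) + (n2 - 1) * variance(after)) / (
        n1 + n2 - 2
    )
    return (mean(after) - mean(before)) / math.sqrt(pooled_var)


# Invented ratings on a hypothetical practice scale, before and after coaching.
print(round(cohens_d([1, 2, 3], [2, 3, 4]), 2))  # prints: 1.0
```

Because d is expressed in standard-deviation units, it allows results measured on different rating scales to be compared, which is also what makes pooling across studies possible.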

“By pooling results from across 60 causal studies of teacher coaching programs, we find large positive effects on instruction and smaller positive effects on achievement. Effects on instruction and achievement compare favorably when contrasted with the larger body of literature on teacher PD (Yoon et al., 2007), as well as most other school-based interventions (Fryer, 2017). The growing literature on teacher coaching provides a much needed evidentiary base for future directions in teacher development policy, practice, and research. Ultimately, improving the teacher workforce will require continued innovation in in-service PD programs. Teacher coaching models can provide a flexible blueprint for these efforts, but many questions remain about whether coaching is best implemented as a smaller scale targeted program tailored to local contexts or if it can be taken to scale in a high-quality and cost-effective way.” (p.577)
Kraft, M.A., Blazar, D., & Hogan, D. (2018). The effect of teacher coaching on instruction and achievement: A meta-analysis of the causal evidence. Review of Educational Research, 88(4), 547–588.

“Gresham (1989) suggested the following reasons as to why treatments are not carried out as intended: (a) the complexity of the plan, (b) the number of treatment agents, (c) the time required to implement, (d) the resources required, and (e) perceived effectiveness or motivation of treatment agent. Many PF studies, including Duhon, Mesmer, Gregerson, and Witt (2009), Noell, Witt, Gilbertson, Ranier, and Freeland (1997), Witt, Noell, LaFleur, and Mortenson (1997), and Noell, Duhon, Gatti, and Connell (2002), have documented notable negative slopes in integrity during the baseline phase that would have gone unnoticed had follow-up not occurred. To combat threats to integrity, Gresham suggested that researchers and practitioners use direct observation coupled with the provision of behavior-specific feedback to the consultee. He also suggested self-monitoring strategies. Performance feedback is a tool to manipulate levels of TI so practitioners can conclusively state whether students received the intended intervention or support (p. 160). … Scheeler, Ruhl, and McAfee (2004) conducted a literature review of ten studies that described using PF for in-service and preservice teachers. They concluded immediacy of PF was the only clear moderator of effect--PF should occur as soon after observation as possible. This observation falls in line with prior experimentation that has shown a negative correlation between delay of a contingency and its effect on altering behavior (Lattal, 1993; Renner, 1964). … Included studies occurred in preschools, elementary schools, and middle/high schools, as well as general education and special education classrooms, with a strong bias toward the elementary level and general education. Results demonstrated that PF was effective in preschool through high school and that grade level alone did not significantly moderate the effectiveness of the PF.
This is a promising finding, demonstrating that teachers in any setting are responsive to PF” (p.166).
Solomon, B.G., Klein, S.A., & Politylo, B.C. (2012). The effect of performance feedback on teachers' treatment integrity: A meta-analysis of the single-case literature. School Psychology Review, 41(2), 160-176.
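Solomon et al. note that integrity often declined during baseline and would have gone unnoticed without follow-up, and that feedback is most effective when delivered soon after observation. A school that records integrity session by session could check for such a decline by fitting a least-squares slope, as sketched below. The data and the threshold of one percentage point per session are hypothetical choices for illustration, not values from the meta-analysis.

```python
def integrity_slope(sessions, integrity_pct):
    """Ordinary least-squares slope of integrity (%) on session number.

    A negative slope means integrity is falling from session to session.
    """
    n = len(sessions)
    mean_x = sum(sessions) / n
    mean_y = sum(integrity_pct) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sessions, integrity_pct))
    sxx = sum((x - mean_x) ** 2 for x in sessions)
    return sxy / sxx


def needs_feedback(slope, threshold=-1.0):
    # Hypothetical rule: prompt performance feedback if integrity is
    # dropping by more than one percentage point per session.
    return slope < threshold


# Invented baseline data: integrity slips five points per session.
slope = integrity_slope([1, 2, 3, 4, 5], [95, 90, 85, 80, 75])
print(slope, needs_feedback(slope))  # prints: -5.0 True
```

Running a check like this after each observation, rather than at the end of a term, fits the finding that feedback should follow observation as closely as possible.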

How much fidelity is required?
More research is needed. It may not be possible, however, to express the required amount as a simple percentage, because some program components may influence student success more than others, and their relative influence may even change from one curriculum area to another.
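The point that a single percentage can mislead is easy to illustrate. In the sketch below, two hypothetical teachers each implement three of four lesson components, so both score 75% on a simple count, yet once importance weights are attached their scores diverge. The component names and weights are invented for illustration and are not empirically derived.

```python
# Hypothetical relative importance of four lesson components.
WEIGHTS = {"error_correction": 4, "modelling": 3, "pacing": 2, "praise": 1}


def unweighted_fidelity(implemented):
    """Percentage of components implemented, all counted equally."""
    return 100 * sum(implemented.values()) / len(implemented)


def weighted_fidelity(implemented, weights=WEIGHTS):
    """Percentage of the total importance weight covered by the
    components that were implemented as specified."""
    achieved = sum(w for comp, w in weights.items() if implemented.get(comp, False))
    return 100 * achieved / sum(weights.values())


teacher_a = {"error_correction": False, "modelling": True, "pacing": True, "praise": True}
teacher_b = {"error_correction": True, "modelling": True, "pacing": True, "praise": False}
for name, obs in (("A", teacher_a), ("B", teacher_b)):
    print(name, unweighted_fidelity(obs), weighted_fidelity(obs))
# prints:
# A 75.0 60.0
# B 75.0 90.0
```

Until research establishes which components matter most, any such weights remain a judgement call rather than a validated standard.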
“As reported by Schulte et al. (2009), despite understanding the importance of implementing interventions with fidelity, in practice there is no empirical guidance regarding the level of fidelity of implementation that is needed to realize a meaningful gain in student performance. For example, it may not be clear whether a particular intervention is effective only if implemented with 100% fidelity or if a lesser level would be adequately effective (e.g., 90%, 80%). Therefore, educators do not know to what extent deviations from a prescribed intervention plan can occur and still obtain the expected results (Gresham et al., 2000). There is currently no standardized generic treatment integrity instrument with which to collect fidelity of implementation data. If such an instrument existed, practitioners could establish cut-points that define the extent of intervention fidelity required in an RtI model (Schulte et al., 2009).”
“The teachers in this study who adhered more closely to the PD materials had a greater impact on student achievement than those who did not. The PD focused on evidence-based practices for vocabulary and comprehension instruction, as well as general effective instructional practices. Fidelity in our study was not exceptionally high, with an average of 4.7 on a scale of 1 to 10. The role of fidelity in the interpretation of findings and the importance of the teacher in maintaining fidelity to the treatment are critical in research (Hulleman & Cordray, 2009)” (p.254).
Hairrell, A., Rupley, W.H., Edmonds, M., Larsen, R., Simmons, D., Willson, V., Byrns, G., & Vaughn, S. (2011). Examining the impact of teacher quality on fourth-grade students' comprehension and content-area achievement. Reading & Writing Quarterly: Overcoming Learning Difficulties, 27(3), 239-260.
Hulleman, C., & Cordray, D. S. (2009). Moving from the lab to the field: The role of fidelity and achieved relative intervention strength. Journal of Research on Educational Effectiveness, 2(1), 88–110.

When curricula lack instructional specificity
Of course, all this research on fidelity presumes that program developers provide enough specificity to make fidelity possible. Many published programs are written by hopeful but instructionally unskilled individuals, and these programs have large specificity gaps that require a high level of teacher decision-making. In such programs it is unsurprising that results are inconsistent, with effectiveness varying dramatically according to the appropriateness of the decisions teachers must make during implementation.
“ … although existing research suggests that the average effect of CSR [comprehensive school reform] programs on student achievement is small, variability in effectiveness from CSR program to CSR program is substantial. (p. 300)… If innovative programs produce only very few differences in instruction (in comparison to normative practice), we should not expect them to produce large effects on student achievement. For these reasons, we urge researchers interested in studying innovative instructional programs to venture inside the black box not only by explicitly measuring rates of faithful program implementation but also by looking closely at the nature of instruction being implemented. Both factors are needed if we are to explain why some programs have more effects on student achievement than others.” (p. 332) … we conclude that well-defined and well-specified instructional improvement programs that are strongly supported by on-site facilitators and local leaders who demand fidelity to program designs can produce large changes in teachers' instructional practices.” (p.298)
Correnti, R., & Rowan, B. (2007). Opening up the black box: Literacy instruction in schools participating in three comprehensive school reform programs. American Educational Research Journal, 44(2), 298–338.

 Other references:
Blakely, M. R. (2001). A survey of levels of supervisory support and maintenance of effects reported by educators involved in Direct Instruction implementations. Journal of Direct Instruction, 1(2), 73-83.
Coulter, G. & Grossen, B. (1997). The effectiveness of in-class instructive feedback versus after-class instructive feedback for teachers learning Direct Instruction teaching behaviors. Effective School Practices, 16(4), 21-35.
Gawande, A. (2010). The checklist manifesto: How to get things right. London: Profile Books. Summary retrieved from https://www.google.com.au/#q=the+checklist+manifesto+pdf
Gersten, R. M., & Carnine, D. W. (1982). Measuring implementation of a structured educational model in an urban school district: An observational approach. Educational Evaluation and Policy Analysis, 4(1), 67-79.
Harn, B., Parisi, D., & Stoolmiller, M. (2013). Balancing fidelity with flexibility and fit: What do we really know about fidelity of implementation in schools? Exceptional Children, 79, 181–193.
Hummel, J., Wiley, L., Huitt, W., Roesch, M. & Richardson, J. (2002). Implementing Corrective Reading: Coaching issues. Georgia Educational Research Association.
Pyle, N. (2012). The influence of fidelity of implementation on the reading outcomes of middle school students experiencing reading difficulties. Evidence-Based Communication Assessment and Intervention, 6(2), 108-112. doi: 10.1080/17489539.2012.735812
Stockard, J. (2011). Direct Instruction and first grade reading achievement: The role of technical support and time of implementation. Journal of Direct Instruction, 11(1), 31-50.