By:
Meltem Alemdar, Ph.D., Associate Director, Principal Research Scientist, Center for Education Integrating Science, Mathematics, and Computing, Georgia Institute of Technology
Christopher Cappelli, MPH, Senior Research Associate, Center for Education Integrating Science, Mathematics, and Computing, Georgia Institute of Technology

The role of evaluation in National Science Foundation (NSF) projects has become critically important. Evaluation produces information that can be used to improve a project: knowing how different aspects of the project are working, and the extent to which its goals and objectives are being met, is essential to a continuous improvement process. Evaluation also documents what the project has achieved.
Evaluators should always work closely with principal investigators (PIs) during the proposal stage to ensure that the evaluation aligns well with project goals; in practice, however, the degree to which this happens before and after funding is received depends on the PI’s approach to collaboration and view of the value of evaluation. Some PIs perceive evaluation as a formality required to get the proposal funded, and perhaps for accountability purposes; others see it as the most important part of the proposal. Whether or not the PI treats evaluation as a critical component of the project has profound consequences for the quality of the evaluation, the clarity of its stated focus, the selection of methodology and design, and the PI’s use of the evaluation results. Addressing this array of considerations requires a robust evaluation plan, which makes the choice of evaluation framework critical.
Evaluation frameworks provide guidance for program developers and evaluators to ensure that the evaluation’s overall design reflects and incorporates the originating motivations, principles, and context of the program being examined. While the term “evaluation framework” is common in the discipline of program evaluation, it has been interpreted in various ways. A recent article by Arbour (2020), “Frameworks for Program Evaluation: Considerations on Research, Practice, and Institutions,” analyzes the approaches taken by different evaluation associations and organizations.
Arbour’s article focuses specifically on program evaluation, rather than the more general domain of evaluation, and provides examples of how frameworks are defined within the field and by organizations and associations. For example, the Organization for Economic Co-operation and Development (OECD) (2014) created its Framework for Regulatory Policy Evaluation, an extensive guide to assist “countries in systematically evaluating the design and implementation of regulatory policy” (p. 13), whereas the United Nations Office for Disaster Risk Reduction (2015) describes its Monitoring and Evaluation Framework as a way to “provide a consistent approach to the monitoring and evaluation” of its programs (p. 2). The paper also describes the well-known Chen (2001) and Cooksy (1999) frameworks, which focus mostly on program theories and logic models, and it highlights the context-dependent dimensions of choosing an evaluation framework, such as the practice of program evaluators and the type of intervention and program evaluation functions. Arbour (2020) concludes by emphasizing that “a framework has an impact because someone decides to adopt, adapt, or develop that framework in a given evaluation context” (p. 13). In many cases this leads to locally developed logic models, evaluation plans, evaluation policies, and many other products associated with the term “evaluation framework.” This observation is borne out in our experience as evaluators: we have found that different fields of study or practice govern the choice and implementation of evaluation frameworks. For example, participatory evaluation (King, 2005) is most commonly used in community-based interventions, while developmental evaluation (Patton, 2010) tends to be used for innovation, radical program redesign, and addressing complex issues and crises.
In Alemdar, Cappelli, Criswell, and Rushton (2018), we provide a template for evaluating teacher leadership training programs funded through the NSF Robert Noyce Teacher Scholarship (Noyce) program. These programs are particularly challenging to evaluate for several reasons. First, program-specific characteristics may evolve over the years, which can make it difficult to design an effective evaluation. Noyce programs are also hard to evaluate because of the small number of individuals admitted into each yearly cohort. Most evaluations focus on yearly data for primarily formative purposes, and the summative data usually address program-level rather than teacher-level outcomes. To provide useful evaluation data and analysis to key stakeholders, teacher leadership professional development programs need to be evaluated longitudinally, using proven methodologies and frameworks that can account for the small sample sizes common in these programs. It takes years for a teacher to develop into a leader who moves her colleagues toward positive change, so it is important to capture that development over time.
Some evaluation frameworks require substantial time commitments from the project PIs, management, and others involved in the project at every step of the evaluation. Given the limited knowledge of evaluation methodologies suited to teacher leadership programs with small sample sizes, as well as the relationships we had built with the PIs, we chose to use multiple complementary evaluation frameworks to determine the program’s overall impact on the development of teacher leadership skills.
One approach was utilization-focused evaluation, described as “evaluation done for and with specific intended primary users for specific, intended uses” (Patton, 2008, p. 37). An essential component of utilization-focused evaluation is identifying the program stakeholders, or the primary intended users of the evaluation, and understanding their perspectives on the intended use of the evaluation. Patton (2008) describes the importance of the “personal factor” when identifying the intended users, defined as “the presence of an identifiable individual or group of people who personally care about the evaluation and the findings it generates” (p. 44). These people have a personal interest in the success of the program and in enhancing their own ability, as consumers or decision-makers, to predict and guide its outcomes. Through this framework, we built close relationships with both program leadership and participants, developing a high level of trust that proved to be a cornerstone of the evaluation’s success. By understanding the “personal factor” and its importance for utilization, we involved key stakeholders to better understand their perspectives on the intended uses of the evaluation. This approach ensured that, throughout the program period, evaluation data were presented in a way that placed utilization at the forefront.
Furthermore, teacher leadership training programs often draw on theories of leadership development. In our early conversations, the PIs discussed multiple teacher leadership theories that guided their design of the program, such as those of Dempsey (1992) and Snell and Swanson (2000). Since these theories formed the program’s theoretical foundation, we also adopted Chen’s (1990) theory-driven evaluation framework, which is designed to use a validated theory to guide the evaluation.
While utilization-focused evaluation provides timely, useful information to program leadership for decision making, theory-driven evaluation can be “…analytically and empirically powerful and lead to better evaluation questions, better evaluation answers and better programs” (Rogers, 2000, p. 209). Moreover, with theory-driven evaluation guiding the process, the evaluation can not only assess whether a program is working, but also illuminate why or how the program is having an impact on its participants (Chen, 2012). This is particularly important in the context of teacher leadership programs, because it allows a program to build a theory-driven model that can be readily adapted by others. This should be the goal of any NSF-related project evaluation: to effectively assess the merit of the program. Given the complementary data provided through these frameworks, the PIs used the evaluation results extensively to improve the program and achieve its goals. For example, in the early stages of the program, formative data showed that teachers were struggling to reflect on their teaching practice, an important domain for developing teacher leadership. The program addressed this challenge by adding more professional development and discussion around the topic.
In our paper, we also showed how the program theory guided the development of interview and focus group protocols to track the development of leadership longitudinally across Snell and Swanson’s four dimensions of teacher leadership: Empowerment, Expertise, Reflection, and Collaboration. Documenting the development of teacher leadership over time is particularly difficult with small sample sizes and limited evaluation resources. Using multiple frameworks substantially assisted the program in documenting change over time in the four dimensions of teacher leadership, and therefore in the development of teacher leaders. Because of the collaborative nature of these evaluation frameworks, a conceptual framework for the program was also constructed in collaboration with the PIs. From the perspective of a utilization-focused evaluation, involving key stakeholders in developing a conceptual framework ensures a common understanding of the relationship between program components and desired outcomes, resulting in agreement on the intended use of evaluation results. Similarly, from a theory-driven perspective, the conceptual framework systematically organizes stakeholders’ perceptions of both the process expected to produce change and the activities needed to create the desired change as a result of participation in the program (Chen, 2012).
Implications
The choice and implementation of evaluation frameworks to determine the merit of a program will vary with that program’s specific context. Based on our experiences, we developed several recommendations that evaluators and Noyce programs should consider when developing an evaluation plan:
Conclusion
Given the continuously evolving nature of teacher leadership programs, their often small sample sizes, and the historic lack of literature offering a clear concept of teacher leadership, we, as evaluators, found that the concurrent use of utilization-focused and theory-driven evaluation frameworks provided a firm foundation on which the evaluation could develop and evolve in tandem with the program. Further, the use of evaluation frameworks significantly improves documentation of program impact, which in turn facilitates replication of the program in new and different settings.