Impact Evaluation Webinar Series: Questions And Answers

Webinar 1: Introduction to Impact Evaluation

1. Can you consider Theory of Change as a method to articulate tacit values?

Yes, developing a Theory of Change often involves making explicit the tacit values that underpin the intervention - the criteria and standards for judging its success. These values can relate to different desirable and undesirable outcomes and impacts, processes, and distributions of costs and benefits. For example, would the project be considered a success if the average outcome went up, if the gap between the best and worst off narrowed, or if no one was disadvantaged by it? These values are not always made explicit in documentation about the intervention, so the process of developing a theory of change should also draw on discussions with relevant stakeholders, along with observations (or reported observations), and relevant research. It usually requires a deliberate process to help people elaborate on what a successful program would look like.

Sometimes people put evaluation criteria into the actual logic model (for example, “60 percent of students will have increased test scores of at least two grades”) but I don’t recommend doing this. I think it’s better to have the outcome defined more broadly in the diagram (for example “Students’ learning increased”) and separately set out how this will be operationalized, how evidence will be collected, and what standards of performance will be used.  That way you can separate the discussions about the logic of how the program is understood to work from the discussions about the evaluation of it – and give these issues the space and care they deserve.

I have found that Sue Funnell’s program theory matrix is often useful for doing this. It takes the different intermediate and final outcomes that have been identified in the theory of change and, for each one, answers the following questions:

1. What would success look like (in broad terms – not immediately stopping to identify how this would be measured)?

2. What are the factors that influence the achievement of each outcome? Which of these can be influenced by the project? Which factors are outside the direct influence of the project?

3. What is the program currently doing to bring about this outcome (direct activities or activities to address the identified factors)?

4. What performance information should we collect (about outcomes, factors and activities related to this outcome)?

5. How can we gather this information?

6. What should this information be compared to? (e.g., baseline, targets, comparative regions?)
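To illustrate how the six questions fit together for a single outcome, here is a sketch of one row of such a matrix, recorded as a Python dictionary. The education outcome and all its entries are invented for illustration and are not part of Funnell’s published materials.

```python
# One hypothetical row of a program theory matrix: a single intermediate
# outcome plus answers to the six questions above. All content is invented.
matrix_row = {
    "outcome": "Students' learning increased",
    # 1. What would success look like (in broad terms)?
    "success_looks_like": "Students read more fluently and confidently",
    # 2. Factors influencing the outcome, split by whether the project can influence them.
    "influencing_factors": {
        "within_program_influence": ["teacher training", "reading materials"],
        "outside_program_influence": ["home literacy environment"],
    },
    # 3. What the program is currently doing to bring about this outcome.
    "current_activities": ["coaching teachers", "distributing books"],
    # 4. What performance information to collect.
    "performance_information": ["reading assessments", "teacher observation notes"],
    # 5. How to gather it.
    "how_gathered": ["termly assessment", "classroom visits"],
    # 6. What to compare it to.
    "compared_to": ["baseline scores", "comparable schools"],
}
```

In practice each outcome in the theory of change would get its own row, making gaps (for example, an outcome with no planned data collection) easy to spot.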

Resources:

  • Guidance Note 1: Introduction to Impact Evaluation, Section 7: "Developing a Theory or Model of How the Intervention is Supposed to Work," pp. 6-8.
  • Funnell, S. C. (2000) “Developing and Using a Program Theory Matrix for Program Evaluation and Performance Monitoring.” In P. J. Rogers, A. J. Petrosino, T. Hacsi, and T. A. Huebner (eds.), Program Theory in Evaluation: Challenges and Opportunities. New Directions for Evaluation, no. 87. San Francisco: Jossey-Bass. Sets out the program theory matrix with a worked example.
  • Funnell, S.C.  and Rogers, P. J. (2011) Purposeful Program Theory: Effective Use of Theories of Change and Logic Models. San Francisco: Jossey-Bass/Wiley. Sets out a revised version of the program theory matrix and provides more detail about how to develop it. First two chapters available free here.
  • Douthwaite, B. and Schulz, S. (2001) "Spanning the Attribution Gap: The Use of Program Theory to Link Project Outcomes to Ultimate Goals in INRM and IPM."  Paper presented at the INRM Workshop, Cali, Colombia, 28-31 August 2001. An example of an evaluation that used the program theory matrix.

 

2. Clarification of values should not just be considered when planning an evaluation, but also as the intervention itself is being planned, correct? Is there something particular about impact evaluation that makes it essential that the pieces you mentioned earlier in your presentation are considered early, while the intervention is being planned?

The values should have been clarified when the intervention was being planned, but they also need to be revisited when planning and conducting an impact evaluation. Sometimes the values are adequately reflected in the documentation, but very often they are not. Sometimes there are implicit values that only become evident when they have been violated. So there might have been, for example, a value about the targeting of the program or the level of improvement that would make the program worthwhile, but that hasn’t been documented.

I have found it useful to revisit specific activities and fill in the gaps in the documentation, but also to check that the program and the people haven’t changed. Sometimes the people who set up a program, the people who carry it out, and the people who evaluate it have very different sets of values. So it’s important to clarify the values when planning both the intervention and the evaluation.

In an impact evaluation it is particularly important to clarify the values at the beginning of the intervention, if possible, so that appropriate data can be gathered before the intervention starts (baseline data about impact variables) and during the intervention (data about the processes used).

 

3. Impact evaluation has in the past been concerned with long-term changes. Has this focus changed? If you focus on boundary partners (as in Outcome Mapping), does this mean you end up ignoring more distant social impacts?

Impacts have usually been defined as long-term results. However, when you look at many of the examples of impact evaluation, they are looking at short-term outcomes. The evaluations of conditional cash transfer projects, for example, have assessed whether these projects increased school attendance and retention, which is not a long-term impact but an intermediate outcome. Attending school is a way of achieving the long-term impact of increased educational skills (such as literacy and numeracy), but because of the timeframe, the evaluations were not framed in terms of long-term impacts.

This comes back to the question of how we balance wanting information early to influence policy with recognizing that many impacts are long term. The theory of change is helpful because it can identify intermediate indicators that do not guarantee impact, but at least show that you’re going down the right path.

Outcome Mapping was developed in response to concerns about the inappropriateness of focusing on impacts when evaluating complex social change interventions where it is not possible to untangle the many different contributors to final impacts. Terry Smutylo wrote “The Output, Outcome, Downstream Impact Blues” arguing against evaluating in terms of impacts and in a presentation available on YouTube explains this point of view (lyrics available here and karaoke version available here).

I don’t think we need to ignore longer-term impacts in our evaluations, just take more account of the multiple contributors to them.  Some of the examples of using Outcome Mapping at this year’s Outcome Mapping Lab in Beirut in January were clearly including impacts in their evaluations (see, for example, Fletcher Tembo and Kevin Kelly’s presentation on governance programs and other presentations here).

 

4. How do you deal with multiple, different Theories of Change?

The issue of multiple theories of change is one we need to pay more attention to. When people refer to THE Theory of Change underpinning an intervention, this can give an unhelpful message that there is one theory of change and that one diagram will be sufficient for all purposes.

We can have different Theories of Change that show the intervention at different scales – for example, funders or policymakers might want a higher-level overview of a large intervention, whereas implementers and local managers might want a more detailed version.  It is often helpful to have versions of the Theory of Change at different levels of detail. For example, Ray Pawson, in a paper on “Invisible Mechanisms,” has focused particularly on the initial stage where participants choose to engage in a program (or not).

We can also have different Theories of Change for different sites or for different people, where the overall Theory of Change for the program has been operationalized and in some cases adapted to suit particular circumstances.

It is relatively easy to deal with different theories of change when they can be readily integrated. It is a different issue when we have competing theories of change. Different people might have different plausible ideas about how an intervention works. For example, does Meals on Wheels work by improving the nutrition of elderly people – or by reducing the anxiety of their absent children?

One section I really like in Ray Pawson and Nick Tilley’s book ‘Realistic Evaluation’ is where they list eight or nine theories for a single intervention (installing closed-circuit TV to reduce auto theft in a parking garage) and lay out what you would expect to see in each case, and hence how an evaluation could be designed to show not only whether it worked but how – massively increasing the utility of the evaluation for replication. We can learn a lot from that. A lot of impact evaluations just lay out a single theory of change without thinking hard about how change would come about. Where there are potentially competing Theories of Change, we need to think about what the evidence says about them – and how we could design the evaluation to provide more evidence.

Resources:

  • Pawson, R. and Tilley, N. (1997) Realistic Evaluation, London: Sage Publications. Classic book with the potential to permanently change the way you think about evaluation.
  • Pawson, R. and Tilley, N. (2004) Realistic Evaluation.  Available here. Provides an overview of this approach to evaluation. 
  • Tilley, N. (2000) Realistic Evaluation.  Keynote presentation at the founding conference of the Danish Evaluation Society.
  • Pawson, R. (2008) "Invisible Mechanisms," Evaluation Journal of Australasia, 8 (2): 3-13. Available here. Focuses particularly on the causal mechanisms involved in participants engaging or not engaging in interventions.

 

5. Please say more about how to approach varied implementation across multiple sites.

We need to distinguish between different types of variations in implementation, each of which has different implications for impact evaluation. 

One type is where the quality of implementation varies across sites.  In some cases the quality of implementation is so poor that if the intended impacts are not achieved, it might well be because of this. This is important to recognize.  In other cases, there might be a dose-response effect, where sites with higher quality or more intensive implementation have better results than sites with lower quality or lower intensity implementation. In these cases, it will be important to gather data about the quality of implementation so that this can be used in the analysis and reporting.  For example, imagine you have an intervention that is very effective when implemented correctly, but this has not been done at most sites. The average effect might be quite low, but this would not provide very useful information for decision makers. They need to understand that it can be effective, but also the likelihood of being able to improve implementation quality.
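To make the averaging problem concrete, here is a minimal sketch with invented site-level effect estimates. The pooled average looks modest, but disaggregating by implementation quality shows the intervention works well where it is implemented well.

```python
# Invented site-level effect estimates: the intervention works well where
# implementation quality is high, and barely at all where it is low.
sites = [
    {"site": "A", "quality": "high", "effect": 0.60},
    {"site": "B", "quality": "high", "effect": 0.55},
    {"site": "C", "quality": "low",  "effect": 0.05},
    {"site": "D", "quality": "low",  "effect": 0.00},
    {"site": "E", "quality": "low",  "effect": 0.10},
]

def mean_effect(records):
    return sum(r["effect"] for r in records) / len(records)

overall = mean_effect(sites)  # pooled average looks modest
by_quality = {
    q: mean_effect([s for s in sites if s["quality"] == q])
    for q in ("high", "low")
}
# by_quality reveals a large gap between high- and low-quality sites,
# which is the information decision makers actually need.
```

Reporting `by_quality` alongside `overall` answers both questions posed above: whether the intervention can be effective, and how much hinges on implementation quality.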

Another type is where there are apparent variations but the projects are still essentially the same underneath – there has been local adaptation to suit the context, but no significant changes to how it works. A theory of change can help you to identify projects that are in essence doing the same thing even though there might be superficial differences in their activities. This is something that I wrestled with in an evaluation of a family and community strengthening program which had 635 different projects at different sites doing very different things. Superficially there were many differences, but when you looked behind the details of the activities there was a common logic to them. The projects had been formed using a common theory of change at the planning stage, and this actually did inform how they planned and implemented their projects. For example, one of the big challenges was engaging people in the projects, and the projects used similar strategies to develop trust and engagement.

The third type is where the projects are actually working in very different ways. The evaluation of the implementation of the Paris Declaration demonstrates how a common program theory (or theory of change) can provide a useful conceptual framework for collecting and analysing data across different sites and then doing a synthesis.

Resources:

  • Wood, B., Betts, J., Etta, F., Gayfer, J., Kabell, D., Ngwira, N., Sagasti, F., and Samaranayake, M. "The Evaluation of the Paris Declaration, Final Report," Copenhagen, May 2011. Annex 5: Technical Annex to the Synthesis Report.  

 

6. In order to differentiate between "attribution" and "contribution" to impact, one has to be very clear about one's own activities, their reach and outcomes, as well as the interventions other actors are undertaking in the same direction, right?

Yes, I think it’s very easy to attribute everything to your actions if you’re ignorant of what other people are doing. Knowing something about the way the world works, you realize that’s not usually a very plausible explanation. So that means that you have to really understand not just what your activities and outcomes are, but how change comes about. Sometimes there is work that other organizations have done to create the conditions under which you can be effective. That doesn’t take anything away from the work you’ve done to be effective, but it’s necessary for you to understand it so you can learn from this evaluation and then go into a different situation and implement something successfully.

 

7. From my personal point of view, impact evaluation can have two directions depending on the use. The first one is external; it means reporting, accountability, money allocation, etc. The second one would be internal - using those findings as a tool for decision-making processes.

Impact evaluations can have quite different intended uses, which have implications for how they should be planned and conducted. In the Guidance Note we discussed six important intended uses: to decide whether to fund an intervention (ex-ante impact evaluation); to decide whether or not to continue or expand an intervention; to learn how to replicate or scale up a pilot; to learn how to successfully adapt a successful intervention to suit another context; to reassure funders, including donors and taxpayers (upward accountability), that money is being wisely invested; and to inform intended beneficiaries and communities (downward accountability) about whether or not, and in what ways, a program is benefiting the community.

We also need to think very carefully about the types of accountability we have and what people are being held accountable for. We don’t want to create disincentives for acknowledging that some things have not worked. This can create tensions between learning and some forms of accountability.

This is a great challenge for development. We need to manage risk carefully, because the consequences of failing can be serious, but we must be able to acknowledge and learn from failure. Weick and Sutcliffe’s research into high reliability organizations suggests that we should hold people accountable for appropriately managing uncertainty and the unexpected, not just implementing what has been planned.

Resources:

  • Guijt, I. (2010) "Exploding the Myth of Incompatibility between Accountability and Learning." In Ubels, Acquaye-Baddoo and Fowler (eds.) Capacity Development Practice. London: Earthscan. Available for free download here.
  • Weick, K. and Sutcliffe, K. (2007) Managing the Unexpected: Resilient Performance in an Age of Complexity. San Francisco: Jossey-Bass. Summary available via Weick, K. (2007) "5 habits of Highly Reliable Organizations," Fast Company Magazine.

 

8. Could you talk a little more about cost-benefit analysis and its relation to impact evaluation? How can we use cost-benefit analysis as a part of impact evaluation?

Cost-benefit analysis is built on the premise that the observed positive outcomes and benefits have been caused by the program, so you need some level of impact evaluation to determine that the outcomes have, in fact, been caused by the intervention. Without that, you can’t really do a cost-benefit analysis. When you look at some cost-benefit analyses, their impact evaluation component is quite weak, so it’s an area in which we need to do more work. This doesn’t mean that cost-benefit evaluations always need to include a particular type of impact evaluation, but they need to do something more rigorous than simply reporting that impacts occurred, assuming they were caused by the intervention, and then dividing the value of the impacts by the cost of the intervention. 
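The arithmetic behind that last point can be sketched with invented figures: a naive benefit-cost ratio credits the intervention with all observed benefits, while an attribution-adjusted ratio counts only the share of benefits the impact evaluation attributes to it.

```python
# All figures are invented for illustration.
observed_benefits = 500_000   # value of all observed positive changes
programme_cost = 200_000      # total cost of the intervention
attributable_share = 0.4      # share of benefits the impact evaluation
                              # attributes to the intervention itself

naive_bcr = observed_benefits / programme_cost
adjusted_bcr = (observed_benefits * attributable_share) / programme_cost
# The naive ratio (2.5) overstates value for money; adjusting for
# attribution brings it down to about 1.0.
```

The point is not the particular numbers but that without a credible estimate of `attributable_share`, the ratio is not informative.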

 

9. How does the unit of analysis affect the design of an impact evaluation? For example, if you are looking at country-level changes and have a small sample size of countries, which approaches are feasible?

In this case, you have such small numbers that random assignment or a created comparison group isn’t going to give you statistical power, so that’s not really an option. That’s where it is important to use some of the rigorous non-experimental methods that have been developed in fields like political science, which deal with this issue often. Political scientists often study a small number of countries and have developed systematic and rigorous methods for causal inference that are appropriate for this challenge.

For example, Qualitative Comparative Analysis looks at configurations of factors that have been associated with particular outcomes. Process tracing examines the range of available evidence to conduct empirical tests of whether an intervention was (a) necessary and (b) sufficient to have produced the observed result. The International Initiative for Impact Evaluation is producing a paper specifically on methods for small n studies that will discuss some of these methods in more detail.
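The core logic of Qualitative Comparative Analysis can be sketched as follows. The cases and conditions are invented for illustration, and real QCA involves much more (calibration, consistency and coverage measures, Boolean minimization), but the basic move is to code each case for the presence or absence of conditions and look for configurations consistently associated with the outcome.

```python
from collections import defaultdict

# Each case (e.g. a country) is coded 1/0 for each condition.
# Conditions, cases, and outcomes are all invented for illustration.
cases = [
    # (strong_institutions, donor_coordination) -> outcome_achieved
    ((1, 1), 1),
    ((1, 0), 1),
    ((0, 1), 0),
    ((1, 1), 1),
    ((0, 0), 0),
]

# Group the observed outcomes by configuration of conditions.
configs = defaultdict(list)
for conditions, outcome in cases:
    configs[conditions].append(outcome)

# A configuration is "consistent" if every case showing it achieved the outcome.
consistent = sorted(c for c, outcomes in configs.items() if all(outcomes))
# Here, every case with strong institutions achieved the outcome, with or
# without donor coordination - suggesting which condition matters.
```

With a handful of countries, this configurational reasoning can support causal claims that a statistical comparison group never could at that sample size.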

Resources:

  • Guidance Note 1: Introduction to Impact Evaluation, Section 9: "Explaining to What Extent Observed Results Have Been Produced by the Intervention," pp. 9-12.
  • Ragin, C. (2008) "What is Qualitative Comparative Analysis?" Available here.
  • Collier, D. (2010) "Process Tracing: Introduction and Exercises." Available here.  

 

10. You said impact evaluation can be particularly useful for joint interventions. Can you please elaborate?

A good impact evaluation can help partners to:

  • Clarify their separate and joint contributions to the intervention, with thoughtful program theory that explains how their joint work adds value;
  • Identify and negotiate differences in terms of criteria and standards for judging the success of the intervention; and
  • Examine how well their partnership is working.

Through this process, partners can improve the way they work together.

Resources:

  • OECD/DAC (2006). "Guidance for Managing Joint Evaluations." Available here.
  • The United Nations Evaluation Group (UNEG) is currently developing guidance for multi-agency interventions, which addresses these issues. (Some information about this project is available here.)

 

11. You mentioned randomized control trials (RCTs) briefly. What is your view on the recent drive towards RCTs? Do you think they are the "gold standard" as advocated by many, or have we entered an era of methodological fundamentalism where RCTs drive out all other approaches to impact evaluation?

I think there is a middle ground between these two options. There is increasing recognition that RCTs can be a useful research design in particular circumstances, and provide useful information when used appropriately, but they are not always feasible or appropriate. Referring to them as ‘the gold standard’ is not really helpful, as it can divert attention from the other aspects of quality in impact evaluation, including valid measurement, and addressing other threats to validity. There are also some indications that, when evaluations are conducted carefully, observational data can produce unbiased estimates (Hansen et al., 2011).

Resources:

  • Hansen, H., Klejntrup, N., and Andersen, O. (2011). "A Comparison of Model-Based and Design-Based Impact Evaluations of Interventions in Developing Countries." FOI Working Paper. Copenhagen: Institute of Food and Resource Economics, University of Copenhagen.
  • Patton, M. Q. (2011) "The Debate About Randomized Controls in Evaluation: The Gold Standard Question." Copenhagen. 
  • Ravallion, M. (2009) "Should the Randomistas Rule?" The Economists’ Voice, February 2009: pp. 1-5.

 

12. If a project has ended, how can we assess the impact of that project?

It’s much more difficult if the impact evaluation is not started until after the project has ended, as the chance to plan and collect data before and during the project has been lost.  But this is a situation that sometimes occurs. The book RealWorld Evaluation sets out some strategies for doing evaluation under these circumstances, including reconstructing baselines and using data that have been collected for other purposes.

Resources:

  • Bamberger, M., Rugh, J., and Mabry, L. (2011) RealWorld Evaluation (2nd edition). Beverly Hills, CA: Sage Publications. A condensed summary is available here.

 

13. When you are engaged in the evaluation, how much of your time is educative in nature?

This depends on my role in the evaluation. Some organizations have well-established learning structures and processes and my role is to gather, analyse and report some data that they can then work on. Other times I am actively engaged in working with people individually, in small groups and in large groups to share their tacit knowledge, to learn new concepts and facts, and to consider the implications of these for their policy and practice.

 

14. Should donors request that implementing partners include impact evaluation when proposals for projects or programs are being developed, or should donor agencies decide when impact evaluation should be done?

Ideally there is discussion and negotiation between partners about which interventions are the focus of impact evaluation, when and how. These discussions can identify how best to focus evaluation resources across different types of evaluation and across different interventions to get the most useful information. It is sometimes more effective to concentrate resources on doing a smaller number of impact evaluations that are high quality, rather than doing lots of impact evaluations that lack the resources to be sufficiently comprehensive or rigorous.

 

 

Webinar 2: Linking Monitoring and Evaluation to Impact Evaluation

1. Can we establish attribution only through an impact evaluation or also through other types of evaluation (e.g. mid-term, formative, process, final, etc.)? If we can establish impact through other types of evaluation, what then is the difference from those and an impact evaluation (is it related to the time when an impact evaluation occurs, or to the types of questions, or to methodology)?

My first reaction is to advise against getting overly restricted by definitions and labels, and to bear in mind what impact evaluation is all about. Impact evaluation is a form of evaluation, and is focused specifically on how changes come about. It need not necessarily be completely discrete from other forms of evaluation. Indeed, as the Guidance Note emphasizes, the more it is linked to other forms of monitoring and evaluation, the more likely it will be meaningful and useful.

If your evaluation is attempting to establish attribution, that is, using systematic, credible, evidence-based approaches to identify the effects resulting from an intervention, in particular with respect to meaningful changes in people’s lives (or, perhaps, to the community or to the environment), then you are doing impact evaluation. “Impact evaluation” may or may not be combined with other forms of evaluation. Of the examples mentioned in the question, “formative”, “mid-term”, and “final” refer to when evaluation of some form is undertaken. Formative and mid-term evaluations are essentially the same thing: evaluation undertaken while a project is in process, with a major aim of providing guidance for the remainder of the project. But for long-running programs without a formal end date (or with renewal or continuation very likely), a “final” (often referred to as “summative”) evaluation may refer just to a particular funding period.

A common error is to attempt to establish impact too soon (as the Guidance Note and the webinar presentation emphasized, the theory of change can help you identify when it would be appropriate to ask questions about impact). Sometimes it may be premature to expect impact to occur even by the formal end of a project, particularly if its duration is short. Nevertheless, using the theory of change as a guide, it should be possible to identify whether the program or project is taking an outcome-oriented approach and to identify intermediate outcomes that may suggest whether it is moving towards the creation of impact.

Nevertheless, it is important to realize that there are somewhat different views of what constitutes “impact evaluation”. For example, there are those who would assert that it normally would be carried out as a separate research study, making use of randomized or quasi-experimental designs. While this approach does have its advocates, the evaluation community generally acknowledges that there can be a variety of ways in which impact evaluation can be undertaken.

Resources:

  • The most recent NONIE Conference (Network of Networks on Impact Evaluation) held just after this webinar (19-20 April 2012, see http://www.nonie2012.org/ for more information) included discussions about how mixed methods can be used for impact evaluation.
  • These themes are also discussed in Guidance Note 1: Introduction to Impact Evaluation.  
  • Picciotto, R. “Experimentalism and Development Evaluation: Will the Bubble Burst?” Evaluation, 18 (2): 213-229. Guidance Note 1 (What Is Impact Evaluation?) and Guidance Note 3 (Mixed Methods) both address these and related considerations.

 

2. You mentioned a common mistake is to do an impact evaluation before M&E is up and running. Please elaborate.

In order to be able to undertake meaningful impact evaluation, quite a few conditions must be in place first. For starters, it is essential to be very clear about what “the intervention” really is; it is rarely what was initially described or intended. Responsive programs should be making changes in response to feedback (including from preliminary evaluation), needs, and other factors. It is necessary to give the program or project an opportunity to get established and become stable. And it is essential to be able to identify at what stage in the life cycle of an intervention it would make sense to expect impact to occur.

As suggested in a response to a somewhat different question above, a common error is to undertake impact evaluation too early. The Guidance Note and the webinar presentation, as well as responses to other questions above, have underscored the role of the theory of change in helping to identify what types of evaluation questions can be addressed at given points in the project lifecycle. M&E can provide information about how the initiative is rolling out, as well as baseline data and other information that may be prerequisites to meaningful impact evaluation. There are other ways, as discussed in the Guidance Note, in which M&E can contribute to impact evaluation – such as indicating when it may or may not make sense to undertake an impact evaluation, and helping to establish priorities for where impact evaluation might be most useful. The second webinar associated with this guidance note provides examples of the latter.

 

3. How can you forecast whether or not an impact evaluation may be warranted when you are developing a program/in the proposal development stage?

This is where developing a theory of change model in the preliminary stages can be very helpful. A program proposal should include some evidence about what the program is trying to achieve and some review of the literature. If it’s something that’s been done in other places and you are just replicating it, maybe you don’t really need impact evaluation; maybe you need a less intrusive form of evaluation. If you are trying to do something very different and it’s not clear whether an impact will follow from what you’re doing, then impact evaluation is going to be more worthwhile.

What kind of impact do you want to look at, and what is the life span of your program proposal? If you’re looking at preventive programs, like reducing smoking, you’re not going to see any impact for a number of years, but you can identify intermediate indicators or outcomes that are documented in the research literature and that you can draw upon.

 

4. NGOs often attempt to conduct impact evaluation at an ex-post stage (i.e. not planned as a prospective evaluation design). In these cases, how can program M&E be used to contribute to impact evaluation?

NGOs are not the only ones who do this. In fact, I work with organizations of all sizes and in many respects I find that NGOs are often better at thinking about what kind of data they want.

Ideally you want to try to anticipate this, and a good M&E program in an organization might want to think about, in advance, as programs are being planned, what kinds of questions might come up later and try to build in, if possible, some baseline data. This isn’t always possible. The book RealWorld Evaluation deals specifically with how to reconstruct counterfactuals or baselines. The more M&E data you have, the more likely you are to be able to draw on them later. Monitoring and evaluation is often done for purposes other than looking at outcomes and impact, but some of the information might still be relevant to impact evaluation. Another common problem is that data are often reported in aggregate for everyone, instead of for specific subgroups – like men and women, for example. Very often a program that works well for one group of people may actually harm another group, so the more you can anticipate this, the better off you are.
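A minimal sketch, with invented numbers, of why disaggregation matters: the pooled average looks mildly positive even though the program helps one subgroup and harms another.

```python
# Invented individual-level changes on some outcome measure.
records = [
    {"group": "women", "change": 8},
    {"group": "women", "change": 6},
    {"group": "men",   "change": -5},
    {"group": "men",   "change": -3},
]

def mean(values):
    return sum(values) / len(values)

pooled = mean([r["change"] for r in records])   # mildly positive overall
by_group = {
    g: mean([r["change"] for r in records if r["group"] == g])
    for g in ("women", "men")
}
# by_group exposes a benefit for one subgroup and harm for the other,
# which the pooled figure completely conceals.
```

Collecting group identifiers in routine M&E is what makes this kind of disaggregated analysis possible later.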

There may be some simple questions you can ask that might not give you complete confidence, but can give you more confidence than you have now. But evaluation is different from research; it doesn’t have to be perfect. And we’re always making decisions and taking actions that are not perfect.

 

5. Getting staff at the community level into the periodical practice of M&E is quite complicated. How can organizations encourage their staff to do M&E?

The simple answer is to give them a reason why it’s important to them and what they can get out of it. M&E is very often imposed upon people, so you need to engage them. What information will allow them to better serve the people they are working with? Evaluation can be very helpful to staff, and you need to show them why it’s helpful.

It is also helpful to provide recognition and feedback to staff, which costs almost nothing. You can provide feedback electronically very easily. At a minimum, it shows gratitude for what they have done and for sending you data. 

 

6. "Generalization" of impact evaluation findings - is it a must? Or can you imagine cases where stand-alone impact evaluation findings are valuable as they are?

Without external validity, or the ability to generalize to other situations, settings, times, or contexts, impact evaluation has no value. Impact evaluation, by definition, assesses what has happened in the past. But what has happened in the past is no guarantee that the same thing will happen in the future, without at least some understanding of the variables that have contributed to impact. And any future intervention, even if it is the “same” one carrying on, will always be different in some ways from what took place before.

For example, consider an impact evaluation that identified positive changes arising from the use of a particular curriculum approach. But suppose the intervention was applied in classrooms with yellow walls. Would the same effects take place in rooms with blue walls? This may appear trivial. But sometimes seemingly trivial factors, or factors that were not considered, can make a significant difference to whether or not an intervention “works”. Would the same curriculum be effective in situations where the classrooms are arranged differently, or where there are no actual classrooms? Would the “same” intervention be just as effective with different teachers? With students with somewhat different backgrounds?

For these and other reasons, there is all-too-frequent disillusionment with demonstration or pilot projects. Something may “work” in a demonstration setting (invariably with many factors that would be difficult to replicate in different situations, or even in the same situation afterwards), but then fall flat when “scaled up”, or even when continued in the same place in essentially the same way. Often, the initial enthusiasm of those associated with a pilot or trial plays a greater role than the actual intervention. In order to interpret and apply findings from an impact evaluation, it is essential to understand the circumstances and context that may have interacted with impact. To take another example, provision of latrines in refugee camps may contribute to better hygiene and health and reduced mortality. But if women feel at risk of being attacked, they may not make use of these latrines. Without awareness of factors such as these, which is essential for generalizing the findings of an impact evaluation to subsequent actions, “stand-alone” impact evaluations will have no value.

 

7. Should impact evaluation always be done by independent entities/an external organization to ensure neutrality/objectivity? 

There are many ways in which neutrality, objectivity, credibility, and appropriate methodological rigor can be established. Using an external organization may be one way. But external organizations may also come with their own preconceptions. And if the research team is overly remote, it may not fully understand the nature of the actual intervention. As the Guidance Note and the webinar presentation emphasized, this represents one of the ways in which linkages between M&E and impact evaluation can be very helpful. Use of reference groups and expert peer reviews represent techniques that can also help to ensure appropriate methodological rigor and objectivity.

Thus there is no reason, in principle, why impact evaluation must always be conducted by independent entities. There may, however, be practical reasons for this. For example, impact evaluation typically is more costly and demanding than other forms of evaluation and sometimes may require specialized forms of expertise. Some forms of impact evaluation, in particular those using very sophisticated research designs, can require dedicated budgets up to (or even exceeding) $1 million. It can overwhelm and distract a small agency to attempt to integrate large-scale impact evaluations with its other work. Also, some funders insist upon an impact evaluation being carried out by an entity independent of the NGO.


 

8. Does an impact evaluation always have to be about change at the people’s level? What about the impact of policy change?

Yes, you can do impact evaluation around policy change. The further along the results chain you go, the more meaningful it’s going to be. For example, if you are advocating for legislation or policy changes on food standards, you have to know what standards have been put in place and whether there is already a policy, but then you want to go a couple of steps beyond that and determine whether these policies have actually been acted on. Perhaps they have been implemented but they haven’t had any impact. For example, food standards may not have led to any improvements in food safety, and could even have perverse effects because too much attention is being paid to the standards and not to other things.

You may not want to look only at the results furthest along the chain, but also at the quality of people’s lives in all respects. You want to look at the longest time frame you can, and consider the timing. You are not just looking at the outputs of what’s taking place but recognizing the potential unintended effects, which are very common in the policy area.

 

Webinar 3: Introduction to Mixed Methods in Impact Evaluation

1. Is it the use of quantitative and qualitative methods in one evaluation that makes it a mixed method? Or is it using different methods, irrespective of whether the methods are quantitative or not?

The term mixed methods is normally used to refer to studies that combine quantitative and qualitative methods for research design, data collection, and analysis. When several different methods are used, all of which are quantitative or all of which are qualitative, the term “multiple methods” is more commonly used.

 

2. Often surveys include comment boxes to substantiate responses provided in closed-ended questions. These comments contain information that would otherwise have been collected using focus group discussions or other qualitative methods. Is this a mixed method?

This is true, and these can be quite effective. However, their use requires careful training of the interviewers, as they will often tend to make little use of the boxes, particularly if they are being paid by the number of completed interviews. Also, many respondents are concerned about how long the interview will take, and they often have no incentive to add further comments if they see that there are still a large number of pages of the survey to be completed. So this can be a mixed method approach if the interviewers are well trained. But normally the quality of the qualitative data will be better if a more in-depth interview is conducted with a sub-sample of the respondents to the quantitative survey. One reason for this is that in-depth interviews require a higher level of experience, and many enumerators conducting the quantitative survey do not have this level of experience.

 

3. You mentioned that mixed methods (MM) contribute to flexibility, especially if the context or project design changes. Does the increased reliance at that point on qualitative and/or non-experimental approaches allow you to maintain the MM status, or does it really then just become qualitative method only?

In these situations a mixed method design would continue to use a quantitative survey, but this would be combined with qualitative techniques that can adapt to the changing situation. So there is a difference between a qualitative evaluation that does not include a quantitative survey instrument and a mixed method design that uses both.

 

4. The concept of revisiting the field to collect QUAL data to explain discrepancies/unexpected QUANT findings in the triangulation stage is intriguing, but it could be difficult to stop if new discrepancies pop up along the way. When is it "enough"?

For a simple mixed methods design the plan would usually be to only return once after the quantitative data has been analyzed and some questions that cannot be explained have been identified. An example would be when a few responses are inconsistent with general trends and the goal of returning to the field is to determine whether the “outlier” was due to a reporting error or whether it reveals a new situation that had not previously been identified in the analysis. 

In the webinar, I gave an example of a study of village water management in Indonesia where it was found that in all villages except one, it was the women who managed the water supply. However, returning to the field revealed that in this single “outlier” village the reason why men agreed to manage the water was that this was the only village where women had the opportunity to participate in very profitable dairy farming. Without the return to the field this would have been dismissed as a reporting error.

In more complex mixed methods designs, where there are various cycles of quantitative and qualitative data collection and analysis, it might be possible to return to the field several times. However, such designs account for only a small number of evaluations, and a single return to the field is the most common practice.

 

5. What do you do when you use mixed methods and they are not consistent in terms of findings yet you have limited/no resources to go back to the field?

This is of course a very common situation. The most common approach in mixed method evaluations is to use triangulation (comparing estimates from more than one source). In most cases, inconsistencies will be discussed among the evaluation team and a decision will be made as to how to reconcile the inconsistencies without returning to the field (for reasons of cost and time). A key element of a mixed method design is the systematic use of triangulation, where the analysis is designed to systematically compare different estimates and then to make a judgment as to the best way to reconcile the different estimates. This is a much sounder design strategy than the more common situation where inconsistencies are often discovered by chance and no one is quite sure what to do.

 

6. The problem with qualitative data collection is that there are as many responses (data points) as there are respondents. How do you consolidate these widely diverse points of view into one coherent report? This is by and large the reason why many people trust QUANT over QUAL (even if the latter could provide the best information).

The challenge with qualitative data collection is to find patterns in the data and to develop a typology or classification of responses and respondents. For large studies this is often done using qualitative data analysis software packages. For smaller, or less data intense studies this can be done manually through constructing a set of two-way classification tables.
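For a small study, a two-way classification table of this kind can be tallied by hand or with a few lines of code. The sketch below is only an illustration of the idea, with entirely hypothetical respondent groups and response themes; it counts coded responses and prints a groups-by-themes table:

```python
from collections import Counter

# Hypothetical coded responses: (respondent_group, response_theme) pairs
# produced by manually coding open-ended answers.
coded = [
    ("women", "water access"), ("women", "safety"),
    ("women", "water access"), ("men", "cost"),
    ("men", "water access"), ("men", "cost"),
]

# Tally each (group, theme) combination.
counts = Counter(coded)
groups = sorted({g for g, _ in coded})
themes = sorted({t for _, t in coded})

# Print a simple two-way classification table: groups x themes.
header = "".join(f"{t:>14}" for t in themes)
print(f"{'':>8}{header}")
for g in groups:
    row = "".join(f"{counts[(g, t)]:>14}" for t in themes)
    print(f"{g:>8}{row}")
```

The same tallying logic scales up to the cross-tabulations that qualitative data analysis packages produce automatically for larger studies.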

 

Resources:

  • Qualitative Data Analysis Software Packages: See Roberto Franzosi (2004), “Content Analysis,” in Hardy and Bryman (eds), Handbook of Data Analysis. Sage Publications. Wikipedia also provides some useful reviews.
  • Two-way Classification Tables: Miles and Huberman (1994) Qualitative data analysis provides many good examples of manual analysis methods.

 

 

7. There has been an increased interest in using mobile technology to map development projects (e.g. whether water pumps are working or not), to increase reporting and lower cost. What are your thoughts on how to incorporate these mobile technologies into mixed method evaluation? What are the dangers?

These techniques have great potential, but there are risks. They work very well for simple questions that require either a numerical response or a multiple-choice answer. It is much harder to capture dialogue, both because many programs only allow short responses and because most interviewers cannot type quickly enough on a small screen. The points raised in my response to question 10 are even more true when using mobile technology. However, these issues do not negate the great potential of these techniques.

One challenge is to ensure the representativeness of the sample, but these issues are not very different from those of any sample survey. In fact, a GPS-enabled device allows the supervisor to check where the interview took place (making it more difficult for an interviewer to sit in a local tea shop and fabricate an interview). GPS maps have huge potential for all kinds of evaluations, including mixed methods.

 

 

8. Can you talk about your experience with and/or review Lot Quality Assurance Sampling (LQAS)? How (if at all) have you seen this methodology used in conjunction with qualitative methodologies? Do you see this as a useful tool statistically?

Lot Quality Assurance Sampling (LQAS) is an example of an approach designed to work with small, economical, and easily administered samples that has recently been gaining in popularity (Valadez and Devkota 2002). Originally developed as an industrial quality control tool, LQAS was first applied to social development to assess the achievement of coverage targets for local health delivery systems. It has since been used to assess immunization coverage, antenatal care, oral rehydration, growth monitoring, family planning, disease incidence, and natural disaster relief.

It is well suited for use in mixed methods evaluations for two reasons. First, it is designed for use with small samples. Second, as it is primarily used to assess whether the coverage of a service delivery program reaches a predetermined level (for example, 75 percent of target households), the statistical analysis can be combined with qualitative techniques to help explain why coverage did or did not achieve the targets. It is generally considered statistically sound as long as it is properly applied. However, its use is limited by the fact that it only assesses coverage.

LQAS is used to assess whether coverage benchmarks have been achieved in particular project locations. An example of a benchmark would be ensuring that 80 percent of families have received oral rehydration kits and orientation, or that 70 percent of farmers have received information on new seed varieties. A major advantage of this approach is that a sample of 19 (households, farmers, etc.) will normally be sufficient to estimate whether any benchmark coverage level (above 20 percent) has been achieved for a particular target area with no greater than a 10 percent error. For most operational quality control purposes a 10 percent error is considered satisfactory, although for more rigorous evaluation studies a smaller error (e.g. 5 percent) might be required, in which case a larger sample would be needed. The findings are very simple to analyze, as the number of required positive responses is defined for any given benchmark coverage level. For example, if the target coverage level is 80 percent, then the sample of 19 families (or farms, etc.) must find at least 13 cases where the service had been satisfactorily received. If the target coverage is 60 percent, then the sample must find at least nine satisfactory cases. So in addition to the advantage of very small samples, an LQAS study is very easy for health workers, agricultural extension workers, and other non-research specialists to administer and interpret.
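The decision rules quoted above can be reproduced from the binomial distribution. The sketch below is a simplified illustration, not the published LQAS tables: it assumes the lower coverage threshold sits 30 percentage points below the target and searches for the smallest decision rule that keeps both misclassification risks within 10 percent.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def lqas_decision_rule(n: int, p_upper: float, p_lower: float,
                       max_error: float = 0.10):
    """Smallest decision rule d (minimum positives required) with both
    misclassification risks at or below max_error, or None if none exists.
      alpha: P(fewer than d positives | true coverage = p_upper)
      beta:  P(d or more positives    | true coverage = p_lower)
    """
    for d in range(n + 1):
        alpha = binom_cdf(d - 1, n, p_upper)      # risk of failing an adequate area
        beta = 1 - binom_cdf(d - 1, n, p_lower)   # risk of passing an inadequate one
        if alpha <= max_error and beta <= max_error:
            return d
    return None

# Reproduce the rules quoted in the text (n = 19, 10 percent error,
# lower threshold assumed to be 30 points below the target):
print(lqas_decision_rule(19, 0.80, 0.50))  # -> 13
print(lqas_decision_rule(19, 0.60, 0.30))  # -> 9
```

With a target of 80 percent the rule requires at least 13 positives out of 19, and with a target of 60 percent at least nine, matching the figures above.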

Resources

  • The response to this question is adapted from Bamberger, M., Rugh, J. and Mabry, L. (2011) RealWorld Evaluation: Working under budget, time, data and political constraints (2nd edition). Beverly Hills, CA: Sage Publications. Chapter 15, pp. 390-91.
  • Valadez, J., & B.R. Devkota (2002) “Decentralized supervision of community health programs: Using LQAS in two districts in Southern Nepal,” in Community-based health care: Lessons from Bangladesh to Boston, Management Sciences for Health Inc.

 

9. What are some examples of non-parametric statistical methods that are being used in qualitative data analysis in impact evaluations? Do you have any references to share?

While most qualitative evaluations use non-statistical methods for data analysis – such as content analysis, discourse analysis, and case-based analysis (see Lee and Fielding 2004 and Miles and Huberman 1994 for two different perspectives) – non-parametric tests and statistical methods for the assessment of small samples are more widely used in mixed methods evaluations, where the purpose is often to generalize findings obtained from qualitative data drawn from relatively small samples. Hoyle (1999) provides a review of statistical strategies for small sample research, and texts such as Gibbons and Chakraborti (2010) provide an overview of non-parametric tests. Mixed methods evaluations frequently use statistical sampling techniques to select a relatively large sample for a quantitative survey, and then select a small sub-sample for qualitative research. The sub-sample may be selected randomly, but more frequently it is selected purposively. Small sample selection strategies provide guidelines on what will be an acceptable sample size for statistical analysis. Strategies for determining sample size for purposive samples are not so well defined statistically, and judgment will often be required.
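As one concrete example of a non-parametric test suited to very small samples, the sketch below implements an exact two-sided sign test for paired before/after data, using only the binomial distribution. The scores are entirely hypothetical and are included purely to show the mechanics:

```python
from math import comb

def sign_test_p(n_pos: int, n_neg: int) -> float:
    """Two-sided exact sign test p-value for paired data, ignoring ties.
    Under H0, each non-tied pair is equally likely to be + or -."""
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    # P(at most k successes) in Binomial(n, 0.5), doubled for two sides.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)

# Hypothetical before/after ratings for 10 respondents from a
# qualitative sub-sample (1-5 scale):
before = [3, 4, 2, 5, 3, 4, 2, 3, 4, 3]
after  = [4, 5, 3, 5, 4, 5, 3, 4, 5, 4]
diffs = [a - b for a, b in zip(after, before)]
n_pos = sum(d > 0 for d in diffs)  # pairs that improved
n_neg = sum(d < 0 for d in diffs)  # pairs that worsened
print(sign_test_p(n_pos, n_neg))   # -> 0.00390625
```

Because the test uses only the signs of the differences, it makes no assumptions about the distribution of the underlying ratings, which is what makes it usable with samples this small.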

 

Resources

  • Lee and Fielding (2004) “Tools for qualitative data analysis," in Hardy and Bryman (eds.) Handbook of data analysis, Chapter 23. Sage Publications.
  • Miles and Huberman (1994) Qualitative data analysis, Sage Publications. 
  • Hoyle (1999), Statistical strategies for small sample research, Sage Publications.
  • Gibbons and Chakraborti (2010), Nonparametric Statistical Inference.

10. Reconstructing baseline data is not only influenced by recall, but also by "social desirability bias." What can be done to minimize this bias?

One of the ways that mixed methods try to address issues such as "social desirability bias” is by using triangulation to compare estimates obtained from two or more sources. For example, interview responses could be compared with information obtained from key informants or from focus groups. In cases where the potential bias refers to the present time, observation is a useful method (comparing what the respondent says with what can be directly observed). Obviously, this cannot be done so easily when trying to reconstruct the past. Another approach is to develop a theory of change or other theoretical construct that predicts the factors likely to influence how different groups respond. This can provide guidance on the possible cause and direction of the bias, so it might be possible to make adjustments in the analysis.