It is time to revise the “Gold Standard” of Impact Measurement and Evaluation Design

A goal of impact measurement is to clearly identify and measure the net, positive effects that result directly from the activities of an initiative, program or social enterprise. This could give impact investors a clearer picture of what they are investing in, which strategies and models are most effective, and how to replicate and scale what works.

Clearly establishing net impact and the influence of a program’s activities on an outcome has been a huge challenge in this field. A ‘gold standard’ of research and evaluation, the randomized controlled trial, is often held up as the standard to which we should aspire (for example, see NESTA’s Standards of Evidence). In a nutshell, two groups of people are chosen at random, and only one participates in the new program while the other does not (a variation involves multiple groups). Participants in both groups are monitored for the same outcomes, and if the ‘experimental’ group shows a statistically significant difference in a measured outcome, we can infer that the intervention led to a specific impact. The ‘experiment’ should also be repeated, with the same result achieved each time. Statistical significance means we have established that the difference in the data is unlikely to be due to chance.
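
To make that logic concrete, here is a minimal sketch of the frequentist reasoning behind this design, written in Python. The program, the outcome scores, the group sizes and the 5% threshold are all hypothetical, simulated purely for illustration.

```python
# A minimal sketch of the frequentist logic behind a randomized controlled
# trial.  All data here are simulated and hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated outcome scores for two randomly assigned groups.
treatment = rng.normal(loc=55, scale=10, size=200)  # participated in the program
control = rng.normal(loc=50, scale=10, size=200)    # did not participate

# Two-sample t-test: could the observed difference be due to chance alone?
result = stats.ttest_ind(treatment, control)

print(f"Mean difference: {treatment.mean() - control.mean():.2f}")
print(f"p-value: {result.pvalue:.4f}")
if result.pvalue < 0.05:
    print("The difference is statistically significant at the 5% level.")
else:
    print("The difference could plausibly be due to chance.")
```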

Unfortunately, outside of a science lab or the setting of clinical drug trials, it is often impractical and costly to structure experimental evaluation designs. Even quasi-experimental evaluation designs (which rely on comparison groups that are not randomly assigned) can be quite challenging to conduct. Is there any practical, statistical alternative for understanding the contribution that specific activities make in influencing an outcome?


Let’s unwind the history of statistics for a moment. Statistics evolved along two lines: the Frequentist and Bayesian traditions. Both encompass rich philosophical traditions, but they differ in how uncertainty and probability are viewed. Frequentist inference, the statistical practice embodied by the experimental ‘gold standard’ evaluation design, involves testing whether an effect (relationship) seen in the data could instead be attributed to chance, using a battery of significance tests. The other line is based on the work of Thomas Bayes, an English Nonconformist theologian and mathematician of the 18th century who died before he could become famous (Encyclopædia Britannica). In Bayesian inference, uncertainty is a reflection of imperfect knowledge that can be described in degree-of-belief probability statements and improved upon with additional evidence. It is not limited to testing a single hypothesis.
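
To show what “improving a degree-of-belief with additional evidence” looks like in practice, here is a minimal sketch of Bayesian updating using a Beta-Binomial model in Python. The prior and the participant counts are hypothetical assumptions, not real evaluation data.

```python
# A minimal sketch of Bayesian updating with a Beta-Binomial model.
# The prior and the observed counts are illustrative assumptions.
from scipy import stats

# Degree-of-belief prior: before seeing data, we think the outcome rate is
# probably moderate but we are quite uncertain (Beta(2, 2)).
prior_alpha, prior_beta = 2, 2

# New evidence: 36 of 50 participants achieved the outcome (hypothetical).
successes, trials = 36, 50

# Bayes' rule for this model gives another Beta distribution as the posterior.
posterior = stats.beta(prior_alpha + successes,
                       prior_beta + (trials - successes))

print(f"Posterior mean outcome rate: {posterior.mean():.2f}")
low, high = posterior.interval(0.95)
print(f"95% credible interval: {low:.2f} to {high:.2f}")
```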

It is hard to say more about the differences without getting mathematical very quickly, but a helpful place to start learning more is Bayesian Statistics: A Beginner's Guide. For some reason, Bayesian statistics is not part of most foundational statistics courses that many people (have to) take, but is instead often taught in more specialized classes. Hopefully this can change.

Bayesian statistics can be very helpful in understanding relationships and causality in real-life complex systems. It allows us to combine data analysis with subjective knowledge, including evidence we already have about the system and knowledge from stakeholders. This tradition also lets us communicate the degree of uncertainty in our results in a richer way, which is very useful for focusing future data collection efforts to improve our understanding. The greater computational complexity of applying Bayesian statistics has been a barrier to its use, but luckily this is now mitigated by the availability of easy-to-use software packages!
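
As a sketch of how this can work in practice, the example below (in Python, with hypothetical group names, priors and counts) encodes prior knowledge about an outcome rate, updates it with new monitoring data for a program group and a comparison group, and reports the result as a direct probability statement rather than a p-value.

```python
# A minimal sketch of communicating Bayesian results as degree-of-belief
# statements.  Priors, counts and group names are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Informative prior from existing evidence or stakeholder knowledge:
# the outcome rate is believed to be around 60% (Beta(12, 8) has mean 0.6).
prior_a, prior_b = 12, 8

# New monitoring data (hypothetical counts).
program = {"successes": 41, "trials": 60}      # participants in the program
comparison = {"successes": 30, "trials": 55}   # a comparison group

def posterior(counts):
    # Beta-Binomial update: prior plus observed successes and failures.
    return stats.beta(prior_a + counts["successes"],
                      prior_b + counts["trials"] - counts["successes"])

# Sample from each posterior and compare the two groups directly.
draws_program = posterior(program).rvs(100_000, random_state=rng)
draws_comparison = posterior(comparison).rvs(100_000, random_state=rng)

prob_better = (draws_program > draws_comparison).mean()
print(f"Probability the program group's outcome rate is higher: {prob_better:.0%}")
```

A statement like “there is an X% probability that the program group’s outcome rate is higher” is often easier for stakeholders to act on than a p-value, and the width of the credible intervals points directly at where more data collection would help most.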

Many fields, particularly in the social sciences and natural resources management, are embracing Bayesian statistics and moving beyond the exclusive use of frequentist statistics.  Here are a few great papers and examples:

While it may not be practical to integrate Bayesian statistics into all evaluation and impact measurement, stepping away from the ‘gold standard’ of experimental evaluation design in favour of applying Bayesian statistics is long overdue. It is much better suited to understanding how complex change happens, and it is one of the best tools for updating our subjective understanding of impact with data and evidence.
