Focusing on Average Treatment Impacts May Underestimate Program Impacts
This post is based on the article "Reconsidering findings of “no effects” in randomized control trials: Modeling differences in treatment impacts" by Brad Chaney. The article appears online in the American Journal of Evaluation.
When impacts vary from one subgroup to another, focusing on average treatment effects (ATEs) may underestimate those impacts, according to a recent article in the American Journal of Evaluation. Using simulated data, author Bradford Chaney, a senior study director at Westat, develops the following six propositions.
- Attention to subgroups can increase statistical power. This is especially true when the treatment being examined is not the only source of change (e.g., when there also is change due to maturation, outside support, and/or measurement error). Varying impact models can perform better than ATEs at modeling at least some of that variation, creating a potential for increased statistical power.
- Modeling the nature of the relationships between the variables is important. Treating participant characteristics as isolated variables rather than in interaction with the treatment variable can lower the quality of the statistical estimates.
- Examining differences in impacts is helpful even with errors in the specification. Even if the model is missing key variables or includes irrelevant variables, a partially correct specification can be more effective than the use of ATEs.
- The use of varying impact models can outperform the use of ATEs even if the research design is restricted to specific subgroups of interest. For example, estimating ATEs within a sample limited to males does not perform as well as using varying impact models.
- Varying impact models are less vulnerable than ATEs to unrepresentative samples. Averages that are based on unrepresentative samples may give misleading results when applied to the entire nation. By contrast, varying impact models can measure the relationships between variables even when the sample is unrepresentative.
- Varying impact models are less vulnerable than ATEs to biased samples. It is the presence of variation that is critical to estimating treatment impacts accurately, not necessarily the distribution of the data: as long as there are some data from the extremes, a varying impact model can produce accurate estimates even if the data disproportionately favor one side. By contrast, because the ATE approach assumes the variation is equally distributed, it fails to make full use of the variation that is present, leading to less reliable results.
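A minimal simulation sketch can make several of these propositions concrete. It is not Chaney's simulation: the data-generating process (an average impact of 0.2 that rises by 0.5 per standard deviation of a baseline covariate x, where x also affects outcomes without treatment), the sample sizes, the seed, and the helper names invert, ols, simulate, and fit are hypothetical choices, written in standard-library Python. It fits an ATE model (outcome on treatment only) and a varying impact model (treatment, x, and their interaction) to a representative sample, then refits both to a biased sample that keeps every case with x above 0 but only about 15% of cases at or below 0. Because the population mean of x is 0 by construction, the varying impact model's treatment coefficient (the impact at x = 0) serves as its estimate of the population average impact.

```python
import math
import random


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    # Augment a with the identity matrix: [a | I].
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(k)] for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(k):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[k:] for row in m]


def ols(X, y):
    """Ordinary least squares; returns (coefficients, classical standard errors)."""
    n, k = len(X), len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    return beta, [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]


def simulate(n, rng):
    """Hypothetical RCT in which the impact grows with a baseline covariate x (mean 0)."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)              # e.g., standardized prior achievement
        t = rng.randint(0, 1)            # random assignment to treatment
        impact = 0.2 + 0.5 * x           # assumed true impact; population average is 0.2
        y = x + t * impact + rng.gauss(0, 1)  # x also drives change without treatment
        rows.append((x, t, y))
    return rows


def fit(rows):
    """Return (ATE, its SE, varying-impact estimate at x = 0, its SE, interaction slope)."""
    y = [r[2] for r in rows]
    ate_b, ate_se = ols([[1.0, t] for _, t, _ in rows], y)
    vi_b, vi_se = ols([[1.0, t, x, t * x] for x, t, _ in rows], y)
    return ate_b[1], ate_se[1], vi_b[1], vi_se[1], vi_b[3]


rng = random.Random(2024)
population_sample = simulate(2000, rng)
# Biased sample: every case with x > 0, but only about 15% of cases with x <= 0.
biased_sample = [r for r in simulate(4000, rng) if r[0] > 0 or rng.random() < 0.15]

for label, rows in [("representative", population_sample), ("biased", biased_sample)]:
    ate, ate_se, vi, vi_se, slope = fit(rows)
    print(f"{label:>14}: ATE {ate:.2f} (SE {ate_se:.2f}) | "
          f"varying-impact average {vi:.2f} (SE {vi_se:.2f}), slope {slope:.2f}")
```

With this setup, the varying impact model should report a noticeably smaller standard error for the average impact in the representative sample, and in the biased sample its estimate should stay near 0.2 while the ATE drifts upward toward the impacts typical of the over-sampled high-x cases. Both results lean on the fitted model matching the simulated linear interaction; the propositions about partially correct specifications are not exercised here.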
The author also gives several reasons to expect variations among subgroups within the field of education.
- Variations in participants’ level of treatment/dosage. The education system often has rules built on the assumption that dosage matters. States require public schools to offer a certain minimum number of days of instruction, and graduation from high school or college requires the earning of a minimum number of credits. Yet the dosage that participants actually receive can vary widely within an evaluation.
- Factors affecting dosage. These might include internal factors (such as motivation, persistence, and self-efficacy), external factors (such as transportation issues and time constraints, including school or job requirements that interfere with full participation), and the ways these factors interact with program features such as frequency, timing, and location. For example, in an evaluation of Upward Bound, many students stopped participating because Upward Bound interfered with their ability to hold summer jobs.
- Factors affecting people’s internal responses to a program. Factors such as motivation, self-efficacy, ambition or future goals, personal interest (e.g., in the topic being taught), relationships with people operating the program, and feelings of inclusion all could affect how people respond to a program. For example, two students might attend the same class, but one might listen intently and work earnestly to master the material, while the other might pay little attention.
- Factors affecting people’s ability to benefit from a program. A person’s ability to benefit may vary based on factors such as resources (e.g., access to computers or tools), internal qualities (such as intelligence, prior knowledge, and skill level), disabilities that are directly relevant to performance of required tasks, and outside support (such as from peers, family, community support, and community resources).
- Different baselines. The success of an intervention may depend in part on how far the subject is from the goal at the start. Depending on a teacher’s approach, some material may be too difficult for lower achievers or too easy for higher achievers, keeping one or both groups from benefiting fully from the teacher’s instruction.
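The baseline point in particular can be sketched with another small, standard-library Python simulation. It is hypothetical and not taken from the article: the impact function (0.8 for students whose standardized baseline score is between -0.5 and 0.5, zero otherwise), the sample size, the seed, and the use of baseline terciles as subgroups are all assumptions chosen for illustration. It compares the pooled treatment-minus-control difference in means (the ATE) with the same difference computed within each baseline tercile.

```python
import random
from statistics import mean, quantiles


def simulate(n=3000, seed=7):
    """Hypothetical RCT in which the lesson is pitched at middle achievers."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        baseline = rng.gauss(0, 1)           # standardized starting achievement
        treated = rng.random() < 0.5         # random assignment
        # Assumed impact: too hard for low achievers, too easy for high achievers.
        impact = 0.8 if -0.5 < baseline < 0.5 else 0.0
        outcome = baseline + (impact if treated else 0.0) + rng.gauss(0, 1)
        rows.append((baseline, treated, outcome))
    return rows


def diff_in_means(rows):
    """Treatment-minus-control difference in mean outcomes."""
    treated = [y for _, d, y in rows if d]
    control = [y for _, d, y in rows if not d]
    return mean(treated) - mean(control)


rows = simulate()
# Two cut points split the sample into baseline terciles.
cut_low, cut_high = quantiles([b for b, _, _ in rows], n=3)
subgroups = {
    "low": [r for r in rows if r[0] < cut_low],
    "middle": [r for r in rows if cut_low <= r[0] < cut_high],
    "high": [r for r in rows if r[0] >= cut_high],
}
pooled = diff_in_means(rows)
by_group = {name: diff_in_means(g) for name, g in subgroups.items()}

print(f"pooled (ATE): {pooled:.2f}")
for name, estimate in by_group.items():
    print(f"{name:>6} baseline tercile: {estimate:.2f}")
```

Under these assumptions, the pooled estimate blends a sizable gain for the middle tercile with little or no gain at either extreme, so it understates what the program does for the students it actually reaches. Terciles are a crude choice of subgroups; the same comparison could be run on other moderators, keeping in mind that post-assignment variables such as dosage are not themselves randomized.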
Chaney concludes that a focus on average treatment effects can lead to using misspecified models, effectively asking the wrong research question. Instead, both theoretical examination and empirical testing are required to find a well-specified model, which is needed for accurate estimates.
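As one illustration of the empirical side of that testing (a standard regression comparison, not necessarily the procedure used in the article), an ATE model can be compared with a varying impact model that adds a moderator $X_i$ and its interaction with treatment $T_i$. The symbols are generic placeholders: $n$ is the sample size, $q = 2$ is the number of added terms, $k = 4$ is the number of parameters in the varying impact model, and RSS is each model's residual sum of squares.

```latex
\begin{aligned}
Y_i &= \beta_0 + \beta_1 T_i + \varepsilon_i
  \qquad \text{(ATE model)} \\
Y_i &= \beta_0 + \beta_1 T_i + \beta_2 X_i + \beta_3 T_i X_i + \varepsilon_i
  \qquad \text{(varying impact model)} \\
F &= \frac{(\mathrm{RSS}_{\mathrm{ATE}} - \mathrm{RSS}_{\mathrm{VI}})/q}{\mathrm{RSS}_{\mathrm{VI}}/(n - k)}
\end{aligned}
```

Under the null hypothesis $\beta_2 = \beta_3 = 0$ and the usual OLS assumptions, $F$ follows an $F(q,\, n-k)$ distribution. Testing $\beta_3 = 0$ alone (with $q = 1$, against a model that keeps $X_i$ as a main effect) asks more specifically whether the impact varies with $X_i$. A test like this only checks the moderators someone thought to include, which is why the theoretical examination still matters.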