An external examiner put the following question to my student last week: "how do you know your results aren't just caused by the Hawthorne effect?" This is a perfectly reasonable question, but I spent quite a lot of time trying to work out what a good answer would be. You could legitimately ask it of almost any study I have conducted when developing new technology for children.
The Hawthorne effect refers to the idea that observing people's behaviour changes that behaviour. It's a kind of observer effect, related to expectations, and a second cousin of the placebo effect. In the 1920s and 30s, psychologists conducted a series of studies about the working conditions at the Hawthorne Works, a factory in a suburb of Chicago. If they increased the light levels, workers' performance increased. But if they decreased the light, their performance increased too. They concluded from this that the increases in performance were due to the workers being observed, rather than the physical changes to their work environment. (There was a threshold, because performance would obviously drop if the light levels were so low that no-one could see to work.) By analogy, the examiner wanted to know: "how do you know that the young people in your study changed their career aspirations because of the educational robotics intervention they took part in? Could it be that they were simply inspired by the presence of an enthusiastic expert from a university who was interested in them?"
The answer is that it is entirely possible that the young people's interests changed as a result of interactions with an enthusiastic researcher. In fact, in this study of an existing educational robotics after-school programme, a purpose of the programme was to connect young people with mentors from the technology industry to help them find jobs afterwards. Opportunities to work with enthusiastic technology experts were an active ingredient of the intervention. There might be an additional effect from a mentor who is also a researcher from a university in another country, but my guess is that it wouldn't be very large. A mean examiner could pounce on such a woolly guesstimate by asking: "just how small would the effect be, and how do you know?"
With the luxury of time and Google Scholar open in front of me, I can find an answer to that question. A 2014 meta-analysis of the Hawthorne effect reviewed 19 studies and found an overall odds ratio of 1.17 (95% CI: 1.06, 1.30) [1]. So one might expect an effect towards the lower end of that confidence interval. Without getting too hung up on how to interpret an odds ratio (see [2] if you care), this is pretty small (similar to finding that Cohen's d < 0.2). The meta-analysis concluded that there is evidence of some kind of observer effect, but the evidence is all over the place. The studies included in the review defined and measured the effect in different ways, we don't know much about the exact mechanism by which it is meant to work, and we don't know how it might vary in different circumstances. The authors of the meta-analysis recommend that better theory is needed to guide new experimental work in this area. Don't worry - it would be unreasonable of an examiner to expect an answer like that in the heat of the moment!
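To put a rough number on "pretty small", here is a quick back-of-envelope check. It uses the standard logistic conversion from an odds ratio to Cohen's d, namely d = ln(OR) × √3/π (Chinn, 2000) - the choice of conversion is mine, not the meta-analysis's:

```python
import math

def odds_ratio_to_cohens_d(odds_ratio: float) -> float:
    """Convert an odds ratio to Cohen's d using the logistic
    conversion d = ln(OR) * sqrt(3) / pi (Chinn, 2000)."""
    return math.log(odds_ratio) * math.sqrt(3) / math.pi

# Lower CI bound, point estimate, and upper CI bound from [1]
for odds_ratio in (1.06, 1.17, 1.30):
    d = odds_ratio_to_cohens_d(odds_ratio)
    print(f"OR = {odds_ratio:.2f}  ->  d = {d:.3f}")
```

This prints d ≈ 0.032, 0.087 and 0.145 respectively: even the upper bound of the confidence interval sits comfortably below the conventional d = 0.2 threshold for a "small" effect.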
The examiner's question is about how a researcher would know whether the Hawthorne effect is present. This is rather hard to tell in this sort of qualitative study. The original Hawthorne studies were quite simple. The interventions which manipulated light were time-bounded and reversible: higher light levels today impact your performance today, but presumably not how you perform next week once the light has gone back to normal. (Unless there is some kind of cumulative effect by which you perform best after working for 5 days in a row with optimum light levels.) So it is relatively easy to design a study to pick up quantitative evidence by looking at differences between treatment and control groups in a one-off experimental setup which controls a single variable, in this case by increasing and then decreasing it. However, in education, it is harder to switch effects on and off in this way - it is more likely that an intervention would need to run for some time before becoming effective, and then there would be a slow decay (or no change) once the intervention has finished. In a study about methods of teaching people to read, there is presumably no decay at all once the intervention has finished, because people don't typically forget how to read once they are secure readers. In the technology career example, a single afternoon would be extremely unlikely to change young people's career aspirations, whereas classes over the course of a year in positive circumstances might gradually persuade people to consider technology careers, until the classes stop and life circumstances gradually intervene.
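To make the contrast concrete, here is a toy sketch of the two kinds of effect trajectory - a reversible "light level" effect that switches on and off with the manipulation, versus a cumulative educational effect that builds slowly and decays slowly. This is purely illustrative: the functional forms and rates are invented, not drawn from any of the studies discussed here.

```python
import math

def reversible_effect(t, on_periods, boost=1.0):
    """Effect present only while the manipulation is switched on."""
    return boost if any(start <= t < end for start, end in on_periods) else 0.0

def cumulative_effect(t, start, end, build_rate=0.05, decay_rate=0.01):
    """Effect builds towards 1.0 during the intervention, then decays slowly."""
    if t < start:
        return 0.0
    if t <= end:
        return 1.0 - math.exp(-build_rate * (t - start))
    peak = 1.0 - math.exp(-build_rate * (end - start))
    return peak * math.exp(-decay_rate * (t - end))

# Weeks 0-99: lighting manipulated in weeks 10-20, classes run weeks 10-60.
for t in (5, 15, 30, 60, 90):
    print(t, reversible_effect(t, [(10, 20)]), round(cumulative_effect(t, 10, 60), 2))
```

The reversible effect is trivial to detect by toggling the manipulation and comparing groups; the cumulative one never gives you a clean before/after boundary to measure against.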
I suppose warning signs from qualitative data which might indicate that the Hawthorne effect is at play could be the young people mentioning that the researcher personally inspired them, or that being in a study made them reconsider their lives. Or perhaps one could compare the outcomes for other young people who took part in the robot classes at another centre but who had no exposure to the researcher. The problem there is that other differences between the centres would introduce variation of their own into the comparison.
It is hard to distinguish the Hawthorne effect from a set of related factors. It is quite common, across many disciplines, for the effects seen in positive pilot studies to diminish when interventions are rolled out at scale. Let's take classroom interventions as an example. Suppose there was a classroom intervention which was originally delivered by a subject-expert researcher, or by a teacher who was closely collaborating with such a researcher. When it comes to rolling it out across many classrooms, the researcher no longer delivers all the lessons. It's the class teachers who now do it, and they don't have the specialist knowledge of the researchers; in addition, each teacher has less support from the research team. There's more variability in delivering the intervention as intended, because the teachers have different backgrounds and a million other things to do as well as this intervention. The expectancy effect is another factor here. There are studies which demonstrate that teachers' behaviour towards students varies according to their expectations about their students' abilities at the outset. It becomes a self-fulfilling prophecy: those who are labelled as the brightest students will always seem bright to the teachers, and the lowest test performers will seem like lost causes.
In short, if an external examiner ever challenges you about the Hawthorne effect, here's what you could say: "It is, of course, something I am aware of, but a recent meta-analysis indicates the concept is not well defined, and the effects are small. In my evidence, I saw no red flags which would suggest it was at work, but in future studies, when the intervention is rolled out across more classrooms (or equivalent), I'll be alert for the effects becoming smaller or more variable as more practitioners are involved."
The story ended happily because my student now has a PhD. And I have another tricky question up my sleeve for next time I have to examine a thesis.
[1] McCambridge, J., Witton, J., & Elbourne, D. R. (2014). Systematic review of the Hawthorne effect: New concepts are needed to study research participation effects. Journal of Clinical Epidemiology, 67(3), 267–277. https://doi.org/10.1016/j.jclinepi.2013.08.015
[2] Chen, H., Cohen, P., & Chen, S. (2010). How big is a big odds ratio? Interpreting the magnitudes of odds ratios in epidemiological studies. Communications in Statistics: Simulation and Computation, 39(4), 860–864. https://doi.org/10.1080/03610911003650383