Marcus Munafò is Professor of Biological Psychology at the University of Bristol. His main research is on the neurobiological and genetic basis of tobacco and alcohol use, but he has also had a long-standing interest in the role of incentive structures in science and their impact on research reproducibility. His scientific work on science (metascience) includes reviews and studies on statistical power, analytical flexibility, and reporting biases in domains such as psychology, neuroscience, and genetics. On the 21st of November, Marcus came to Groningen for the UMCG mini-symposium “Is science having an integrity crisis?” to discuss research reproducibility issues and potential solutions. I interviewed him for the newsletter of the graduate school BCN.
[Reading time: 5-7 minutes]
Getting our house in order
Let’s start with the million-dollar question: do you think science is having an integrity crisis?
I don’t think integrity is necessarily the right word. The vast majority of scientists are trying to do good work with the tools available to them and using the training they have had. I believe most problems don’t arise out of integrity issues, but because well-intentioned scientists may not appreciate how certain ways of working that are commonplace can have problematic consequences.
Perhaps a better term would be reproducibility crisis?
I’m also not sure crisis is the right word. I believe it’s better described as an opportunity to bring the way we do science up to date and to move towards a more diverse system of rewarding outputs, which could include publications but also other products of our work, such as datasets. People who are interested in metascience are looking at the way science functions and asking: can we do better? The answer may be that what we’re doing at the moment is actually optimal. I don’t personally think that’s going to be the conclusion, but it might be: it’s an empirical question.
There is a lot of media coverage on integrity and replicability issues. Do you worry this might harm the public’s appreciation of science?
Naturally, there is a concern that these issues will be overstated by the media and that there will be unfair critiques of science that, for example, focus on integrity issues. But ultimately, if the public loses trust in science because the research we generate isn’t robust, then that’s our own fault as scientists. The media might be shining a light on it, but all that means is that we need to get our house in order and do good work. So I think the media coverage is helpful in a sense, as it makes scientists sit up and pay attention.
What are currently the most promising initiatives to improve research reproducibility?
There are many individual initiatives, which I think are best captured by the overarching initiative of open science: open data, open materials, and preregistration are measures to make the whole scientific process more transparent and therefore more accountable.
Open science fits with the ethos of scientists being public servants, who are funded through public funds or charitable donations.
Moreover, open science challenges the implicit assumption many scientists have that, because of their intellectual efforts, the data are somehow their own. In reality, data are owned by funders and employers. I think making scientists somewhat independent of their data is ultimately healthy: if you don’t feel as invested in your data, you won’t feel so personally criticized when someone has a problem with your interpretation of the results, or when your results prove not to be robust after someone tries to replicate them.
Not everyone is as keen on the idea of releasing data they generated though.
Of course there are different perspectives and there can be exceptions. For instance, large ongoing cohort studies such as TRAILS here in Groningen or ALSPAC back in Bristol are simply too big to put online in one go. Also, you have to be careful with data protection and confidentiality. In general, however, I think open data is a good principle and an efficient use of resources.
Some scientists even worry about open data being put to malicious use…
We need to continually monitor what impact (changes in) current structures have on the behaviour of scientists, because there could be unintended consequences of everything we do.
The focus on how we do science needs to be an ongoing process rather than a one-off exercise.
The significance of P
What change would you like to see most?
It might seem like a small thing, but one of the first things we did in my lab was stop using the word “significant”. Some journals have gone further and banned p-values, but I think that’s going too far; there’s nothing wrong with the p-value per se. However, the way it is used is often unhelpful: we tend to treat the p-value as dichotomous (“less than or greater than .05”) rather than as a continuous measure of the strength of evidence against the null hypothesis.
So how would that work in practice? Take, for instance, a typical sentence from a clinical trial paper: “group A performs significantly better under treatment X than group B.”
That’s a good example of exactly the problem. First, you have the actual data; group A and group B will never be exactly the same in terms of what you’ve measured. Second, you have the statistic, which tells you something about whether that difference is likely to be consistent with the null hypothesis or not. By saying group A performs significantly better you’re conflating those two pieces of information. A better way of describing it might be to say something like: in line with our hypothesis, group A scored higher than group B, but there was no statistical evidence that this difference was meaningful. This forces you to incorporate your prior hypothesis into your interpretation. Moreover, it makes your inferences transparent and allows the reader to judge whether or not those are valid inferences rather than just relying entirely on the p-value.
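To make the distinction between the data and the statistic concrete, here is a minimal sketch that reports the two separately: the observed mean difference first, then the exact p-value, with no “significant” label. It assumes two hypothetical groups of scores, and the helper name `describe_difference` is ours for illustration. It uses a large-sample normal approximation for brevity, which is rough for tiny samples like these; a t-test would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def describe_difference(group_a, group_b):
    """Return (mean difference A - B, two-sided p-value).

    Hypothetical helper for illustration. The p-value uses a large-sample
    normal approximation with unpooled variances, not an exact t-test.
    """
    diff = fmean(group_a) - fmean(group_b)
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, p


# Hypothetical scores, invented purely for this example.
a = [5, 6, 7]
b = [4, 5, 6]
diff, p = describe_difference(a, b)

# Report the data (the difference) and the statistic (the exact p) separately.
print(f"Mean difference (A - B): {diff:.2f}; two-sided p = {p:.2f}")
```

The reader sees both the size and direction of the difference and how strongly the data speak against the null, and can judge the inference for themselves.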
Read “Sifting the evidence—what’s wrong with significance tests?”, a great educational piece on the problems with significance testing.
How do reviewers respond to this?
Really positive for the most part. Sometimes you get people saying that because the p-value is .06 you can’t say anything at all about your data, but obviously there’s no meaningful difference between a p-value of .04 and .06. A larger p-value where your data are in the predicted direction is stronger evidence than a p-value of .05 when your data are in the opposite direction.
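One way to see why .04 and .06 are not meaningfully different is to convert each two-sided p-value back into the standard-normal test statistic it corresponds to. The sketch below assumes a z-test, and `z_from_two_sided_p` is a hypothetical helper name. It shows that the statistics behind p = .04 and p = .06 are only about 0.17 apart (roughly 2.05 versus 1.88), so the evidence shifts gradually rather than jumping at .05.

```python
from statistics import NormalDist


def z_from_two_sided_p(p):
    """Return |z| for a two-sided p-value under a standard-normal (z-test) assumption."""
    return NormalDist().inv_cdf(1 - p / 2)


for p in (0.04, 0.05, 0.06):
    print(f"p = {p:.2f} -> |z| = {z_from_two_sided_p(p):.3f}")
```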
Comfort food for thought
You have written some papers that were quite earth-shattering to me, for instance a paper on power failure in most neuroimaging studies. How do you keep yourself from becoming pessimistic?
I’m neither an optimist nor a pessimist. I guess I’m kind of a pragmatist.
This is the reality of the situation: most people are trying to do good work, but the system is not functioning optimally.
Therefore, we need to try to change it, and that will be a gradual process. At least funders and journals are now starting to take these things seriously. Plus, individual researchers are talking about it, which is great. But think about something as relatively simple as making publications open access: that project took over 20 years in the United Kingdom! A few years ago, I was much angrier about what I perceived to be inefficiencies, whereas now I’m more pragmatic about how long any changes will take to implement.
Do you have any advice for researchers going through a crisis of confidence in science?
One piece of advice is to have confidence in yourself. In my experience, many problems lie not with individual researchers, but with the mismatch between the published literature and the day-to-day reality of doing science. Another piece of practical advice is that when you’re choosing a lab to go and work in, don’t just go for the most prestigious lab; find a supportive mentor, someone who’s going to help you work in the right way. This may be in the prestigious lab, but it may not be.
A frequently heard piece of advice for young researchers is that they need to plan ahead and develop their own niche as quickly as possible.
I think this is one of the subtle ways in which we create incentive structures that are well-intended but can have unintended consequences. If you become very narrowly focused and build your whole career on a certain theory or model, and it turns out you were mistaken, it may become very difficult to pull back from that position. But if you have multiple options available to you, it becomes easier to disengage from that position and become critical of it. You can see that approach in my research: it’s a real mixed bag of things that were interesting at the time. I gave up on some lines of work, for instance, because the tasks I was using were not robust enough. Earlier in my career I felt there was no coherence to my research, but now things have come together into a larger, coherent programme of work. I think moving between different disciplines can offer young researchers a lot.
Missed the symposium?
Marcus’ lecture “Scientific Ecosystems and Research Reproducibility” can be viewed online.
Related blog: Met a metascientist who made me think twice