Zoé Ziani was working on her PhD thesis in the field of behavioural psychology. She was looking at a particular paper, "CGK 2014" (an abbreviation for The Contaminating Effects of Building Instrumental Ties: How Networking Can Make Us Feel Dirty). This paper claims that individuals can feel "physically dirty" when networking. Ziani found this… hard to believe. She eventually decided not to rely on this paper in her PhD thesis. I won't recount the whole story, which you can read for yourself here, but the pushback from her supervisors was immense. Francesca Gino, one of the paper's authors, was so well known and had so many influential papers that questioning her findings seemed beyond the pale. Ziani went through several rounds of this, revising her thesis only to have it rejected by her supervisors again. Instead of removing her critique from the thesis, though, Ziani did the right thing: she followed her hunch. Her supervisors rebuked her each time she dug deeper into the paper and the data.
At first, Ziani thought that the paper was a little ambitious. Then she became concerned that other papers had failed to replicate its effects. Then she became concerned with its methodology and theoretical foundations. She began to suspect that cherry-picking had been used, and that the paper was not trustworthy.
Ziani included these concerns in her thesis.
As Ziani puts it:
What should happen then (if science were, as many people like to say, “self-correcting”) is that, after a peer-review of some form, my criticism would get printed somewhere, and the field would welcome my analysis the same way it welcomes any other paper: Another brick in the wall of scientific knowledge.
Unfortunately, this was not the case. Her supervisors refused to sign off on the thesis. What was this PhD student to do? She had to give in, or she would not graduate. Nevertheless, Ziani remained determined to get to the bottom of the inconsistencies. She ran a study aimed at reproducing CGK 2014, but could not replicate its results. Ziani also noticed that many papers authored by Francesca Gino had similar issues. She began to suspect data fraud.
Ziani reached out to Data Colada, a blog which specialises in identifying data fraud. They did a deep investigation into the papers and eventually concluded that there was a significant amount of fraud present, which you can read about in these four blog posts.
So, who is Francesca Gino? She's one of the highest-paid academics at Harvard University, with most sources agreeing that her annual salary is something like $1 million. She's internationally famous in the field of behavioural psychology. She's published many bestselling books (which are all based on her allegedly fraudulent papers, by the way). She receives many invitations to give talks, including TED talks, and her speaking fees are $50,000-$100,000. She consults for some of the largest organisations, businesses and governments in the world (again, based on her allegedly fraudulent papers).
Harvard has placed her on administrative leave and performed its own investigation into the alleged data fraud. Many journals have now retracted her papers. Gino herself is suing Harvard and Data Colada for $25 million.
This case is still ongoing, so I won't comment on its legal merits. Nor will I say whether Gino did in fact commit data fraud; her lawsuit, obviously, claims she did not. Instead, I want to ask a different question: how did the peer review process not catch this? How did it take nine years for anyone to even bother looking at the raw data behind Gino's papers, especially given her history of producing surprising results? Were those results not surprising enough for someone to check the data before the entire field of behavioural psychology accepted them? And how did it take eight years for anyone to bother trying to replicate this surprising result?
Let's step back for a moment.
In 1942, Susanne K. Langer published Philosophy in a New Key: A Study in the Symbolism of Reason, Rite and Art. In it she argues that eras in thought are defined by the kinds of questions asked, rather than by the answers. For example, if I ask you "how do you think the world was made?", the question carries the implicit assumption that the world was indeed made. A true breakthrough would be the idea that the world was not made, but always was.
Langer identifies several major epochs: the Greeks, the Christians, the Enlightenment, each defined by some key set of questions, assumptions or techniques. The Enlightenment was defined by its idea of an external world, revealed to us through our senses. A philosophical epoch comes to an end "with the exhaustion of its motive concepts". As Langer explains:
An answer once propounded wins a certain number of adherents who subscribe to it despite the fact that other people have shown conclusively how wrong or inadequate it is; since its rival solutions suffer from the same defect, a choice among them really rests on temperamental grounds. They are not intellectual discoveries, like good answers to appropriate questions, but doctrines. At this point philosophy becomes academic; its watchword henceforth is Refutation, its life is argument rather than private thinking, fair-mindedness is deemed more important than single-mindedness, and the whole center of gravity shifts from actual philosophical issues to peripheral subjects — methodology, mental progress, the philosopher's place in society, and apologetics.
The eclectic period in Greco-Roman philosophy was just such a tag-end of an inspired epoch. People took sides on old questions instead of carrying suggested ideas on to their further implications. They sought a reasoned belief, not new things to think about. Doctrines seemed to lie around all ready made, waiting to be adopted or rejected, or perhaps dissected and recombined in novel aggregates. The consolations of philosophy were more in the spirit of that time than the disturbing whispers of a Socratic daemon.
Remember that Langer was writing in 1942. I think that since the end of WWII we have been living in a new philosophical era, which I would call Scientism. Scientism evolved from the logical positivists who came before it: it makes the empirical experiment the king of knowledge and rejects anything outside of it. Peer review has become its gold standard.
Scientism produced amazing results. The last century saw rapid advances in physics, chemistry and engineering. Computers and computer science came to dominate almost the entire economy. The lives of ordinary people have never been transformed as radically as they have since the advent of Scientism.
This is relevant because I think we are already near the end of this epoch. Most hard sciences now produce few new, interesting results. Physics is the only field still doing so regularly, and even there the big question of unifying the standard model with gravity remains unanswered (I suspect that when it is solved, a new explosion in physics knowledge will follow, much as Einstein's discovery of relativity caused). Even in the world of business, few companies are producing genuinely interesting, original software: it's either buggier rehashes of older services or gimmicks that tech bros who have never actually programmed can feel good about posting on LinkedIn. The only field of computer science that seems to be producing new, interesting results is AI research, but even that, once you get past the gimmick of it all, has surprisingly little impact relative to its promise or cost.
Science is now "academic", as Langer puts it. There are set positions of orthodoxy within the academy, and these are defended not with actual science or reason but with slogans: "trust the experts", "follow the science", "peer-reviewed papers find" and so on.
As we've seen, data fraud often slips through. I covered the case of Francesca Gino, but she isn't alone. Khalid Shah, Brian Wansink, Marc Tessier-Lavigne (the former president of Stanford), Jan Hendrik Schön and Joachim Boldt all come to mind as famous cases of extreme data fraud and manipulation.
Those are just instances of gross data fraud. What about the broader reproducibility crisis? Why is the peer review process not managing this? Isn't it supposed to ensure papers are high quality, reproducible and trustworthy? Unfortunately, it doesn't.
Let's change gears and talk about peer review itself. The fantastic article The Rise and Fall of Peer Review by Adam Mastroianni discusses his experiences with peer review. The first thing to note is how relatively new peer review is. It was not present until after WWII, and remained uncommon until around the 1960s. Since then it has become universal: a requirement for being published.
Does peer review actually meet its goals? Sadly not, as we have seen. There are even studies of the peer review process itself. In them, researchers write papers and deliberately leave huge mistakes in them: mistakes like the trial not being randomised, the data not supporting the conclusion and so on. Unfortunately, reviewers catch as few as 25% of these errors. How much lower must that rate be for papers that are actively trying to hide their manipulation?
You might (as many do) feel a little uncomfortable. Maybe you think we need peer review because, without it, people might go and say untrue things. They might spread "misinformation", and that needs to be stamped out. But if misinformation scares you, consider this: right now we publish vast quantities of misinformation all the time and put a little stamp on it saying it's actually very accurate and can definitely be trusted. That's far scarier than some junk science being published on some junk blog.
In the end, Mastroianni started uploading papers to the internet himself, with no peer review process. And, guess what: his paper became popular. It had impact; it had value.
The lack of impact of peer-reviewed work was one of Andrey Churkin's main complaints. Churkin put a lot of effort into what he describes as a groundbreaking paper (I'm not in that field at all, so I can't judge), only for it to get a tiny number of citations. What upset Churkin most, though, was that when he applied for a promotion, the department's only questions were about his citations. They weren't asking about his skills as a researcher: his knowledge, his insight, his skills with code or data visualisation. Just citations.
This leaves us in a tough spot. We now have massive institutions that act as the gatekeepers of knowledge, yet much of that knowledge is likely suspect, if not outright fraudulent. And yet, despite everyone knowing how many problems this system has, it remains the only source of legitimacy we have.
What are the solutions to these problems? I have a few ideas.
We need to stop requiring academics to accumulate so many citations. At present, unless they are generating citations, their academic careers are unlikely to advance. This is called the "publish or perish" culture. It causes a host of problems, which we'll look at in turn.
In the rush to publish, academics are pushed towards p-hacking (running many analyses and reporting only the ones that cross the significance threshold), data manipulation or even outright fraud to get a paper accepted. Because each paper needs to generate significant numbers of citations, novel results are prized; but novel results are hard to come by, so the temptation to manufacture them grows. Creating systems that actually reward bad behaviour is the worst thing we can do.
Even when academics don't engage in any kind of fraud, the results are often poor. Academics are not given the time needed to research something deeply and come up with something new, original, exciting and interesting (in other words, actually worth publishing). Instead, what we get is a pile of papers that often don't say much about anything, rehash old results, meander or make no clear point. We're missing out on so much good science! At the very least, we're overwhelmed with low-quality research, which makes finding the good stuff challenging.
Another consequence of the "publish or perish" culture is that most academics do not get enough time to read much of the literature. They have to spend all of their time working on the next paper. Yet it is only through reading, discussing and reviewing other work that scientific knowledge can proliferate. This also ties into papers being low quality. Papers are low quality, so people don't take the time to read them, which means that to get citations you need to publish more papers, which encourages even lower quality. It's a vicious cycle.
Many journals don't publish null results. I don't know whether this is down to policy, tradition or scientists fearing they will look bad, but we need to be publishing null results. A null result is one where the experiment does not support the hypothesis; usually, this means no relationship is found between the variables. One study found that 91.5% of papers in psychology confirmed the effects they were looking for. This is unlikely and suspicious. It could be p-hacking to force an effect, or it might be improper reporting: for example, the researchers started looking for one effect but found another. That is fine, but the original hypothesis should still be reported as a null result, and a new study conducted to focus on the effect that was actually found.
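To see why p-hacking is so effective, imagine a researcher who measures several outcomes in a study where there is no real effect at all, and reports whichever outcome happens to reach p < 0.05. The simulation below is a hypothetical sketch, not taken from any of the papers discussed here. It assumes normally distributed data with a known standard deviation of 1 (so a simple two-sided z-test is valid) and independent outcomes; the function names are illustrative. With a single outcome, roughly 5% of these null studies come out "significant"; with k outcomes the rate rises towards 1 - 0.95^k, which is about 23% for five outcomes and 40% for ten.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def z_test_p_value(a, b, sigma=1.0):
    """Two-sided z-test p-value for a difference in means (assumes known sigma)."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def false_positive_rate(n_outcomes, n_per_group=20, n_studies=2000, alpha=0.05, seed=0):
    """Fraction of no-effect studies that report at least one 'significant' outcome.

    Each simulated study measures n_outcomes independent outcomes. Both groups
    are drawn from the same distribution, so any significant result is a false
    positive. The study stops at the first outcome with p < alpha, mimicking a
    researcher who reports only what 'worked'.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        for _ in range(n_outcomes):
            control = [rng.gauss(0, 1) for _ in range(n_per_group)]
            treated = [rng.gauss(0, 1) for _ in range(n_per_group)]  # no true effect
            if z_test_p_value(control, treated) < alpha:
                hits += 1
                break
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 5, 10):
        print(k, false_positive_rate(k))
```

Nothing in this simulation is fraudulent in the obvious sense: every individual test is computed honestly. The inflation comes entirely from testing many things and reporting only the hits.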
We need to become more comfortable with "alternative" sources of information. Unofficially, blogs and forums are probably already a greater source of valuable information than so-called "real" sources. How many academics studying Kalman filters have turned to Stack Overflow to debug their issue without citing it? Blogs like Towards Data Science on Medium have become more or less the de facto way data science is communicated and implemented by programmers across the world. No scientific programmer does their daily job without at least skimming a Towards Data Science post here or there.
But aren't these "alternative" sources of information dangerous? They aren't "official"! They don't have the stamp of "peer review"! What if the experiment in them was made up, or the information inaccurate or biased? Guess what: experiments in peer-reviewed articles are constantly made up, fraudulent or manipulated without being caught. Blogs and forums are often better precisely because you know they might be inaccurate, so you check them against other sources, weigh them up, try things out yourself and so on. In other words, you work scientifically, in a way that peer review has killed.
In computer science, we have open source software, which has been very successful. Open source puts as many eyes on something as possible, and that is what ensures quality. Open source software on GitHub often has hundreds of contributors and thousands (in some cases millions) of users. This helps identify and fix issues, and it also builds trust. By contrast, many research papers don't provide access to their data, and it isn't a requirement. Few journals even require that the data be available "on request". This may have contributed to how long it took for anyone to spot the issues in the Francesca Gino case. Certainly, had the data been more readily available, someone would at least have been more likely to look at it.
This goes even further. Gino provided her data in the form of Microsoft Excel spreadsheets. I think in 2024 this is pretty unacceptable. Microsoft Excel, Google Sheets and other proprietary data storage solutions are not suitable because they are closed source. Even worse, these aren't "simple" files: a modern Excel (.xlsx) file is actually a ZIP archive containing a collection of XML files. Data should be provided in something like a CSV file. Data analysis should not be done in something like Microsoft Excel but rather in an open source programming language like Python, R or Julia (popular choices with scientists), or also Rust, Swift, C, C++... any language that is easy to compile and run will do. MATLAB is unacceptable since it is proprietary. Graphs, charts, diagrams and so on should be made with Python, GraphViz, LaTeX and so forth.
On my website, I have a Kalman filter course. As part of it, I show you how to build some simple filters. All these filters are open source on GitHub. All of them show you how to simulate some noisy data and then extract the underlying signal with a Kalman filter. All of the graphs I use in the course, as well as the simulations and filters themselves, are open source: if you think there's something wrong with any of them, you can run them at your own convenience and inspect the results.
A journal that wanted to seriously discourage data fraud would have the following requirements:
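Python's standard library is enough to check the ZIP claim for yourself and to write data out as plain CSV instead. In the sketch below, `list_workbook_parts` and `rows_to_csv` are illustrative names, not part of any existing tool, and any workbook path you pass in is your own.

```python
import csv
import io
import zipfile


def list_workbook_parts(path):
    """Return the names of the files packed inside an .xlsx workbook.

    `path` can be a filename or a binary file-like object. An .xlsx file is
    a ZIP archive, so the standard zipfile module can open it directly.
    """
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path!r} is not a ZIP container")
    with zipfile.ZipFile(path) as archive:
        return archive.namelist()


def rows_to_csv(header, rows):
    """Serialise tabular data as plain CSV text that any tool can read."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()
```

Run on a real workbook, `list_workbook_parts` returns entries such as `[Content_Types].xml` and `xl/workbook.xml`: a whole package of XML documents just to hold a table. The CSV produced by `rows_to_csv` is a single plain-text file that can be opened in any text editor and diffed line by line.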
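For a flavour of what "simulate noisy data, then extract the signal" means, here is a minimal sketch. It is not the code from the course; it is an independent illustration of the simplest possible case, assuming a single constant value (10.0, an arbitrary choice) measured repeatedly with Gaussian noise of known variance. The names `simulate` and `kalman_1d`, and all the parameter values, are illustrative assumptions.

```python
import random


def simulate(true_value=10.0, noise_sd=2.0, n=200, seed=1):
    """Noisy measurements of a constant true value (illustrative parameters)."""
    rng = random.Random(seed)
    return [true_value + rng.gauss(0, noise_sd) for _ in range(n)]


def kalman_1d(measurements, r, q=1e-5, x0=0.0, p0=1000.0):
    """Scalar Kalman filter for a (nearly) constant state.

    r: measurement noise variance, q: process noise variance,
    x0/p0: initial estimate and its (deliberately large) variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed constant, but uncertainty grows by q.
        p = p + q
        # Update: the Kalman gain weighs the prediction against the measurement.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    measurements = simulate()
    estimates = kalman_1d(measurements, r=2.0 ** 2)
    print(f"last measurement {measurements[-1]:.2f}, filtered estimate {estimates[-1]:.2f}")
```

Because the script generates its own data from a fixed seed, anyone can rerun it and get exactly the same numbers, which is the point: the claim "the filter recovers the signal" can be checked rather than trusted.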
- If any experiment was conducted, all of the data must be made available in an open source text format (CSV preferred)
- All data analysis must be done from the raw data files in a non-proprietary programming language, and the code must be made available
- Any graphs, charts or diagrams produced must be made in code (or some other open source software) and included
- If a simulation was done, the simulation code must be included
- The paper must be a PDF produced with LaTeX (or a similar open source markup tool). The source used to generate the PDF must be made available.
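As a rough illustration of the second requirement, here is a hypothetical analysis script that works only from the raw CSV text. It records a SHA-256 fingerprint of that exact input, so a reader can confirm they are analysing the same file as the authors, and then computes per-group means. The function name `analyse` and the column names `group` and `score` are assumptions made for the example, not any kind of standard.

```python
import csv
import hashlib
import io
from collections import defaultdict
from statistics import fmean


def analyse(raw_csv_text):
    """Compute per-group mean scores directly from raw CSV text.

    Returns the SHA-256 hex digest of the raw data alongside the results,
    so anyone rerunning the analysis can check they have the same input.
    Assumes hypothetical columns named 'group' and 'score'.
    """
    digest = hashlib.sha256(raw_csv_text.encode("utf-8")).hexdigest()
    scores = defaultdict(list)
    for row in csv.DictReader(io.StringIO(raw_csv_text)):
        scores[row["group"]].append(float(row["score"]))
    means = {group: fmean(values) for group, values in scores.items()}
    return digest, means
```

For example, `analyse("group,score\ncontrol,1\ncontrol,3\ntreated,4\n")[1]` returns `{'control': 2.0, 'treated': 4.0}`. Anyone who suspects a published table can rerun the script on the published file: if the digest matches but the numbers don't, something has gone wrong between the data and the paper.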
Don't force empirical experiment onto every field. In physics, there's an axiom that all particles of a given kind are identical. So, if I perform an experiment on electrons, I don't need to worry that I am using "different" electrons: I can be confident they will all behave according to the same physical laws. This is not possible in psychology. Not every person is the same, and even the "same" person is changed by performing the experiment once. In this sense, many experiments in psychology have fundamental methodological issues. Returning to the ideas of Langer, symbolic research in psychology is shunned as unscientific, since it has no empirical basis. But it is just as scientific as, if not more scientific than, forced experimentation with flawed methodology. For example, in a previous article of mine I discussed how everyone missed the big symbolic idea of Freud's work.
Since we try to force everything down this railroad of empirical experiment, we often miss out on this kind of symbolic work, which might be valuable too. Furthermore, attempting to force empirical results from something non-empirical might be another reason academics feel the need to commit data fraud. Perhaps none of the experiments showed anything. If that's the case but you need a result, there's only one thing left to do.
Very few papers are aimed at reproducing existing work. This is a flaw. Science only works when results are reproduced many times: not only by the original researcher repeating the experiment, but also by other researchers reproducing the whole experiment. Yet it took eight years for anyone to bother trying to reproduce CGK 2014. I think this comes from academic careers only advancing if they produce novel results. This is a mistake that encourages poor behaviour while also suppressing one of the cornerstones of the scientific method: reproduction. Allowing academics to advance their careers more or less solely by reproducing results is fine; in fact, it is entirely acceptable and good. There's nothing wrong (or there should be nothing wrong) with publishing a paper that simply reproduces another paper and reports how reproducible it is. This is really the only way out of the reproducibility crisis.
Since the advent of peer review, papers have been written in ever duller language. I'm not sure why, but for some reason we decided as a species that something is not "professional" or "official" unless it is incomprehensible. Personally, I think a lot of the unclear language is a deliberate choice by authors to obscure the actual workings of the paper as much as possible, so that nobody notices that nothing of value is actually being said. Mastroianni wrote his paper in clear, simple language. This had several effects. He had nothing to "hide behind": all the ideas had to be laid out plainly, which exposed the actual value of the paper. It also made the paper much more accessible to people outside the immediate field.
How can we actually implement all of this, though? I've no idea. It's a cultural change. Just as we are now caught up in the culture of Scientism, hopefully we can soon be caught up in a new culture with a keen focus on open access science, on reproduction, and on quality over quantity. Until then, I encourage anyone who cares to do what they can, especially by encouraging good practice.