The Precognition Puzzle
Can people sense the future? A high-profile experiment published in psychology's most prestigious journal ignited a firestorm -- and a wave of attempted replications that would reshape the discipline itself.
In 2011, Daryl Bem, a well-respected social psychologist at Cornell, published a paper that many thought should have been impossible. Across nine experiments involving over a thousand undergraduates, Bem took classic, well-established psychological effects and ran them backwards in time. In one experiment, participants were shown two curtains on a computer screen and asked to guess which one hid an image -- but the computer didn't actually place the image behind a curtain until after the participant chose. When the hidden image was erotic, people picked the correct curtain about 53% of the time instead of the expected 50%. That three-point edge may sound small, but across hundreds of trials, eight of the nine experiments reached statistical significance, with an average effect size of d = 0.22. The paper appeared not in a fringe outlet but in the Journal of Personality and Social Psychology, one of the field's flagship journals, and it passed the same peer-review gauntlet as any other study. Mainstream psychology could not simply look away.
When critics demanded replications, supporters delivered. By 2015, Bem and colleagues had assembled a meta-analysis of 90 experiments from 33 laboratories across 14 countries, encompassing over 12,000 participants. The combined results told a striking story: the overall effect was small but stubbornly persistent, and the odds of the pattern arising by chance alone were less than one in a billion (p = 1.2 x 10^-10). Even using Bayesian statistics -- a framework many skeptics prefer -- the evidence in favor of the effect outweighed the evidence against it by a factor of roughly five billion to one. Perhaps most intriguingly, the analysis revealed that experiments requiring fast, intuitive responses produced a much stronger effect than those allowing slow, deliberate thinking, suggesting that whatever was happening, it operated beneath conscious awareness. Seven out of eight tests for cherry-picking or publication bias came back clean.
Meanwhile, an independent team at a German university added an unexpected wrinkle. Markus Maier and colleagues at the University of Munich had begun running their own precognition experiments before they were even aware of Bem's work. Their design was different -- participants pressed two keys simultaneously while masked negative images flashed just half a second after their response -- yet the pattern looked remarkably similar. Four of their seven studies found significant avoidance of future negative stimuli, and a meta-analysis across all seven experiments, with nearly 3,000 participants, yielded a combined effect that was highly significant (p < .0001). Their Bayesian analysis favored the existence of the effect by a factor of 293 to 1. The fact that an independent European lab, using a different method and a different theoretical framework, converged on similar numbers gave proponents reason to believe the effect was not merely a quirk of Bem's laboratory.
But not everyone is convinced -- and the objections cut deep.
Within months of Bem's original paper, a team of statisticians led by Eric-Jan Wagenmakers published a rebuttal in the very same journal. Their argument was not that Bem had committed fraud or made errors in his data; it was that the entire statistical machinery Bem relied on -- the same machinery used across all of psychology -- was fundamentally misleading. When Wagenmakers reanalyzed Bem's nine experiments using Bayesian methods, the dramatic p-values shrank to whispers. Of ten critical tests, only one yielded what could be called substantial evidence for precognition. Three actually favored the conclusion that there was no effect at all. The remaining six were simply inconclusive -- the statistical equivalent of a shrug. The paper argued that Bem's research exposed not just a possible anomaly in nature, but a structural flaw in how psychologists everywhere evaluated evidence: standard p-values, they showed, routinely overstate the strength of findings, especially for extraordinary claims.
Then came the replications themselves. Jeff Galak and colleagues ran seven carefully controlled experiments with over 3,200 participants, targeting the specific paradigm -- retroactive facilitation of recall -- that had produced Bem's strongest results. They standardized delivery by computer, locked in sample sizes before collecting data, and never peeked at results mid-stream. Six of their seven experiments found nothing. The combined effect across all their studies was essentially zero (d = 0.01), and their Bayesian analysis provided what statisticians classify as "extreme" evidence in favor of the null hypothesis -- a factor of roughly 70 to 1 that there was no effect to find. A broader meta-analysis of all 19 known replication attempts by independent labs told a similar story: the overall effect was d = 0.04, statistically indistinguishable from zero. One uncomfortable finding stood out: the only significant predictor of whether an experiment succeeded was whether Bem himself had run it.
The numbers on both sides of this debate are real, published in peer-reviewed journals, and produced by qualified researchers. Whether they point toward an undiscovered feature of human cognition or an undiscovered flaw in scientific methodology remains, for now, an open question.
Mind Reaching Outward: Telepathy and Remote Viewing
From ganzfeld dream labs to CIA-funded psychic spying, decades of research claim the mind can acquire information across space -- but the pattern of results tells a complicated story.
In 1994, psychologist Daryl Bem and his colleague Charles Honorton published a paper in Psychological Bulletin -- one of the most prestigious journals in all of psychology -- presenting results from a technique designed to test telepathy under controlled conditions. The setup, called the "ganzfeld" (German for "whole field"), works like this: a "receiver" sits in a reclining chair wearing halved ping-pong balls over their eyes, bathed in red light and white noise, while a "sender" in another room concentrates on a randomly selected image or video clip. The receiver then tries to identify the target from a set of four options. Pure chance gives you a 25% hit rate. Across 329 sessions using automated, computer-controlled procedures -- eliminating the possibility of subtle human cues -- receivers picked the right target 32% of the time. The odds of that gap appearing by luck alone were about 1 in 500. Performing arts students from Juilliard hit an extraordinary 50%. The experiments were designed to meet the very methodological standards skeptics had demanded, and the paper put ganzfeld research squarely on mainstream psychology's radar. Honorton never saw the paper in print; he died of a heart attack nine days before it was accepted. He was forty-six.
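The "about 1 in 500" figure is what a one-sided binomial test against the 25% chance rate yields. A minimal sketch, assuming a hit count of 106 (roughly 32% of 329 sessions; the exact count is not stated above):

```python
from scipy.stats import binomtest

# Hypothetical figures: ~32% of 329 sessions is about 106 hits.
# Chance in a four-choice ganzfeld design is a 25% hit rate.
n_sessions, n_hits = 329, 106

result = binomtest(n_hits, n=n_sessions, p=0.25, alternative="greater")
print(f"hit rate: {n_hits / n_sessions:.1%}")      # 32.2%
print(f"one-sided p-value: {result.pvalue:.4f}")   # on the order of 1 in 500
```

The test asks: if receivers were guessing at pure chance, how often would 329 sessions produce at least this many hits?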
The story grew larger. In 2010, Lance Storm and colleagues published a sweeping meta-analysis -- again in Psychological Bulletin -- that assembled 108 ganzfeld experiments spanning thirty-four years. The combined result was staggering: a Stouffer Z of 8.31, corresponding to odds against chance of less than one in ten thousand trillion. Even limiting the analysis to the 29 most methodologically homogeneous ganzfeld studies from 1992 onward, the hit rate held at 32.2% with odds against chance of roughly one in fifty million. Studies using other "noise reduction" techniques -- dreams, hypnosis, meditation -- also beat chance, though less dramatically. The standard free-response studies without any sensory reduction showed nothing. The pattern seemed clear: quiet the mind's ordinary chatter, and something else comes through.
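For reference, a Stouffer Z pools independent studies by converting each one to a z-score, summing, and dividing by the square root of the number of studies. A toy illustration with invented z-scores (not Storm's actual data):

```python
import math
from statistics import NormalDist

def stouffer_z(z_scores):
    """Combine independent z-scores: Z = sum(z_i) / sqrt(k)."""
    return sum(z_scores) / math.sqrt(len(z_scores))

# Illustrative only: five hypothetical studies, each mildly positive
# but none individually significant (all z < 1.645).
zs = [1.1, 0.8, 1.5, 0.4, 1.2]
Z = stouffer_z(zs)
p = 1 - NormalDist().cdf(Z)   # one-sided combined p-value
print(f"combined Z = {Z:.2f}, p = {p:.4f}")
```

No single study in this toy set is significant on its own, yet the pooled result crosses the conventional 0.05 threshold -- which is both the power of meta-analysis and, as the critics below argue, its peril.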
Meanwhile, an entirely separate line of research had been running in secret. For over twenty years, the U.S. government funded experiments in "remote viewing" -- the claimed ability to mentally perceive distant locations or events. When Congress finally ordered the CIA to evaluate the program, statistician Jessica Utts was brought in to audit the data. Across 770 sessions at Stanford Research Institute and 445 at Science Applications International Corporation, the effect sizes were remarkably consistent: 0.209 and 0.230, respectively. Expert viewers replicated these results across different institutions, different years, and different experimenters. Utts concluded that by every standard criterion used in mainstream science, the phenomenon was established. She recommended that researchers stop trying to prove it exists and start figuring out how it works. Even Ray Hyman, the skeptical psychologist assigned as her counterpart, agreed the statistics were sound -- he simply wasn't convinced the numbers pointed to anything genuinely psychic.
But not everyone is convinced -- and the closer you look at those impressive composite numbers, the more complicated they become.
Ray Hyman, who had been sparring with ganzfeld researchers since the 1980s, published a pointed response to Storm's 2010 meta-analysis in the same journal. His argument was surgical: the celebrated ganzfeld hit rate was not evenly distributed across researchers. Strip out the results from just four experimenters, who collectively produced a 44% hit rate, and everyone else landed at 26% -- statistically indistinguishable from chance. Furthermore, the autoganzfeld's success came almost entirely from dynamic video targets (37% hit rate); static images, the kind used in the original studies, scored roughly 26% -- a failed replication hiding inside a successful meta-analysis. Most damning, a follow-up series called "Autoganzfeld II," which met every quality criterion Storm's team had outlined, produced hit rates of 26.5% and 25.8% across hundreds of sessions. Dead chance. Hyman compared the situation to N-rays, a famous episode in physics where French scientists kept confirming a nonexistent phenomenon because their methods were subtly flawed. A meta-analysis, he warned, can make nothing look like something if you average together the right collection of studies.
In 2013, a team of Bayesian statisticians led by Jeffrey Rouder took yet another approach. Rather than asking "is the result statistically significant?" they asked "how strongly does this evidence support psi over no-psi?" Applied to Storm's full dataset, the answer was overwhelming: roughly six billion to one in favor. But Rouder's team noticed something troubling buried in the details. Studies that used manual randomization -- where a human, rather than a computer, selected the targets -- produced dramatically higher hit rates than computer-randomized studies. The Bayes factor for that difference alone was about 6,350 to one, meaning it was thousands of times more likely that manual randomization introduced some procedural flaw than that both methods were detecting a real phenomenon equally. Once manually randomized studies were removed and previously omitted null conditions were added back in, the evidence dropped from six billion to one down to somewhere between 32 and 328 to one -- still notable, but far less extraordinary, and, Rouder argued, unpersuasive given the absence of any known mechanism and the likelihood of further unreported null results sitting in researchers' file drawers.
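For readers unfamiliar with the currency of this debate: a Bayes factor compares how well two competing hypotheses predict the observed data. A minimal numerical sketch with hypothetical numbers -- 106 hits in 329 four-choice trials, and a deliberately simple uniform prior on hit rates between 0.25 and 0.40 standing in for the "psi" hypothesis (Rouder's actual priors were more sophisticated):

```python
from scipy.stats import binom
from scipy.integrate import quad

k, n = 106, 329  # hypothetical: hits and trials

# Likelihood under the null: hit rate fixed at chance (0.25).
like_null = binom.pmf(k, n, 0.25)

# Marginal likelihood under the alternative: average the binomial
# likelihood over a uniform prior on theta in [0.25, 0.40].
like_alt, _ = quad(lambda theta: binom.pmf(k, n, theta) / 0.15, 0.25, 0.40)

bf = like_alt / like_null
print(f"Bayes factor (psi over null): {bf:.1f}")
```

The answer depends heavily on the prior: widen or narrow the assumed range of hit rates and the Bayes factor moves accordingly -- which is why decisions about which studies and conditions to include could shift Rouder's result by several orders of magnitude.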
The ganzfeld remains the most debated experimental paradigm in parapsychology -- a body of evidence large enough that neither side can simply dismiss the other, and contested enough that the question of what it actually shows remains genuinely open.
Bodies, Machines, and the Broadest Patterns
Beyond card-guessing and dream labs, a different generation of researchers began asking bolder questions: can the body itself sense the future before it arrives, and can the focused intention of human minds nudge the behavior of physical machines?
In 2012, Julia Mossbridge, Patrizio Tressoldi, and Jessica Utts assembled every controlled experiment they could find on a phenomenon called "presentiment" -- the idea that your body begins reacting to a startling event a few seconds before it actually happens. Across 26 studies from seven independent laboratories, spanning skin conductance, heart rate, pupil dilation, and brain imaging, they found a small but unmistakable signal: a composite effect size of 0.21, with odds against chance of less than one in a trillion. What made the result especially hard to dismiss was a pattern almost never seen in psychology: the more rigorous the study, the larger the effect became. Publication bias models typically predict the opposite -- sloppy studies inflate results, and careful ones shrink them. Here, the trend ran the other way.
Meanwhile, at Princeton, Roger Nelson had been running what may be the largest pre-registered experiment in parapsychology's history. The Global Consciousness Project placed 65 quantum-based random number generators in laboratories and homes on every inhabited continent, each spitting out hundreds of random bits per second around the clock. Before each major world event -- a terrorist attack, a presidential inauguration, a tsunami -- Nelson's team would formally register a hypothesis and only then look at the data. By January 2011, they had run more than 345 such tests. The cumulative result was a 6.2-sigma deviation from chance, meaning the data strayed from what pure randomness would predict by an amount you would expect to see by accident only a few times in ten billion runs. The signal was not coming from individual devices drifting; it was showing up as subtle correlations between devices scattered across the globe, a pattern that ruled out mundane electromagnetic interference.
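The sigma-to-odds conversion is a standard normal tail calculation. A quick check of the 6.2-sigma figure, using the one-sided convention:

```python
from statistics import NormalDist

z = 6.2
p = 1 - NormalDist().cdf(z)   # one-sided upper-tail probability
print(f"p = {p:.2e}")          # roughly 3e-10: a few times in ten billion
```

The same conversion underlies every "odds against chance" figure in this article: a z-score or sigma level is just a distance from the mean, and the tail area beyond it is the probability of landing there by luck.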
And then there is the ganzfeld, parapsychology's longest-running experimental tradition. In 2024, Patrizio Tressoldi and Lance Storm published what amounts to the most bias-resistant summary yet: a Stage 2 Registered Report, meaning the analysis plan was peer-reviewed and locked in before the results were computed. They gathered 78 ganzfeld studies spanning 46 years and 46 different principal investigators. Under both classical and Bayesian statistics, the hit rate landed around 31 percent where chance predicts 25 -- a small edge, but one that has refused to budge for decades. A cumulative plot showed the effect stabilizing after 1997 and holding steady ever since, with no sign of the gradual decline over time that usually betrays an artifact. The Bayes factor was 89.5, meaning the data were about ninety times more likely if the effect is real than if it is not.
But not everyone is convinced -- and some argue the entire enterprise is doomed from the start.
In 2019, psychologist Arthur Reber and veteran critic James Alcock published a sweeping challenge in American Psychologist, the flagship journal of the American Psychological Association. Their argument was not about any single experiment. It was about physics. Psi, they wrote, violates four bedrock principles: causality, because no known mechanism can carry information backward in time; thermodynamics, because psychokinesis would create energy in a closed system; the inverse square law, because psi effects do not weaken with distance; and time-asymmetry, because precognition requires the arrow of time to reverse in a way no physical theory supports. Quantum mechanics and relativity, they argued, have been misappropriated as scaffolding for psi -- neither actually permits the phenomena claimed. From this vantage point, no amount of statistical significance can make the impossible merely unlikely; it remains impossible.
On the empirical side, a 2006 meta-analysis by Holger Bosch, Fiona Steinkamp, and Emil Boller took dead aim at the mind-over-machine evidence. They examined 380 studies spanning nearly fifty years of people trying to mentally influence random number generators. The overall effect was statistically significant -- but vanishingly tiny, and it shrank as studies got larger, the classic fingerprint of publication bias. A Monte Carlo simulation showed that a straightforward model, one in which about 1,500 null studies simply never made it into print, could reproduce every major feature of the database: the small overall effect, the heterogeneity between studies, and the suspicious relationship between sample size and result. The verdict, borrowing a phrase from 1962: "not proven."
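The file-drawer mechanism behind that verdict can be illustrated with a small Monte Carlo sketch (the parameters here are invented for illustration, not taken from Bosch's model): simulate thousands of truly-null RNG-influence studies, let only the "significant" ones reach print, and pool the survivors.

```python
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(42)

# Illustrative only: 2000 truly-null studies. Each study has n
# binary trials with the true hit rate at exactly chance (0.5).
published_hits, published_trials = [], []
for _ in range(2000):
    n = int(rng.integers(100, 5000))
    hits = int(rng.binomial(n, 0.5))
    p = binomtest(hits, n, 0.5, alternative="greater").pvalue
    if p < 0.05:  # only "significant" results escape the file drawer
        published_hits.append(hits)
        published_trials.append(n)

pooled = sum(published_hits) / sum(published_trials)
print(f"published: {len(published_trials)} of 2000 null studies")
print(f"pooled hit rate in the published record: {pooled:.4f}")
```

Every simulated study is pure chance, yet the published subset shows a small pooled edge above 50% -- and because larger studies need a smaller deviation to reach significance, the per-study edge shrinks as sample size grows, the same fingerprint the meta-analysis found in the real database.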
From bodies that seem to flinch before the shock arrives, to random machines that drift in lockstep during moments of global attention, to four decades of people in sensory deprivation picking the right image more often than chance allows -- and then to the physicists and statisticians who say none of it can be what it looks like -- you have now seen the full landscape. The data are real; the disagreement is over what they mean. Whether the universe contains something our current physics has yet to name, or whether these patterns are the cumulative residue of subtle biases we have yet to fully map, remains one of science's most stubbornly open questions.