Collective Intelligence 2017
Debunking Three Myths About Crowd-Based
Forecasting
EMILE SERVAN-SCHREIBER, Hypermind International
Ever since the publication of "The Wisdom of Crowds" (Surowiecki, 2004), the accuracy of crowd-
based forecasting has served as a prime example of the practical value of collective intelligence.
Prediction markets were long believed to be the gold standard for eliciting the highest-quality forecast
from a crowd (Arrow et al., 2008). However, over the last decade, this hypothesis has come under
increasingly heavy fire from two new approaches claiming to push the boundaries of "the art and
science of prediction". First arose the big-data statistical models, popularized by Nate Silver's
FiveThirtyEight and the New York Times' Upshot. Then came a new generation of "prediction polls"
(Atanasov et al., 2016) and "superforecasters" (Tetlock & Gardner, 2015), fresh from winning a multi-
year geopolitical crowd-forecasting tournament sponsored by the U.S. government's Intelligence
Advanced Research Projects Activity (IARPA). Now confusion reigns supreme. Who really is better at
forecasting, crowds or models? Which crowd-based methods are more reliable, polls or markets? And
which types of markets, those based on real money or play money?
Using comparative data from a variety of leading prediction markets, prediction polls, and statistical
models, this study assesses their relative performance in forecasting recent major political events. The
results help debunk three popular myths about crowd-based forecasting.
COMPARISON POINTS
The forecasters included in this study are the most established and reputable of each type:
Forecaster                 Input    Method    Betting    Main Crowd
-------------------------  -------  --------  ---------  ----------
Hypermind                  Crowd    Market    Play       France, US
Almanis                    Crowd    Market    Play       UK, US
Pivit                      Crowd    Market    Play       US
Iowa Electronic Markets    Crowd    Market    Real       US
PredictIt                  Crowd    Market    Real       US
Betfair                    Crowd    Market    Real       UK
Good Judgment              Crowd    Poll      Play*      UK, US
FiveThirtyEight            Data     Stats     -          -
New York Times             Data     Stats     -          -
* Although there is no betting in the Good Judgment “prediction poll”, participants do compete to make the most accurate
probability assessments. So in terms of competition and risk, the situation is similar to a play-money market.
Besides sports, only a few events are covered by enough forecasters to offer useful comparison points.
Major electoral events in the U.S. and U.K. offer common ground. Luckily, those few events are also
among the most consequential and media-covered political events of the past few years: the midterm
election of 2014 and presidential election of 2016 in the U.S., and the "Brexit" referendum of 2016 in
the U.K. So what the data lack in volume, they make up for in relevance. This analysis focuses
specifically on the outcomes that were hardest to forecast (most suspenseful), over the longest possible
time frame common to enough forecasters. In chronological order, they are:
A. Senate Control 2014: Will the Republican party control the U.S. Senate after the 2014
midterm elections? Outcomes: Yes or No. Time frame: September 3 to November 3, 2014.
B. GOP Nomination 2016: Who will be the Republican party’s presidential nominee in 2016?
Outcomes: Cruz, Rubio, Trump, or Other. Time frame: January 25 to May 3, 2016.
C. Brexit 2016: Will the U.K. vote to leave the European Union (i.e., Brexit) in 2016? Outcomes:
Yes or No. Time frame: April 1 to June 22, 2016.
D. USA President 2016: Will Donald Trump be elected President of the United States in 2016?
Outcomes: Yes or No. Time frame: July 1 to November 7, 2016.
All forecasters produced probabilities for each possible outcome, updated at least daily. Forecasts were
scored using the Brier scoring rule – a strictly proper scoring rule, based on squared errors. Smaller
Brier scores denote higher accuracy. A perfect prediction – probability 1 for outcomes that occur,
probability 0 for outcomes that don’t – yields a score of 0 while a perfectly wrong prediction earns a
score of 2. To compute a forecaster’s performance on a question, its daily probabilities were scored
throughout the question’s time frame, then averaged into a mean daily Brier score. The results are
plotted in Figure 1.
Fig. 1. Mean daily Brier scores of the forecasters on each question. Smaller Brier scores indicate better forecasts, and the
forecasters are ranked from best (top) to worst (bottom) for each question. The forecasters are also color-coded into three
classes: play-money crowd-based forecasts, real-money crowd-based forecasts, and data-driven models. See text for an analysis.
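The scoring procedure above can be sketched in a few lines of Python. This is an illustrative implementation, not the study's actual scoring code, and the function names are my own:

```python
from typing import Dict, List

def brier_score(forecast: Dict[str, float], outcome: str) -> float:
    """Multi-outcome Brier score: sum of squared errors between the
    forecast probabilities and the realized outcome (1 for the outcome
    that occurred, 0 for the rest). Ranges from 0 (perfect) to 2
    (perfectly wrong)."""
    return sum((p - (1.0 if o == outcome else 0.0)) ** 2
               for o, p in forecast.items())

def mean_daily_brier(daily_forecasts: List[Dict[str, float]],
                     outcome: str) -> float:
    """Score each day's probabilities over the question's time frame,
    then average into a mean daily Brier score."""
    return (sum(brier_score(f, outcome) for f in daily_forecasts)
            / len(daily_forecasts))
```

For example, a confident correct forecast such as `{"Yes": 0.8, "No": 0.2}` when "Yes" occurs scores 0.08, while a perfectly wrong forecast scores 2.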
MYTH #1: DATA-DRIVEN MODELS ARE MORE RELIABLE THAN CROWD-BASED FORECASTS
The appeal of data-driven models is that they can perhaps claim to be more evidence-based than
crowd judgments. It is also easier to explain a model’s forecast than a crowd’s. However, their
forecasts are also more brittle in that they are more easily misled by bad data (e.g., bad polling
numbers) or lack of relevant data (e.g., no Trump-like precedent in U.S. politics). Human collective
intelligence can be more robust when the going gets tough. In this study, the two highest-profile, best-
funded forecasting models failed to outperform the collective intelligence. On the Senate Control
2014 question, both models performed worse than the three prediction markets. On the USA
President 2016 question, one model bested almost all crowd-based forecasters, but the other
performed the worst. In both cases, the play-money market Hypermind performed better than both
models. This result confirms and extends those of Servan-Schreiber & Atanasov (2015) and Atanasov
& Joseph (2016) using somewhat different data sets from different forecasters and time frames.
MYTH #2: SUPERFORECASTING PREDICTION POLLS OUTPERFORM PREDICTION MARKETS
Fresh off its recent victory in the IARPA-sponsored ACE geopolitical forecasting tournament, Good
Judgment proposes two relative novelties in crowd-based forecasting. First, so-called “prediction
polls”, where participants compete to assess event probabilities and achieve the lowest Brier scores.
These inputs are then filtered, weighted, and transformed statistically to extract the most informed
compound judgment. During the IARPA tournament, it was found that teams of forecasters thus
surveyed could slightly outperform a generic play-money prediction market (Atanasov et al., 2016).
The second novelty is so-called “superforecasters”, a special breed of excellent forecasters – the top 2%
of all participants – whose common psychological traits and habits enable them to maintain a high
level of performance over the long term. During the IARPA tournament, teams of superforecasters
were able to beat a generic play-money prediction market by 15% to 30% (Tetlock & Gardner, 2015).
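To make the poll-aggregation idea concrete, here is a minimal sketch in Python. It assumes a recency-weighted average followed by extremization, one published family of techniques for distilling prediction polls; it is not Good Judgment's actual algorithm, and the decay rate and exponent are arbitrary illustration values:

```python
import math
from typing import List, Tuple

def aggregate_poll(forecasts: List[Tuple[float, float]],
                   a: float = 2.0) -> float:
    """Illustrative prediction-poll aggregation.
    Each forecast is (probability, age_in_days). Newer forecasts get
    exponentially more weight; the weighted mean is then extremized by
    raising its odds to a power a > 1, pushing consensus away from 0.5."""
    weights = [math.exp(-0.1 * age) for _, age in forecasts]
    p = (sum(w * prob for w, (prob, _) in zip(weights, forecasts))
         / sum(weights))
    odds = (p / (1.0 - p)) ** a   # extremize in odds space
    return odds / (1.0 + odds)
```

With `a = 1` the function reduces to a plain weighted mean; with `a > 1`, a crowd leaning 70% "Yes" is reported as more than 70%, compensating for the fact that individual forecasters each hold only part of the available evidence.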
In the present study, however, the Good Judgment polls did not outperform the markets. In direct
comparisons on three of the most consequential political events in recent history – Brexit and Trump’s
nomination, then victory in the U.S. presidential election – Good Judgment’s prediction polls tied or
underperformed the play-money market Hypermind. On the Brexit question a poll of superforecasters
scored worse than all the markets. Furthermore, the Good Judgment superforecasting teams failed to
outperform Hypermind on 35 questions from the IARPA tournament itself that Hypermind was
allowed to list on its market (achieving .258 and .264 average Brier scores, respectively, according to
government data – a non-significant difference). It seems that beating a generic prediction market in
a controlled experimental setting is easier than beating a full-featured market in the real world.
MYTH #3: REAL-MONEY PREDICTION MARKETS OUTPERFORM PLAY-MONEY ONES
This is perhaps the most persistent myth about prediction markets. It continues to thrive because
pundits, economists, and the general public find it so intuitively obvious that “putting your money
where your mouth is” is what drives a market’s forecasting accuracy. But the data disagrees. In the
sports domain, this myth has been debunked long ago (Servan-Schreiber et al., 2004). Still, doubts
persisted regarding financial and political predictions (Rosenbloom et al., 2006; Diemer, 2010). The
present study finds no real-money advantage whatsoever on the highest-stakes political questions. In
fact, in three of the four questions – the hardest ones, according to overall Brier scores – the play-
money market Hypermind tied or outperformed all the real-money markets. Furthermore, the deepest
and largest real-money market, Betfair (based in the U.K., where betting is legal and popular),
generally underperformed its two U.S.-based counterparts, even though the U.S. government severely
restricts how much each trader on those markets may invest (only a few hundred dollars). Clearly,
liquidity and treasure are not the main drivers of market accuracy.
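For readers unfamiliar with how a market outputs a probability at all: many play-money markets run an automated market maker such as Hanson's logarithmic market scoring rule (LMSR), in which the instantaneous price of a contract is the crowd's implied probability. The sketch below is a generic illustration; the source does not say which mechanism each market uses, and the liquidity parameter `b` is an arbitrary choice:

```python
import math

def lmsr_price(q_yes: float, q_no: float, b: float = 100.0) -> float:
    """Instantaneous price of the YES contract under LMSR, given the
    outstanding share quantities. The price is the implied probability."""
    e_yes, e_no = math.exp(q_yes / b), math.exp(q_no / b)
    return e_yes / (e_yes + e_no)

def lmsr_cost(q_yes: float, q_no: float, b: float = 100.0) -> float:
    """LMSR cost function; a trader buying shares pays the difference in
    cost before and after the trade, which moves the price."""
    return b * math.log(math.exp(q_yes / b) + math.exp(q_no / b))
```

Note that `b` is precisely the "liquidity" knob of such a market maker: the finding above is that accuracy does not hinge on deep liquidity or real stakes.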
REFERENCES
Arrow, K., Forsythe, R., Gorham, M., Hahn, R., Hanson, R., Ledyard, J., Levmore, S., Litan, R.,
Milgrom, P., Nelson, F., Neumann, G., Ottaviani, M., Schelling, T., Shiller, R., Smith, V.,
Snowberg, E., Sunstein, C., Tetlock, P.C., Tetlock, P.E., Varian, H., Wolfers, J. and Zitzewitz, E.
(2008). The promise of prediction markets. Science, 320:877-878.
Atanasov, P., and Joseph, R. (2016). Which Election Forecast Was the Most Accurate? Or Rather: The
Least Wrong? The Washington Post, November 30, 2016.
Atanasov, P., Rescober, P., Stone, E., Swift, S., Servan-Schreiber, E., Tetlock, P., Ungar, L., and Mellers,
B. (2016). Distilling the Wisdom of Crowds: Prediction Markets vs. Prediction Polls. Management
Science.
Diemer, S. (2010). Real-Money Vs. Play-Money Forecasting Accuracy in Online Prediction Markets -
Empirical Insights from Ipredict. Journal of Prediction Markets, 4(3), December 2010.
Rosenbloom, E., & Notz, W. (2006). Statistical Tests of Real-Money Versus Play-Money Prediction
Markets. Electronic Markets, 16 (1) pp. 63-69.
Servan-Schreiber, E., and Atanasov, P. (2015). Hypermind vs Big Data: Collective Intelligence Still
Dominates Electoral Forecasting. In Proceedings of the 2015 Collective Intelligence Conference,
Santa Clara.
Servan-Schreiber, E., Wolfers, J., Pennock, D., & Galebach, B. (2004). Prediction Markets: Does
Money Matter? Electronic Markets, 14 (3) pp. 243-251.
Surowiecki, J. (2004). The Wisdom of Crowds. Doubleday.
Tetlock, P., and Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. Crown.