Nate Silver's 'Signal and the Noise' Examines Predictions - NYTimes ...

www.nytimes.com/.../nate-silvers-signal-and-the-noise-examin...
23 Oct 2012 – “The Signal and the Noise,” by the statistician and blogger Nate Silver, examines the complex and often rudimentary science of prediction.

Mining Truth From Data Babel

Nate Silver’s ‘Signal and the Noise’ Examines Predictions

By LEONARD MLODINOW

Published: October 23, 2012

A friend who was a pioneer in the computer games business used to marvel at how her company handled its projections of costs and revenue. “We performed exhaustive calculations, analyses and revisions,” she would tell me. “And we somehow always ended with numbers that justified our hiring the people and producing the games we had wanted to all along.” Those forecasts rarely proved accurate, but as long as the games were reasonably profitable, she said, you’d keep your job and get to create more unfounded projections for the next endeavor.

THE SIGNAL AND THE NOISE

Why So Many Predictions Fail — but Some Don’t

By Nate Silver

Illustrated. 534 pages. The Penguin Press. $27.95.

This doesn’t seem like any way to run a business — or a country. Yet, as Nate Silver, a blogger for The New York Times, points out in his book, “The Signal and the Noise,” studies show that from the stock pickers on Wall Street to the political pundits on our news channels, predictions offered with great certainty and voluminous justification prove, when evaluated later, to have had no predictive power at all. They are the equivalent of monkeys tossing darts.

As one who has both taught and written about such phenomena, I have long felt like leaning out my window to shout, “Network”-style, “I’m as mad as hell and I’m not going to take this anymore!” Judging by Mr. Silver’s lively prose — from energetic to outraged — I think he feels the same way.

The book’s title comes from electrical engineering, where a signal is something that conveys information, while noise is an unwanted, unmeaningful or random addition to the signal. Problems arise when the noise is as strong as, or stronger than, the signal. How do you recognize which is which?

Today the data we have available to make predictions has grown almost unimaginably large: it represents 2.5 quintillion bytes of data each day, Mr. Silver tells us, enough zeros and ones to fill a billion books of 10 million pages each. Our ability to tease the signal from the noise has not grown nearly as fast. As a result, we have plenty of data but lack the ability to extract truth from it and to build models that accurately predict the future that data portends.

Mr. Silver, just 34, is an expert at finding signal in noise. He is modest about his accomplishments, but he achieved a high profile when he created a brilliant and innovative computer program for forecasting the performance of baseball players, and later a system for predicting the outcome of political races. His political work had such success in the 2008 presidential election that it brought him extensive media coverage as well as a home at The Times for his blog, FiveThirtyEight.com, though some conservatives have been critical of his methods during this election cycle.

His knack wasn’t lost on book publishers, who, as he puts it, approached him “to capitalize on the success of books such as ‘Moneyball’ and ‘Freakonomics.’ ” Publishers are notorious for pronouncing that Book A will sell just a thousand copies, while Book B will sell a million, and then proving to have gotten everything right except for which was A and which was B. In this case, to judge by early sales, they forecast Mr. Silver’s potential correctly, and to judge by the friendly tone of the book, it couldn’t have happened to a nicer guy.

Healthily peppered throughout the book are answers to its subtitle, “Why So Many Predictions Fail — but Some Don’t”: we are fooled into thinking that random patterns are meaningful; we build models that are far more sensitive to our initial assumptions than we realize; we make approximations that are cruder than we realize; we focus on what is easiest to measure rather than on what is important; we are overconfident; we build models that rely too heavily on statistics, without enough theoretical understanding; and we unconsciously let biases based on expectation or self-interest affect our analysis.

Regarding why models do succeed, Mr. Silver provides just bits of advice (other than to avoid the failings listed above). Mostly he stresses an approach to statistics named after the British mathematician Thomas Bayes, who created a theory of how to adjust a subjective degree of belief rationally when new evidence presents itself.

Suppose that after reading a review, you initially believe that there is a 75 percent chance that you will like a certain book. Then, in a bookstore, you read the book’s first 10 pages. What, then, are the chances that you will like the book, given the additional information that you liked (or did not like) what you read? Bayes’s theory tells you how to update your initial guess in light of that new data. This may sound like an exercise that only a character in “The Big Bang Theory” would engage in, but neuroscientists have found that, on an unconscious level, our brains do naturally use Bayesian prediction.
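
The update described in the bookstore example can be sketched in a few lines of Python. Only the 75 percent starting belief comes from the review. The two likelihoods (how often a reader enjoys the first 10 pages of a book they end up liking, and of a book they end up disliking) are not given anywhere in the text and are assumed here purely for illustration, as is the function name `update_belief`.

```python
def update_belief(prior, p_evidence_if_true, p_evidence_if_false):
    """Apply Bayes's rule and return P(hypothesis | evidence)."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1.0 - prior) * p_evidence_if_false
    return numerator / denominator


# From the review: initial belief that you will like the book.
PRIOR_LIKE_BOOK = 0.75

# ASSUMED for illustration only (not from the review):
# chance of enjoying the first 10 pages if you will like the whole book ...
P_LIKE_PAGES_IF_LIKE_BOOK = 0.90
# ... and if you will not like the whole book.
P_LIKE_PAGES_IF_DISLIKE_BOOK = 0.30

# Case 1: you liked the first 10 pages.
after_liking = update_belief(
    PRIOR_LIKE_BOOK,
    P_LIKE_PAGES_IF_LIKE_BOOK,
    P_LIKE_PAGES_IF_DISLIKE_BOOK,
)

# Case 2: you did not like the first 10 pages, so the evidence is the
# complement of each likelihood.
after_disliking = update_belief(
    PRIOR_LIKE_BOOK,
    1.0 - P_LIKE_PAGES_IF_LIKE_BOOK,
    1.0 - P_LIKE_PAGES_IF_DISLIKE_BOOK,
)

print(f"Belief after liking the pages:    {after_liking:.2f}")
print(f"Belief after disliking the pages: {after_disliking:.2f}")
```

Under these assumed likelihoods, the belief rises from 75 percent to about 90 percent if the pages please you and falls to about 30 percent if they do not. Different assumptions would give different numbers; the point is only that the same rule handles both outcomes.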

Mr. Silver illustrates his dos and don’ts through a series of interesting essays that examine how predictions are made in fields including chess, baseball, weather forecasting, earthquake analysis and politics. A chapter on poker reveals a strange world in which a small number of inept but big-spending “fish” feed a much larger community of highly skilled sharks competing to make their living off the fish; a chapter on global warming is one of the most objective and honest analyses I’ve seen. (Mr. Silver concludes that the greenhouse effect almost certainly exists and will be exacerbated by man-made CO2 emissions.)

So with all this going for the book, as my mother would say, what’s not to like?

The main problem emerges immediately, in the introduction, where I found my innately Bayesian brain wondering: Where is this going? The same question came to mind in later essays: I wondered how what I was reading related to the larger thesis. At times Mr. Silver reports in depth on a topic of lesser importance, or he skates over an important topic only to return to it in a later chapter, where it is again discussed only briefly.

As a result, I found myself losing the signal for the noise. Fortunately, you will not be tested on whether you have properly grasped the signal, and even the noise makes for a good read.

Leonard Mlodinow is the author of “Subliminal: How Your Unconscious Mind Rules Your Behavior” and “The Drunkard’s Walk: How Randomness Rules Our Lives.”

Nate Silver's book The Signal and the Noise, reviewed. - Slate ...

www.slate.com/.../nate_silver_s_book_the_si...

by Matthew Yglesias

5 Oct 2012 – Predictions are hard—especially about the future. It must have taken superhuman will for New York Times FiveThirtyEight blogger and ...

Book Review: The Signal and the Noise - WSJ.com

online.wsj.com/.../SB10000872396390444554704577...
24 Sep 2012 – Burton Malkiel reviews The Signal and the Noise: Why So Many Predictions Fail—But Some Don't by Nate Silver.
September 24, 2012, 4:38 p.m. ET
Telling Lies From Statistics

Forecasters must avoid overconfidence—and recognize the degree of uncertainty that attends even the most careful predictions.

By BURTON G. MALKIEL
It is almost a parlor game, especially as elections approach—not only the little matter of who will win but also: by how much? For Nate Silver, however, prediction is more than a game. It is a science, or something like a science anyway. Mr. Silver is a well-known forecaster and the founder of the New York Times political blog FiveThirtyEight.com, which accurately predicted the outcome of the last presidential election. Before he was a Times blogger, he was known as a careful analyst of (often widely unreliable) public-opinion polls and, not least, as the man who hit upon an innovative system for forecasting the performance of Major League Baseball players. In "The Signal and the Noise," he takes the reader on a whirlwind tour of the success and failure of predictions in a wide variety of fields and offers advice about how we might all improve our forecasting skill.
Mr. Silver reminds us that we live in an era of "Big Data," with "2.5 quintillion bytes" generated each day. But he strongly disagrees with the view that the sheer volume of data will make predicting easier. "Numbers don't speak for themselves," he notes. In fact, we imbue numbers with meaning, depending on our approach. We often find patterns that are simply random noise, and many of our predictions fail: "Unless we become aware of the biases we introduce, the returns to additional information may be minimal—or diminishing." The trick is to extract the correct signal from the noisy data. "The signal is the truth," Mr. Silver writes. "The noise is the distraction."
The first half of Mr. Silver's analysis looks closely at the success and failure of predictions in clusters of fields ranging from baseball to politics, poker to chess, epidemiology to stock markets, and hurricanes to earthquakes. We do well, for example, with weather forecasts and political predictions but very badly with earthquakes. Part of the problem is that earthquakes, unlike hurricanes, often occur without warning. Half of major earthquakes are preceded by no discernible foreshocks, and periods of increased seismic activity often never result in a major tremor—a classic example of "noise." Mr. Silver observes that we can make helpful forecasts of future performance of baseball's position players—relying principally on "on-base percentage" and "wins above replacement player"—but we completely missed the 2008 financial crisis. And we have made egregious errors in predicting the spread of infectious diseases such as the flu.
In the second half of his analysis, Mr. Silver suggests a number of methods by which we can improve our ability. The key, for him, is less a particular mathematical model than a temperament or "framing" idea. First, he says, it is important to avoid overconfidence, to recognize the degree of uncertainty that attends even the most careful forecasts. The best forecasts don't contain specific numerical expectations but define the future in terms of ranges (the hurricane should pass somewhere between Tampa and 350 miles west) and probabilities (there is a 70% chance of rain this evening).

The Signal and the Noise
By Nate Silver
(The Penguin Press, 534 pages, $27.95)

Above all, Mr. Silver urges forecasters to become Bayesians. The English mathematician Thomas Bayes used a mathematical rule to adjust a base probability number in light of new evidence. To take a canonical medical example, 1% of 40-year-old women have breast cancer: Bayes's rule tells us how to factor in new information, such as a breast-cancer screening test. Studies of such tests reveal that 80% of women with breast cancer will get positive mammograms, and 9.6% of women without breast cancer will also get positive mammograms (so-called false positives). What is the probability that a woman who gets a positive mammogram will in fact have breast cancer? Most people, including many doctors, greatly overestimate the probability that the test will give an accurate diagnosis. The right answer is less than 8%. The result seems counterintuitive unless you realize that a large number of (40-year-old) women without breast cancer will get a positive reading. Ignoring the false positives that always exist with any noisy data set will lead to an inaccurate estimate of the true probability.
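
The arithmetic behind the "less than 8%" figure can be checked with a short Python sketch. The three input rates are the ones quoted in the paragraph above; the function and variable names are illustrative choices, not anything taken from the book.

```python
def posterior_given_positive(base_rate, true_positive_rate, false_positive_rate):
    """Bayes's rule: probability of having the condition given a positive test."""
    # Share of all women who have cancer AND test positive.
    true_positives = base_rate * true_positive_rate
    # Share of all women who do NOT have cancer but still test positive.
    false_positives = (1.0 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Figures quoted in the review for 40-year-old women.
BASE_RATE = 0.01             # 1% have breast cancer
TRUE_POSITIVE_RATE = 0.80    # 80% of those with cancer get a positive mammogram
FALSE_POSITIVE_RATE = 0.096  # 9.6% of those without cancer also test positive

p_cancer = posterior_given_positive(
    BASE_RATE, TRUE_POSITIVE_RATE, FALSE_POSITIVE_RATE
)
print(f"P(cancer | positive mammogram) = {p_cancer:.1%}")
```

Put in terms of counts: of 10,000 such women, 100 have breast cancer and 80 of them test positive, while about 950 of the 9,900 without cancer also test positive. The true positives are therefore outnumbered roughly 12 to 1, which is why the probability is so low.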
This example and many others are neatly presented in "The Signal and the Noise." Mr. Silver's breezy style makes even the most difficult statistical material accessible. What is more, his arguments and examples are painstakingly researched—the book has 56 pages of densely printed footnotes. That is not to say that one must always agree with Mr. Silver's conclusions, however.
As someone interested in financial markets, I found myself unconvinced by Mr. Silver's view that it should not be "all that challenging" to identify financial bubbles "before they burst." He suggests that the dot-com bubble that deflated in early 2000 was identifiable in advance. The price-earnings multiple for the market was enormously elevated at 44. Considerable empirical work, shown in the book, was adduced to point out that long-run (10- or 20-year) rates of return from stocks have generally been poor or negative when investors entered the market at such lofty valuation metrics.
The problem is that Mr. Silver has ignored all the false positives. Earnings multiples were elevated in the early 1990s, suggesting poor stock returns. But the 1990s produced extraordinarily generous equity returns. Earnings multiples were even higher in December 1996, suggesting negative long-run rates of return. This analysis influenced Alan Greenspan's famous "irrational exuberance" speech that month. The stock market rallied sharply until March 2000. Yes, the valuation model gave an accurate bubble prediction in March 2000 but a devastatingly inaccurate one throughout much of the 1990s. Stock prices were wildly inflated in early 2000. But the efficient-market hypothesis doesn't imply that prices are always correct, as Mr. Silver asserts. Prices are always wrong. What the hypothesis asserts is that one never knows for sure if they are too high or too low.
Mr. Malkiel is the author of "A Random Walk Down Wall Street."

The Signal and the Noise - National Public Radio

www.npr.org/.../the-signal-and-the-noise-why-most-prediction...

'Signal' And 'Noise': Prediction As Art And Science

Fresh Air from WHYY

The Signal and the Noise

Why Most Predictions Fail—but Some Don't

by Nate Silver

Hardcover, 534 pages

Interview Highlights

On his forecasting of the 2008 presidential election
"I think the best thing that our model did in 2008 was that it detected very quickly after the financial crisis became manifest — meaning after Lehman Brothers went belly up — that McCain's goose was cooked — that he'd been a little bit behind before, and there was such a clear trend against him that McCain had very little chance in the race from that point onward. Interestingly enough, Obama had about the same lead pre-Lehman Brothers over McCain that he did before the debate against Romney, so you see in 2008 you had a narrow Obama advantage that broke and opened up toward him, whereas this cycle, you had a narrow advantage that collapsed to close to a tie, based on a news event going the other way."
On the bias of statistical models
"You can build a statistical model and that's all well and good, but if you're dealing with a new type of financial instrument, for example, or a new type of situation — then the choices you're making are pretty arbitrary in a lot of respects. You have a lot of choices when you're designing the model about what assumptions to make. For example, the rating agencies assume basically that housing prices would behave as they had over the previous two decades, during which time there had always been steady or rising housing prices. They could have looked, for example, at what happened during the Japanese real estate bubble, where you had a big crash and having diversified apartments all over Tokyo would not have helped you with that when everything was sinking — so they made some very optimistic assumptions that, not coincidentally, happened to help them give these securities better ratings and make more money."
On predictions of political pundits who appear on the TV program The McLaughlin Group
"These predictions were made over a four-year interval, so it's a big enough chunk of data to make some fair conclusions. We found that almost exactly half of the predictions were right, and almost exactly half were wrong, meaning if you'd just flipped a coin instead of listening to these guys and girls, you would have done just as well. And it wasn't really even the case that the easier predictions turned out to be right more. So, for example, on the eve of the 2008 election, if you go to Vegas you would have seen Obama with a 95 percent of winning. Our forecast model had him with a 98 percent chance. Three of the four panelists said it was too close to call, despite Obama being ahead in every single poll for months and months and the economy having collapsed. One of them, actually, Monica Crowley on Fox News, said she thought McCain would win by half a point. Of course, what happened the next week where she came back on the air and said, 'Oh, Obama's win had been inevitable, how could he lose with the economy' ... so there's not really a lot of accountability."
On the similarities between the invention of the printing press and the current digital age
"Basically, books were a luxury item before the printing press. ... They cost in the equivalent in today's dollars of about $25,000 to produce a manuscript. So unless you were a king or a bishop or something, you probably had never really read a book. And then, all of a sudden, the printing press reduced the cost of publishing a book by about 500 times, so everyone who was literate at least could read. But what happened is that people used those books as a way to proselytize and to spread heretical ideas, some of which are popular now but at the time caused a lot of conflict. The Protestant Reformation had a lot to do with the printing press, where Martin Luther's theses were reproduced about 250,000 times, and so you had widespread dissemination of ideas that hadn't circulated in the mainstream before. And, look, when something is on the page or the Internet, people tend to believe it a lot more, and so you had disagreements that wound up playing into greater sectarianism and even warfare."

The New Economics and A Taiwanese Deming Circle

Saturday, November 3, 2012