More Margin of Error
Yesterday, I pointed out that the margin of error for the difference between, say, Bush and Kerry in a single poll is double the margin of error reported in the newspaper, while the margin of error for the gain from poll to poll is 1.4 (the square root of 2) times the "newspaper margin of error." So if the newspaper reports that Kerry is ahead 53 to 47, with margin of error +/- 3, that really means Kerry has a lead of 6, +/- 6. If Bush increases his percentage to 53 in the next poll, that's a gain of 6 +/- 4.2.
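Here is a minimal Python sketch of that arithmetic. The function names are mine, invented for illustration. The sketch assumes a two-candidate race with no undecideds, and two independent polls of the same size for the poll-to-poll gain.

```python
import math

def lead_margin(newspaper_moe):
    """Margin of error for one candidate's lead over the other in a single poll.

    Assumes only two candidates and no undecideds, so the lead is 2K - 1.
    """
    return 2 * newspaper_moe

def gain_margin(newspaper_moe):
    """Margin of error for one candidate's change in share between two polls.

    Assumes the two polls are independent and have the same margin of error.
    """
    return math.sqrt(2) * newspaper_moe

moe = 3.0  # the "newspaper margin of error", in percentage points
print(lead_margin(moe))            # 6.0
print(round(gain_margin(moe), 1))  # 4.2
```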
Confirmation that this is not a trivial point comes from the respected polling firm ARG, whose Ballot Lead Calculator produces different results, and who replied to my criticism with an email defending their calculations. Who should you believe? If you have a PhD in statistics, you can read both my argument and ARG's, and you'll see that I'm right.
If you don't have a PhD, I also provided an argument from authority: a link to the Online Statistics Course at Rice University. The professors at Rice provide a clever argument showing that the margin of error for the lead is double the "newspaper margin of error," which is the point I assert and ARG denies. The Rice U. argument seems too simple to be correct, but that's what makes it clever.
As far as I'm concerned, my previous post was enough to show that I'm right, and I found nothing in ARG's reply to be at all persuasive. I don't mean to bash ARG: I never see polling firms report anything but the most basic confidence interval, so I give ARG credit for trying to do better, even if they've messed up a tricky point.
In any event, here's a new argument from authority, and a new logical argument.
I tried making up some data, simulating 600 respondents, 53% of whom support Kerry and 47% Bush. Then I asked the widely used Stata statistical package to report the confidence interval for Kerry's lead. The manufactured data turned out to have Kerry with 52 percent and Bush with 48 percent. Stata reports that the "newspaper confidence interval" for Kerry runs from 48 to 56, or 52 +/- 4. The confidence interval for Kerry's lead runs from -4 to 12, or 4 +/- 8, according to Stata, double the margin of error for Kerry's percentage alone. If you type these same numbers into ARG's Ballot Lead Calculator, you get the wrong answer: a confidence interval for Kerry's lead from -1.7 to 9.7, or 4 +/- 5.7.
Here's the Stata log:
. set obs 600
obs was 0, now 600
. gen Bush=uniform()<.47
. gen Kerry = 1-Bush
. ttest Kerry==Bush
Paired t test

Variable |  Obs    Mean   Std. Err.   Std. Dev.    [95% Conf. Interval]
---------+--------------------------------------------------------------
   Kerry |  600    .52    .0204131    .5000167     .4799101    .5600899
    Bush |  600    .48    .0204131    .5000167     .4399101    .5200899
---------+--------------------------------------------------------------
    diff |  600    .04    .0408262    1.000033    -.0401799    .1201799
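
If you don't have Stata, the summary numbers in that log can be checked with a few lines of Python. This is only a sketch, not a rerun of my simulation: it rebuilds a data set with the same totals as the log (312 Kerry and 288 Bush out of 600) and uses the normal critical value of about 1.96 where Stata uses the t distribution, so the interval endpoints agree with Stata's only to about three decimal places.

```python
import math
import statistics

n = 600
kerry_votes = 312  # 52% of 600, the share shown in the Stata log

# One row per respondent: 1 if they back the candidate, 0 otherwise.
kerry = [1] * kerry_votes + [0] * (n - kerry_votes)
bush = [1 - k for k in kerry]
diff = [k - b for k, b in zip(kerry, bush)]  # +1 or -1 for each respondent

z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 (normal approximation)

def summarize(values):
    """Return mean, standard error, and CI half-width for a list of values."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample SD (n - 1 denominator)
    se = sd / math.sqrt(len(values))
    return mean, se, z * se

mean_kerry, se_kerry, half_kerry = summarize(kerry)
mean_diff, se_diff, half_lead = summarize(diff)
sd_diff = statistics.stdev(diff)

print(f"Kerry: {mean_kerry:.2f} +/- {half_kerry:.4f}")
print(f"Lead:  {mean_diff:.2f} +/- {half_lead:.4f}")
print(f"Ratio of margins: {half_lead / half_kerry:.3f}")
```

The standard errors match the log, and the margin for the lead comes out at twice the margin for Kerry's share.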

Here's a mathematical argument:
Let K be the share of the poll sample planning to vote for Kerry. Then the share planning to vote for Bush is 1-K (there are no undecideds or other candidates). So, Kerry's lead is K - (1-K) = 2K-1. Let s stand for the standard error of K, so the "newspaper margin of error" about K is +/- 1.96*s. Multiplying K by 2 doubles its standard error, and subtracting the constant 1 leaves it unchanged, so SE(2K-1) = 2s. The margin of error for Kerry's lead is therefore +/- 1.96*SE(2K-1) = +/- 1.96*2s. QED.
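
For anyone who would rather see this than prove it, here is a rough Monte Carlo sketch in Python. It is my own illustration, not anything ARG or Rice provides. It assumes a true Kerry share of 53%, 600 respondents per poll, and 2,000 repeated polls, and the function name and seed are arbitrary. It counts how often the true lead falls inside the observed lead plus or minus the newspaper margin, and how often it falls inside the observed lead plus or minus double that margin.

```python
import math
import random

def simulate_coverage(true_share=0.53, n=600, polls=2000, seed=1):
    """Estimate how often two candidate intervals for the lead cover the truth.

    Returns (coverage using the newspaper margin, coverage using double it).
    """
    rng = random.Random(seed)
    true_lead = 2 * true_share - 1
    hits_single = 0
    hits_double = 0
    for _ in range(polls):
        # Draw one poll: each respondent backs Kerry with probability true_share.
        k = sum(rng.random() < true_share for _ in range(n)) / n
        s = math.sqrt(k * (1 - k) / n)   # estimated standard error of K
        moe = 1.96 * s                   # the "newspaper margin of error"
        lead = 2 * k - 1                 # Kerry's observed lead
        error = abs(lead - true_lead)
        if error <= moe:
            hits_single += 1
        if error <= 2 * moe:
            hits_double += 1
    return hits_single / polls, hits_double / polls

single, double = simulate_coverage()
print(f"Lead +/- newspaper margin covers the truth in {single:.1%} of polls")
print(f"Lead +/- double the margin covers the truth in {double:.1%} of polls")
```

In theory the doubled margin should cover the true lead about 95% of the time, while the undoubled margin is only about one standard error wide and should cover it roughly two-thirds of the time.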
UPDATE: Former economist-for-Dean Kautilyan weighs in with some harsh criticism of ARG (though, like me, he thinks ARG is typical of most pollsters, not necessarily any worse).
