Thursday, June 23, 2005

Lancet Quiz: The Answer

Someone has finally attempted my quiz! Commenter Kevin writes:
My guess is that Apfelroth's reasoning is along these lines: in a war, refugees move away from potential flash-points, so the population is not geographically distributed as it would be for a census. So if your sampling method derives from census data, you will be over-sampling violent areas.
No, that's not it. Kevin's critique has previously been raised (very mildly) by Daniel Davies, one of the Lancet study's defenders. But that isn't what Apfelroth is saying, and in my opinion the reliance on old Iraqi census data is probably only a small problem with the Lancet study.

The problem is this. Within each city or village, the Lancet researchers drew a rectangle around it on a map, divided the rectangle up into a grid of hundred-meter squares, and chose one of the squares at random. Each square had an equal chance of being chosen even though the squares might have had vastly unequal populations.

Apfelroth, who seems to be knowledgeable about surveys, knows that if you draw a random sample of neighborhoods with unequal populations, you have to re-weight the sample to take that into account. Apfelroth is saying that given the available data, this would have been very hard: "it seems quite likely that the grid rectangles created by driving around in a war zone were much smaller than the original census tracts used in the 'cumulative population lists'."
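To make the re-weighting step concrete, here is a minimal sketch in Python. The neighborhood figures are invented for illustration and are not taken from the Lancet study, and the function names are my own. It assumes the neighborhoods were drawn with equal probability, so each neighborhood's observed rate gets weighted by its population.

```python
def unweighted_rate(samples):
    """Average the per-neighborhood rates, treating every neighborhood alike."""
    rates = [deaths / interviewed for _, interviewed, deaths in samples]
    return sum(rates) / len(rates)


def weighted_rate(samples):
    """Weight each neighborhood's observed rate by its population."""
    total_pop = sum(pop for pop, _, _ in samples)
    weighted_sum = sum(pop * deaths / interviewed
                       for pop, interviewed, deaths in samples)
    return weighted_sum / total_pop


# (population, people interviewed, deaths reported): made-up numbers
samples = [(5000, 30, 1), (4000, 30, 1), (200, 30, 6)]

print("unweighted:", unweighted_rate(samples))
print("weighted:  ", weighted_rate(samples))
```

With these made-up numbers the small, hard-hit neighborhood drags the unweighted figure well above the weighted one. Note that the weighted version needs a population count for every sampled neighborhood, which is exactly the data Apfelroth suspects was hard to come by.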

In fact, Apfelroth is being overly generous here, assuming that the Lancet researchers did their best with this crucial step (weighting for neighborhood population). Either he is giving them the benefit of the doubt, or he's failed to notice that the Lancet researchers did not take this vital step at all: they weight all neighborhoods (100m grid squares) equally. Because they weight neighborhoods with different populations equally, their sample is much more likely to choose people from low-population neighborhoods: neighborhoods near parks or rivers, neighborhoods on the fringe of the city.
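A rough way to see this is to work out each resident's chance of being interviewed. The sketch below is Python with invented square populations, and it assumes a fixed number of interviews per chosen square (I use 30 purely as an example) and that every square is inhabited. None of these numbers come from the study itself.

```python
from fractions import Fraction


def inclusion_probabilities(square_populations, interviews_per_square):
    """Chance that one given resident of each square is interviewed when a
    single square is drawn uniformly at random and a fixed number of its
    residents is then interviewed. Assumes every square is inhabited."""
    n_squares = len(square_populations)
    probs = []
    for pop in square_populations:
        p_square = Fraction(1, n_squares)  # same for every square
        p_within = Fraction(min(interviews_per_square, pop), pop)
        probs.append(p_square * p_within)
    return probs


# Hypothetical populations: a dense downtown square down to a sparse fringe one
populations = [3000, 1500, 300, 60]
probs = inclusion_probabilities(populations, 30)

for pop, p in zip(populations, probs):
    print(f"square with {pop:>4} residents: chance per resident = {p}")
```

Under these assumptions a resident of the 60-person square is 50 times as likely to be interviewed as a resident of the 3000-person square, which is just the ratio of the two populations.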

Anyone who's ever seen a Red/Blue map of Presidential voting in US counties will understand this phenomenon. Bush won the vast majority of the counties in the US (something like 80%) covering the vast majority of the country's land area (maybe 95%, I'd guess). Suppose you took a poll using the Lancet grid method. First you draw a rectangle around the continental US, then you choose grid squares within the rectangle at random, and finally you survey 30 people in each selected grid square. You'd end up with a lot of grid squares with no one living in them at all (e.g., oceans and deserts). And the vast majority of people interviewed would be in the rural, "Red" areas. A survey like this would likely find that 95% of respondents voted for Bush even though only 51% of the country actually did.
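Here is a toy version of that poll in Python. The map is entirely made up (these are not real election figures): 95 sparsely populated "rural" squares, 5 dense "urban" squares, and 50 empty ones, with vote counts chosen so that the true Bush share comes out near 51%. The grid estimate is the expected result of picking inhabited squares with equal probability and polling the same number of people in each.

```python
from fractions import Fraction

# Each square is (population, Bush voters). All numbers are hypothetical.
rural = [(200, 140)] * 95     # sparse squares, 70% Bush
urban = [(20000, 9400)] * 5   # dense squares, 47% Bush
empty = [(0, 0)] * 50         # oceans and deserts
squares = rural + urban + empty

# True share: every person counts once.
total_people = sum(pop for pop, _ in squares)
total_bush = sum(bush for _, bush in squares)
true_share = Fraction(total_bush, total_people)

# Grid-method share: every inhabited square counts once,
# regardless of how many people live in it.
inhabited = [(pop, bush) for pop, bush in squares if pop > 0]
grid_share = sum(Fraction(bush, pop) for pop, bush in inhabited) / len(inhabited)

print(f"true share: {float(true_share):.3f}")
print(f"grid share: {float(grid_share):.3f}")
```

In this toy country the grid poll reports roughly 69% for Bush against a true share of about 51%. The gap grows as the sparse squares become more numerous relative to the dense ones.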

Now the Lancet study isn't quite this bad. It was only after they'd chosen a set of cities, towns, and villages to survey, using a reasonable method that gave a larger chance of selection to larger towns, that they began drawing rectangles and grids, and sampling land-area rather than people.

This is a huge problem for the Lancet survey if fighting is heavier on the fringes of cities than in the center, as I think it is. It will cause an overestimate of the death toll (I think a large overestimate). My commenter, Kevin, thinks that fighting is more common in city centers. If he's right, the Lancet study is still biased, although towards finding too small a death toll. It's hard to be sure who's right about where most fighting has occurred, but I'll make my case in a later post.
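The direction of the bias can be checked with a small Python sketch. The city below is hypothetical: ten dense center squares and ten sparse fringe squares, with invented death rates of 1% and 3% swapped between the two scenarios. It compares the true rate with what equal weighting of squares would give in expectation.

```python
from fractions import Fraction

# Hypothetical city layout
CENTER_SQUARES, CENTER_POP = 10, 2000   # dense squares
FRINGE_SQUARES, FRINGE_POP = 10, 200    # sparse squares


def true_and_grid_rate(center_rate, fringe_rate):
    """Return (true death rate, expected equal-weight-per-square estimate)."""
    center_people = CENTER_SQUARES * CENTER_POP
    fringe_people = FRINGE_SQUARES * FRINGE_POP
    true_rate = ((center_people * center_rate + fringe_people * fringe_rate)
                 / (center_people + fringe_people))
    grid_rate = ((CENTER_SQUARES * center_rate + FRINGE_SQUARES * fringe_rate)
                 / (CENTER_SQUARES + FRINGE_SQUARES))
    return true_rate, grid_rate


low, high = Fraction(1, 100), Fraction(3, 100)
fringe_heavy = true_and_grid_rate(center_rate=low, fringe_rate=high)
center_heavy = true_and_grid_rate(center_rate=high, fringe_rate=low)

for name, (truth, estimate) in [("fringe-heavy", fringe_heavy),
                                ("center-heavy", center_heavy)]:
    print(f"{name}: true {float(truth):.4f}, grid estimate {float(estimate):.4f}")
```

With fighting concentrated on the fringe the grid estimate lands above the true rate, and with fighting concentrated in the center it lands below. Either way the estimate is the same 2%, because the method never looks at how many people live in each square.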

