Tuesday 15 June 2010

This is what the internet is for

No, not porn, racism and football.

There's a story in the Guardian today about vascular surgery death rates in different hospitals - a really serious piece of investigative work, given that hospitals don't collect statistics reliably, and certainly don't like releasing them.

The thrust seems to be that bigger hospitals do more complicated surgery, so their surgeons are more experienced and survival rates are higher. There are complications, though: small hospitals might have high survival rates because they pass difficult procedures on to bigger hospitals.

Ben Goldacre (of Bad Science) took the figures the Guardian made available and queried the way the statistics had been processed. Since then, lots and lots and lots of statisticians have reprocessed the figures in different ways. The journalist who wrote the piece is joining in the discussion and putting Goldacre in touch with the medics who collected the data.

Here's a typical example from the discussion:


Simple linear regression is not appropriate in this case, as pointed out above. You are not accounting for the fact that there is much more random error associated with units that do fewer operations. A GLM will take into account the fact that the rates estimated for larger units are more precise. I have fitted a GLM in Stata with a rather simple categorization of the number of operations into 0-50, 51-100, 101-200 and 201+.
Categorizing is generally not a good thing to do, but as there appears to be some confusion, it may help to illustrate what is going on.
. recode ops (min/50 = 1) (51/100 = 2) (101/200 = 3) (201/max = 4), gen(ops_group)
(99 differences between ops and ops_group)

. glm dead i.ops_group, family(binomial ops) link(log) eform nolog
Generalized linear models                          No. of obs      =        99
Optimization     : ML                              Residual df     =        95
                                                   Scale parameter =         1
Deviance         =  186.5015817                    (1/df) Deviance =  1.963175
Pearson          =  181.6864657                    (1/df) Pearson  =  1.912489

Variance function: V(u) = u*(1-u/ops)              [Binomial]
Link function    : g(u) = ln(u/ops)                [Log]

                                                   AIC             =  4.721745
Log likelihood   = -229.7263858                    BIC             = -250.0348

------------------------------------------------------------------------------
             |                 OIM
        dead | Risk Ratio  Std. Err.        z   P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   ops_group |
          2  |   .8576235   .1563294    -0.84   0.399     .5999815    1.225901
          3  |   .6705549   .1181781    -2.27   0.023        .4747    .9472169
          4  |   .5466476   .1057056    -3.12   0.002     .3742054    .7985549
------------------------------------------------------------------------------

The mortality rate is 33% lower in units with 101-200 operations than in those with 0-50, and 45% lower in units with 201+ operations than in those with 0-50.
The result is highly significant (p = 0.023 and p = 0.002 respectively).
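The arithmetic behind that comment can be checked with a short sketch. It is written in Python purely for illustration (the original analysis was done in Stata). With a log link, each risk ratio compares a group's mortality rate with the 0-50 group, so the percentage reduction is (1 - risk ratio) x 100. The sketch also illustrates the point about precision using the usual binomial standard error, sqrt(p(1-p)/n). The 8% death rate used there is a hypothetical figure chosen for illustration, not one taken from the Guardian data, and the function names are my own.

```python
import math

# Risk ratios copied from the Stata output above; each compares a group with
# the reference group of units doing 0-50 operations.
RISK_RATIOS = {
    "51-100": 0.8576235,
    "101-200": 0.6705549,
    "201+": 0.5466476,
}


def percent_reduction(risk_ratio):
    """Percentage by which mortality is lower than in the 0-50 reference group."""
    return (1.0 - risk_ratio) * 100.0


def binomial_se(p, n):
    """Standard error of a death rate p observed over n independent operations."""
    return math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    for group, rr in RISK_RATIOS.items():
        print(f"{group} operations: {percent_reduction(rr):.0f}% lower mortality than 0-50")

    # Hypothetical death rate (an assumption), used only to show how precision
    # depends on the number of operations a unit does.
    p = 0.08
    for n in (20, 300):
        print(f"n = {n}: standard error of the observed rate = {binomial_se(p, n):.3f}")
```

Running it prints 14%, 33% and 45% for the three groups, matching the comment's rounded figures (the 51-100 group's 14% is not statistically significant, p = 0.399). For the same underlying rate, the standard error for a unit doing 20 operations is sqrt(15), roughly 3.9, times that for a unit doing 300. This is why a binomial GLM, which uses each unit's number of operations, is preferable to a simple linear regression on the raw rates.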


I don't understand the stats (there's not much call for them in Welsh literature studies), but the gloss is useful and I'm learning all the time.

Offline, or in old print media, this wouldn't happen. No paper would be open to critique like this, and you wouldn't be able to marshal the resources of all these slackers who should be working on their day jobs. Between all these people, a better and fascinating (if nerdy) story is emerging. Open data is really important, as is the wisdom of informed crowds (including the critics who've engaged with Goldacre's approach).

Good for NHS users, good for the Guardian and good for the web!
