Anscombe’s quartet and the importance of graphing your data
December 30, 2010
I’ve stalled a little bit in my thesis writing – I’ve only got reanalysis and in-depth rewriting to be done on this chapter, which I’m not keen to do at this time of night. So I thought I’d bash out a quick post before the new year.
When dealing with sets of data from familiar experiments, it might be tempting to throw the numbers into your favourite statistics software package and report the coefficient and p value. But researcher beware! Strange things may be hiding in your data… Anscombe’s quartet is a fine example of this. The quartet is four sets of x-y data that have nearly identical summary statistics (means and variances of x and y, correlation coefficient and regression line), but when graphed, they are clearly very different.
Anscombe's quartet plotted, from Wikipedia
The quartet is only an illustrative example of what is possible; the Wikipedia article has links to other similar data.
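If you want to check the matching statistics for yourself, here is a minimal sketch in plain Python. The data values are the ones published in Anscombe's 1973 paper; the names `QUARTET` and `summarise` are my own, made up for illustration.

```python
from math import sqrt

# Anscombe's quartet. Sets I to III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(x, y):
    """Return the sample statistics the quartet is famous for sharing."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx  # ordinary least-squares fit of y on x
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),  # sample variance
        "var_y": syy / (n - 1),
        "r": sxy / sqrt(sxx * syy),  # Pearson correlation
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


for name, (x, y) in QUARTET.items():
    stats = summarise(x, y)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four printed rows should agree very closely with one another, which is exactly why the numbers alone cannot tell the sets apart.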
But graphing your data doesn’t just guard against mistakes; it can also let you see patterns in your data that you hadn’t thought to look for. If you use R, there are plenty of snippets of code that make a summary plot of data, with frequency distributions and Q-Q plots. So give it a go. Work with your data from the bottom up – you never know what you might find.
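For the curious, here is a rough sketch of the numbers behind those two kinds of summary plot, written in plain Python rather than R so it runs without any extra packages. It is not one of the R snippets mentioned above: `qq_points` and `frequency_table` are hypothetical helper names of mine, and the sample is the y column of the third set in Anscombe's quartet.

```python
from statistics import NormalDist, mean, stdev


def qq_points(values):
    """Pair each sorted observation with the normal quantile expected at its rank."""
    ordered = sorted(values)
    n = len(ordered)
    # Normal distribution with the sample's own mean and standard deviation.
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i + 0.5) / n stay strictly between 0 and 1.
    return [(fitted.inv_cdf((i + 0.5) / n), obs) for i, obs in enumerate(ordered)]


def frequency_table(values, n_bins=5):
    """Count observations in equal-width bins; returns (low, high, count) rows."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins or 1.0  # avoid zero width if all values match
    counts = [0] * n_bins
    for value in values:
        # The maximum value belongs in the last bin, not one past it.
        index = min(int((value - low) / width), n_bins - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, c) for i, c in enumerate(counts)]


# y values of the third set in Anscombe's quartet.
sample = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]

print("Frequency distribution")
for low, high, count in frequency_table(sample):
    print(f"{low:6.2f} to {high:6.2f} | {'#' * count}")

print("Q-Q points (expected under normality, observed)")
for expected, observed in qq_points(sample):
    print(f"{expected:6.2f}  {observed:6.2f}")
```

Feed the Q-Q pairs to whatever plotting tool you like: points that stray far from the line y = x suggest the data are not normally distributed. Here the value at 12.74 should stand apart from the rest in both outputs.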
P.S. Happy New Year everyone – there’s a surprise coming for Neuromancy next year…