Anscombe’s quartet and the importance of graphing your data

I’ve stalled a little bit in my thesis writing – I’ve only got reanalysis and in-depth rewriting to be done on this chapter, which I’m not keen to do at this time of night. So I thought I’d bash out a quick post before the new year.

When dealing with sets of data from familiar experiments, it might be tempting to throw the numbers into your favourite statistics software package and report the correlation coefficient and p-value. But researcher beware! Strange things may be hiding in your data… Anscombe’s quartet is a fine example of this. The quartet is four sets of data that have nearly identical sample statistics (mean, variance, correlation coefficient and regression equation), but when graphed, they are clearly very different.
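To make the claim concrete, here is a minimal sketch in Python (standard library only) that computes those summary statistics for each of the four sets. The data are Anscombe's published values; the `summarise` helper and the `quartet` dictionary are names made up for this illustration, not part of any package.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's published values; sets I to III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(x, y):
    """Return means, sample variances, Pearson r and the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(x),
        "var_y": variance(y),
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": my - slope * mx,
    }


for name, (x, y) in quartet.items():
    stats = summarise(x, y)
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

All four rows should come out almost the same: means of about 9 and 7.5, a correlation of about 0.82, and a fitted line of roughly y = 3 + 0.5x. Nothing in those numbers hints at the curve, the outlier or the single influential point that a plot shows straight away.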

Figure: Anscombe's quartet plotted, from Wikipedia. The four very different datasets produce the same summary statistics.

The quartet is only an illustrative example of what is possible; the Wikipedia article has links to other similar data.

Graphing your data doesn’t just guard against mistakes; it can also reveal patterns that you hadn’t thought to look for. If you use R, there are plenty of snippets of code that make a summary plot of data, with frequency distributions and Q-Q plots. So give it a go. Work with your data from the bottom up: you never know what you might find.
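For anyone who hasn't met a Q-Q plot before, the idea is simple enough to sketch by hand. The snippet below is Python rather than R and uses only the standard library; it computes the points of a normal Q-Q plot and leaves the drawing to whatever plotting tool you prefer. The `qq_points` helper is a hypothetical name, and the `(i + 0.5) / n` plotting positions are just one common convention among several.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Pair each sorted observation with the normal quantile expected at its rank."""
    n = len(sample)
    # Normal distribution with the sample's own mean and standard deviation.
    fitted = NormalDist(mean(sample), stdev(sample))
    # Plotting positions: an assumed convention, other choices are also used.
    probs = [(i + 0.5) / n for i in range(n)]
    theoretical = [fitted.inv_cdf(p) for p in probs]
    return list(zip(theoretical, sorted(sample)))


# y values of the first set in Anscombe's quartet, used here as example data.
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

for expected, observed in qq_points(y):
    print(f"{expected:6.2f}  {observed:6.2f}")
```

If the data are roughly normal, the two columns track each other and the plotted points fall close to a straight line; systematic bends or stray points at the ends are the sort of thing worth a second look.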

P.S. Happy New Year, everyone. There’s a surprise coming for Neuromancy next year…


About Neuromancy
Neuromancy, or "Craig", is a 26-year-old with a neuroscience PhD, looking to take the next step in his life, and blogging about neuroscience, psychology, science and anything else that interests and/or annoys him.
