Scatterplot matrices and MANOVA in R

I’ve got a deadline to get one of my thesis chapters drafted by the end of the week. It was the end of last week, but all the unemployment and job-seeking stuff got in the way, and the analysis has been a little bit harder than I expected, so it got pushed back (oops, guess I should be used to that by now).

Anyway, I’ve been doing some of the analysis in R, and I thought I’d stick it up here to get some tips/feedback. Please bear with me on the state of the figures, they’re a bit sketchy at the moment.

Click ‘More’ for tedious technicalities!

The staRt of a jouRney

I thought I’d already done a proper introduction of my move into using the R language to analyse the data for my thesis, but I can’t find anything now… I’d previously been using GraphPad Prism, which was easy to use, with everything customisable at a click, but this also made it a bit slow and bloated. MS Excel never even entered into the equation, given the effort needed to display standard error bars and the fact that some of its inference test calculations aren’t even right. Instead, I decided to try to teach myself R.

So here’s the start of my journey into R analysis, a “jouRney” if you will. R is an open source implementation of the S language, which was developed at Bell Labs and later sold commercially as S-PLUS. Straight out of the box it can do some reasonably advanced data extraction, analysis and visualisation. Its real strength, however, comes from its openness. Being open source means that thousands of stat geeks around the world have developed add-on “packages” that extend R’s abilities into all sorts of areas of research.

So far, I’m just beginning to learn how to read data in, manipulate it and visualise it, but here’s the result of my first attempt to produce a heat map/contour plot of the firing rate of a dopamine neuron in response to a stimulus, across trials:

# Read the raw data from the text file produced by heatplot.s2s
contourplot <- read.table("c:\\documents and settings\\craig\\my documents\\my dropbox\\R stuff\\rawoutput.txt", header=TRUE, sep="\t")
# Trim off the empty columns
contourplot <- contourplot[, 1:40]
# Keep only the data columns (every second column: 2, 4, ..., 40)
contourplot <- contourplot[, seq(2, 40, by=2)]
# Transpose the data
contourplot <- t(contourplot)
# Convert the table into a numeric data matrix
contourplot_matrix <- data.matrix(contourplot)
# Plot a heat map of the matrix "contourplot_matrix"
filled.contour(x=seq(1, 300, length.out=20), y=seq(-0.5, 1.5, length.out=20), z=contourplot_matrix[1:20, 1:20], nlevels=20, axes=TRUE, color.palette=topo.colors)

Which produces this:

A comparison of the contour plot in R (right) and the raw data from Spike2. The contour plot is rotated 90 degrees anticlockwise.

This is essentially a grid of the mean firing rate in bins of 100 ms by 15 trials. Hopefully, I should be able to use these data in a principal component analysis and see whether there are separate components which change over time at different rates.
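As a rough sketch of what that principal component analysis might look like, here is a version using base R's prcomp on made-up numbers rather than my recordings. The simulated burst and fade, and the names rate_grid, burst and fade, are assumptions purely for illustration; treating the 15-trial bins as observations and the 100 ms time bins as variables is one possible layout, not necessarily the right one for the real data.

```r
# Simulated stand-in for the 20 x 20 grid of mean firing rates
# (rows = 100 ms time bins, columns = 15-trial bins). Not real data.
set.seed(1)
n_time <- 20
n_trial_bins <- 20
time_s <- seq(-0.5, 1.5, length.out = n_time)

# Hypothetical response: a burst just after the stimulus that fades across trials
burst <- exp(-((time_s - 0.1)^2) / (2 * 0.05^2))
fade <- seq(1, 0.2, length.out = n_trial_bins)
rate_grid <- 5 + 20 * outer(burst, fade) +
  matrix(rnorm(n_time * n_trial_bins, sd = 1), n_time, n_trial_bins)

# PCA with trial bins as observations (rows) and time bins as variables (columns)
pca <- prcomp(t(rate_grid), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained[1:3], 3))

# Scores for the first component, one value per trial bin
pc1_scores <- pca$x[, 1]
print(round(pc1_scores, 2))
```

Plotting the scores of each component against trial bin would then show whether different components drift across the session at different rates.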

Eventually, I’m aiming to do all of my statistics in R, so we’ll see how it goes.

Looking back on my PhD

Now that I’m in the hopefully fairly short final straight of my PhD, I’m feeling a bit reflective. Whilst I’ve been writing up, I’ve been reading around the science behind my project in some depth, and I’m only now really starting to get a grip on the theory behind the experiments. Unfortunately, in some cases this has meant learning that some of the experiments I’ve done probably weren’t going to work the way we intended, and probably weren’t worth the struggle. I always knew that when you came out of a PhD, you were supposed to be an expert in that tiny subfield. With this in mind, I started to wonder whether, when my supervisor set me on an experiment, I was supposed to go away and understand the problem before I even started on it. What I usually did was understand the problem enough to see the rationale behind the experiments, do them, then go to my supervisor with the data. It’s only now, with reading and writing up, that I can see all the potential problems.

I’ve been worried that I’ve left what feels like the bulk of my PhD to the end. The experiments were just things that I did, but the learning and understanding is what the PhD is about, surely? I felt like I’d missed something. It’s made worse by the fact that I came to a neuroscience PhD with only AS-levels in biology and chemistry, straight from a psychology bachelor’s degree unfortunately lacking in in-depth neuroscience content, without any intervening master’s. I was fortunate to be ably taught how to do extracellular electrophysiology by a patient post-doc who started in the department at the same time as me (thanks Lio!), and who was later replaced by his partner in crime from his PhD, who was equally helpful (thanks Nico!). But I was worried I’d skipped the theoretical fundamentals behind neuroscience and electrophysiology.

Fortunately, I’ve been reassured by older colleagues that it’s pretty common to reach the end of a PhD before you have learned enough to realise that you should have done things differently, and that a large part of a PhD is learning where your strengths and weaknesses lie, rather than knowing them at the start and being done with them by the time you’ve finished. I feel a bit happier about it all but, as with many things in life, I still sometimes wish I could do it all again with the benefit of hindsight. Sometimes I wish I’d come to the PhD from a neuroscience-skewed biology degree instead of psychology, but then I wouldn’t have been in the right place at the right time to have the fantastic opportunity I have had.

I envy the guys in the States who do PhDs for something like 7 years, with lots of studying before getting stuck into a project. Three years seems pretty inadequate compared to that. Cardiff seem to have it down too: their neuroscience PhD program consists of three years of experimental project, preceded by a year of study, half of which is modules explicitly teaching you how to do a neuroscience PhD (the nearest thing at UoS, the MSc Cognitive and Computational Neuroscience, seems to try to cover too many bases at once) and the other half of which is three practical lab rotations. I tried to take some second year undergrad biology neuroscience modules as extra credit, but I was told that they had to be masters level or above to qualify. This is in spite of the existence of accredited modules designed for the extra credit system which involve taking a first year undergrad psych module, or even having a discussion group about your PhD in a coffee shop…

Ignoring the green-eyed academic in me, I’m glad I know what the educational known unknowns are for me. I’m just dying to make them known knowns, and I hope I get the chance to do it.

Lab books

It’s probably quite late in my PhD to finally take this on board (I’m at the end of my final year now), but I’ve realised that my note taking for my experimental lab book is pretty poor. I might have a future career in academia, which would mean I can make use of this new attitude to my organisational skills, but I’m going to at least make the most of it in my last few months.

It’s not that I don’t write enough down for me to remember what I’ve done throughout my PhD, but if I ever need to come back to my notes in 10 years, or if someone else needs to understand them, especially someone who doesn’t have the first clue about my experiments, they won’t be easy to follow.

To be fair to myself, I don’t seem to write much more or less detail than most of the people I work with. Recently, however, I read some example lab books as part of a guide to successful academia, and even little things like writing the date out in full, in case you move from Europe to the US to Canada, seemed to make sense. Mostly though, I realised that my notes a) make the assumption that you know what the study is and why it’s being done and b) sometimes only make sense if you follow the ‘story’ over the course of a few experiments.

It’s a late lesson to learn, and it’s not a big lesson, but if I’m going to be pro-open-access science, I guess openness should begin at home.

EDIT: I’ve just tripped over a Wikipedia article on Open Notebook Science, based around the idea of full openness with data, described in a blogpost by Jean-Claude Bradley. There’s a list of active open notebooks, which might be interesting.

Are online archives restricting what we cite?

I recently read an article at Code for Life, inspired by a post from fellow science blogger Isis, about an apparently increasing tendency to cite only recent research papers on a topic that stretches back much further, and about researchers publishing ‘new’ discoveries or techniques which had already been made. The suggestion was that the availability of recent papers online has made us less willing to cite older work. It made me think about my own referencing habits, and whether I was measuring up.
