Reshaping Data in R
One of the frustrating things about using R (although it can be a positive thing too) is that there are often 25 different ways to do the same (or almost the same) thing. So you want to do something, you have a look around, and you find something that looks like it does what you want, but it doesn’t. It almost does, but it doesn’t. A few dead ends later, you hit on something that works. The relief is tangible.
As I use R more (I’m supposed to be writing a bloody book on it, but seriously, it’ll be the blind leading the blind), I find out how to do more stuff, and I’m going to share it here. Mainly this is so I’ll remember to stick it in the book.
So, I had a data set that looked a bit like this:
percent animal
75 Cat
48 Cat
72 Cat
58 Cat
78 Cat
68 Cat
62 Cat
65 Human
68 Human
58 Human
65 Human
65 Human
78 Snail
68 Snail
58 Snail
48 Snail
65 Snail
68 Snail
Actually, it looked exactly like that, except that I’ve left out most of the data because no-one likes scrolling through data. I’ve also changed the example because the real one was tedious. A cat was recently asked to do jury service (http://www.telegraph.co.uk/news/newstopics/howaboutthat/8264782/Cat-ordered-to-do-jury-service.html) … hopefully the trial wasn’t for a cat burglar. Imagine we wanted to test cats’ abilities as jurors. We showed them several trial cases on a video screen, then asked them to decide guilty or not guilty by pressing a button. Each juror’s score is the percentage of decisions that they got correct. As a control we had some humans and snails. Not sure what the snails control for, but I like them (their eyes are cute when they stick out).
So, I’m doing a one-way ANOVA with three groups (helpfully labeled Cat, Human and Snail) and one outcome: the percentage of cases each juror decided correctly. The data above are in the format SPSS expects: one row per participant, with a variable saying which group each participant belongs to. Then I decide that I want to do a robust ANOVA because the data have a weird distribution. To do this, R needs to see my data in columns, one per group, like so:
Cat  Human  Snail
75   65     78
48   68     68
72   58     58
58   65     48
78   65     65
68          68
62
It took me a while to work out how to do this very easy task. Basically, if your original data are saved in a data frame called ‘jury’ that has two variables (percent and animal), then you create a new object (I’ve called it newData) using unstack():
newData <- unstack(jury, percent ~ animal)
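This will break up the variable ‘percent’ into columns based on the variable ‘animal’ in the data frame called ‘jury’. Because the groups here have different numbers of scores (7 cats, 5 humans and 6 snails), they can’t be packed into a rectangular data frame, so unstack() actually returns a list with one vector of scores per animal; if every group were the same size, you’d get a data frame with one column per animal.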
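Here’s a minimal, self-contained sketch of the whole thing, assuming you type the long-format data in by hand from the first listing (in real life you’d probably read them from a file). It also checks what unstack() hands back for these unequal groups.

```r
# Long-format data, as in the first listing: one row per juror
jury <- data.frame(
  percent = c(75, 48, 72, 58, 78, 68, 62,   # cats
              65, 68, 58, 65, 65,           # humans
              78, 68, 58, 48, 65, 68),      # snails
  animal  = factor(rep(c("Cat", "Human", "Snail"), times = c(7, 5, 6)))
)

# Split 'percent' into one set of scores per level of 'animal'
newData <- unstack(jury, percent ~ animal)

# The groups are different sizes, so the result is a named list, not a data frame
class(newData)     # "list"
lengths(newData)   # Cat 7, Human 5, Snail 6
newData$Snail      # the six snail scores
```

If every group had the same number of scores, the same unstack() call would give you a data frame with columns Cat, Human and Snail instead.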
There’s also a package called reshape that has functions cast() and melt(), which took me ages to get my head around but which I ended up using a lot in the book. They’re better for reshaping jobs that aren’t as simple as this one.
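Going the other way (from one set of scores per group back to one row per juror, roughly the job melt() does in reshape) can also be done in base R with stack(), the partner of unstack(). This is a minimal sketch, assuming the scores are held in the kind of named list that unstack() produces for unequal groups; stack() calls its output columns values and ind, so they’re renamed here to match the original ‘jury’ variables.

```r
# Scores in 'wide' form: one element per group
newData <- list(
  Cat   = c(75, 48, 72, 58, 78, 68, 62),
  Human = c(65, 68, 58, 65, 65),
  Snail = c(78, 68, 58, 48, 65, 68)
)

# stack() puts all the scores in one column ('values') and records
# which group each score came from in a factor column ('ind')
longData <- stack(newData)

# Rename the columns to match the original long-format data frame
names(longData) <- c("percent", "animal")
head(longData)
table(longData$animal)   # Cat 7, Human 5, Snail 6
```

For anything fiddlier than this (several grouping variables, or summarising while you reshape), cast() and melt() are the more flexible tools.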