Reshaping Data in R

Sep 29

Written By Andy Field

One of the frustrating things about using R (although it can be a positive thing too) is that there are often 25 different ways to do the same, or almost the same, thing. So you want to do something, you have a look around, and you find something that looks like it does what you want, but it doesn’t. It almost does, but it doesn’t. A few dead ends later you hit on something that works. The relief is tangible.

As I use R more (I’m supposed to be writing a bloody book on it, but seriously it’ll be the blind leading the blind), I find out how to do more stuff. I’m going to share things here. Mainly this is so I’ll remember to stick it in the book.

So, I had a data set that looked a bit like this:

percent  animal
75       Cat
48       Cat
72       Cat
58       Cat
78       Cat
68       Cat
62       Cat
65       Human
68       Human
58       Human
65       Human
78       Snail
68       Snail
58       Snail
48       Snail
65       Snail
68       Snail

Actually it looked exactly like that except I’ve ignored most of the data because no-one likes scrolling through data. I’ve also changed the example because it was tedious. A cat was recently asked to do jury service (http://www.telegraph.co.uk/news/newstopics/howaboutthat/8264782/Cat-ordered-to-do-jury-service.html) … hopefully the trial wasn’t for a cat burglar. Imagine we wanted to test their abilities. We gave them several trial cases on a video screen, then asked them to decide guilty or not guilty by pressing a button. Their score is the number of decisions that they got correct. As a control we had some humans and snails. Not sure what the snails control for, but I like them (their eyes are cute when they stick out).

So, I’m doing a one-way ANOVA and I have three groups (helpfully labeled cat, human, snail) and an outcome: number of felons correctly identified as guilty. The data above are in the format SPSS expects them to be in. Then I decide that I want to do a robust ANOVA because the data have a weird distribution. To do this, R needs to see my data in columns, one per group, like so:

Cat  Human  Snail
75   65     78
48   68     68
72   58     58
58   65     48
78   65     65
68          68

Took me a while to work out how to do this very easy task. Basically, if your original data are saved in a data frame called ‘jury’ that has two variables (percent and animal) then you create a new data frame (I’ve called it newData) using unstack():

newData <- unstack(jury, percent ~ animal)

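This will break up the variable ‘percent’ into columns based on the variable ‘animal’ in the data frame called ‘jury’. One thing to watch: if the groups are different sizes, unstack() hands you back a list of vectors (one per group) rather than a data frame.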
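If you want to try this yourself, here is a minimal sketch that rebuilds the cut-down example data from above and runs unstack() on it. The data frame is my reconstruction of the listing, and the names firstFour, balanced and padded are made up for illustration. It also shows what comes back when the groups are unequal in size (7 cats, 4 humans, 6 snails), and one possible way to pad the shorter groups with NA if you really need a rectangular data frame.

```r
# Reconstruction of the example data shown earlier (assumed, not the real file)
jury <- data.frame(
  percent = c(75, 48, 72, 58, 78, 68, 62,   # Cat
              65, 68, 58, 65,               # Human
              78, 68, 58, 48, 65, 68),      # Snail
  animal = rep(c("Cat", "Human", "Snail"), times = c(7, 4, 6))
)

# Split 'percent' into one element per level of 'animal'
newData <- unstack(jury, percent ~ animal)

# The groups have unequal sizes here, so unstack() returns a named list
# of numeric vectors rather than a data frame
str(newData)

# With equal group sizes (first four rows of each group) you get a data frame
firstFour <- do.call(rbind, lapply(split(jury, jury$animal), head, n = 4))
balanced <- unstack(firstFour, percent ~ animal)
str(balanced)

# One way to force the unequal case into a data frame: pad with NA
n <- max(lengths(newData))
padded <- as.data.frame(
  lapply(newData, function(x) c(x, rep(NA, n - length(x))))
)
padded
```

Whether you want the list or the padded data frame depends on what the function you are feeding it expects, so check its help file before padding anything.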

There’s also a package called reshape that has functions cast() and melt() which took me ages to get my head around, but that I ended up using a lot in the book. They’re better for things that aren’t as simple as this example.

Andy Field | How-to | Statistics


Sage Research Methods Community