Reshaping only a few columns in a dataframe

varun carumbaiah

I am trying to reshape a dataframe in R. Here is the dataframe I have in dput:

dput(newdata)
    structure(list(var1 = c(0L, 0L, 0L, 0L, 0L, 0L), var2 = c(0L, 
    0L, 0L, 0L, 0L, 0L), var3 = c(0L, 0L, 0L, 0L, 0L, 0L), Date = structure(c(15260, 
    15260, 15260, 15169, 15169, 15169), class = "Date"), Success = structure(c(2L, 
    1L, 1L, 2L, 1L, 1L), .Label = c("N", "Y"), class = "factor")), .Names = c("var1", 
    "var2", "var3", "Date", "Success"), row.names = c(NA, 6L), class = "data.frame")

Output I am looking for:

Variable    Date    N   Y
var1    3/2/2012    0   1
var1    3/4/2012    0   1
var1    3/6/2012    0   1
var2    3/2/2012    1   0
var2    3/4/2012    1   0
var2    3/6/2012    1   0
var3    3/2/2012    0   1
var3    3/4/2012    0   1
var3    3/6/2012    0   1

I am fairly new to R. I have been trying to use the reshape() function but have been unsuccessful so far. Any insight would be hugely appreciated. Thank you.

r

Answers

answered 6 months ago kgolyaev #1

Thank you for providing reproducible input and the desired output; this helps a lot. Unfortunately, your input as presented is flawed: rows 2 and 3 of your data frame are identical, and so are rows 5 and 6. It would not be possible to perform your desired data transformation correctly on such data, because the repeated rows leave more than one value for the same Date and Success combination.
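A quick way to confirm the duplication is base R's duplicated(), which flags every row that repeats an earlier one. The sketch below rebuilds the question's data by hand (assumed to be equivalent to the dput() output, and named newdata as in the question):

```r
# Hand-built copy of the question's data (assumed equivalent to the dput() output)
newdata <- data.frame(
  var1 = rep(0L, 6), var2 = rep(0L, 6), var3 = rep(0L, 6),
  Date = as.Date(c(15260, 15260, 15260, 15169, 15169, 15169),
                 origin = "1970-01-01"),
  Success = factor(c("Y", "N", "N", "Y", "N", "N"), levels = c("N", "Y"))
)

# TRUE marks a row that is identical to an earlier row
dup <- duplicated(newdata)
dup
# [1] FALSE FALSE  TRUE FALSE FALSE  TRUE
```

Rows 3 and 6 are flagged because they repeat rows 2 and 5 respectively.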

Assuming your duplicate rows are not relevant, you can accomplish your desired output via tidyr::spread() and tidyr::gather(). I call your data structure df:

library("dplyr") 
library("tidyr")

# filter(!duplicated(.)) drops repeated rows from df, keeping the first occurrence of each

wide <- df %>%
  filter(!duplicated(.)) %>% 
  gather(Variable, value, starts_with("var")) %>% 
  spread(Success, value, fill = NA, drop = FALSE)

wide
        Date Variable N Y
1 2011-07-14     var1 0 0
2 2011-07-14     var2 0 0
3 2011-07-14     var3 0 0
4 2011-10-13     var1 0 0
5 2011-10-13     var2 0 0
6 2011-10-13     var3 0 0    
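In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A minimal sketch of the same pipeline with the newer functions follows; the hand-built df is an assumption standing in for the question's dput() output, and distinct() is swapped in for filter(!duplicated(.)):

```r
library(dplyr)
library(tidyr)  # assumes tidyr >= 1.0.0 for the pivot_*() functions

# Hand-built copy of the question's data (assumed equivalent to the dput() output)
df <- data.frame(
  var1 = rep(0L, 6), var2 = rep(0L, 6), var3 = rep(0L, 6),
  Date = as.Date(c(15260, 15260, 15260, 15169, 15169, 15169),
                 origin = "1970-01-01"),
  Success = factor(c("Y", "N", "N", "Y", "N", "N"), levels = c("N", "Y"))
)

wide <- df %>%
  distinct() %>%                                   # drop repeated rows
  pivot_longer(starts_with("var"),                 # var1..var3 to long form
               names_to = "Variable", values_to = "value") %>%
  pivot_wider(names_from = Success,                # one column per Success level
              values_from = value)

wide
```

The result should hold the same six Date/Variable combinations as above, though the row and column order may differ from the spread() output.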

answered 6 months ago Matt W. #2

As kgolyaev stated, you have duplicate rows, which means spread() can't reduce each combination to a single row when spreading the columns. One way around this is to use mutate() with ifelse() instead of spreading. This works because Success takes only the values "N" and "Y". Had it taken, say, 12 unique values, a different solution would have been needed.

We can gather the var columns into two columns, vars and num, and then use a pair of simple ifelse() calls to get the 1s and 0s. Finally, we remove the columns we no longer need and arrange by Date.

library(tidyverse)

df %>% gather("vars", "num", -c(Date, Success)) %>%
        mutate(Y = ifelse(Success == "N", 0, 1),
               N = ifelse(Success == "N", 1, 0)) %>%
        select(-c(Success, num)) %>%
        arrange(Date)


         Date vars Y N
1  2011-07-14 var1 1 0
2  2011-07-14 var1 0 1
3  2011-07-14 var1 0 1
4  2011-07-14 var2 1 0
5  2011-07-14 var2 0 1
6  2011-07-14 var2 0 1
7  2011-07-14 var3 1 0
8  2011-07-14 var3 0 1
9  2011-07-14 var3 0 1
10 2011-10-13 var1 1 0
11 2011-10-13 var1 0 1
12 2011-10-13 var1 0 1
13 2011-10-13 var2 1 0
14 2011-10-13 var2 0 1
15 2011-10-13 var2 0 1
16 2011-10-13 var3 1 0
17 2011-10-13 var3 0 1
18 2011-10-13 var3 0 1
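If the repeated rows are real observations rather than noise, another possible reading of the question is that N and Y should count how often each outcome occurs per variable and date. The sketch below is one way to do that under that assumption; it is not what the question explicitly asks for, it requires tidyr 1.0.0 or later, and the hand-built df again stands in for the dput() output. Unlike the ifelse() approach, it does not depend on Success having exactly two levels:

```r
library(dplyr)
library(tidyr)  # assumes tidyr >= 1.0.0 for the pivot_*() functions

# Hand-built copy of the question's data (assumed equivalent to the dput() output)
df <- data.frame(
  var1 = rep(0L, 6), var2 = rep(0L, 6), var3 = rep(0L, 6),
  Date = as.Date(c(15260, 15260, 15260, 15169, 15169, 15169),
                 origin = "1970-01-01"),
  Success = factor(c("Y", "N", "N", "Y", "N", "N"), levels = c("N", "Y"))
)

counts <- df %>%
  pivot_longer(starts_with("var"),
               names_to = "Variable", values_to = "value") %>%
  count(Variable, Date, Success) %>%               # rows per outcome
  pivot_wider(names_from = Success, values_from = n,
              values_fill = list(n = 0L))          # 0 where an outcome is absent

counts
```

With this data, every Variable and Date combination has two "N" rows and one "Y" row, so the duplicates are counted instead of causing a conflict.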
