In R, melted data works with ggplot. Why does an identical, manually constructed data set fail?

Adam Price

I have a dataframe (let's call it t1 for now) whose summary looks like this:

       D1                D3                D5        
 Min.   :-0.2692   Min.   :-0.4129   Min.   : 2.509  
 1st Qu.: 2.4232   1st Qu.: 2.9288   1st Qu.: 4.731  
 Median : 3.3372   Median : 4.0337   Median : 5.657  
 Mean   : 3.5321   Mean   : 4.1214   Mean   : 5.943  
 3rd Qu.: 4.4551   3rd Qu.: 5.0950   3rd Qu.: 6.935  
 Max.   : 9.2710   Max.   : 9.5757   Max.   :10.604 

I can melt that dataframe and it will look like this:

   variable    value
1        D1 5.121777
2        D1 7.129591
3        D1 6.568010
4        D1 9.271042
5        D1 6.246738
...      ...   
909      D5 6.323069
910      D5 6.397816
911      D5 6.293596
912      D5 5.167107
913      D5 4.118420
914      D5 5.733515
...      ....

I'm adding a third column to the melted data based on some group, so the final one looks something like this.

   variable    value   groupVar
1        D1 5.121777  group1
2        D1 7.129591  group1
3        D1 6.568010  group1
4        D1 9.271042  group1
5        D1 6.246738  group2
...      ...   
909      D5 6.323069  group4
910      D5 6.397816  group4
911      D5 6.293596  group4
912      D5 5.167107  group5
913      D5 4.118420  group5
914      D5 5.733515  group5
...      ....

My goal is a plot where the x-axis shows D1, D3, D5, etc. (the "variable" column in this dataframe), the y-axis shows the value, and the colors are split by group. This actually works fine.

ggplot(final_melt, aes(x = as.numeric(variable), y = value, colour = groupVar)) + geom_smooth(aes(x = as.numeric(variable), y = value), method = 'glm')

This is what the image looks like using the current method after some styling.

Now, I want to do a variation on this, so I'm creating my own version of the melted data to plot instead.

  #This is in a loop and just creates "pseudo-melted" data.
  nameSet  <- colnames(result_dfs[[i]])
  meanSet  <- as.numeric(lapply(result_dfs[[i]], mean))
  groupVar <- rep((paste("group", i, sep="")), length(nameSet))
  cBound   <- cbind(nameSet,as.numeric(meanSet),groupVar)
  mean_dat <- rbind(mean_dat, cBound)

  #After the loop, make everything look just like the standard melted dataset.
  colnames(mean_dat) <- c("variable","value","groupVar")
  mean_dat <- data.frame(mean_dat)

So the manually constructed, pseudo-melted data looks like this. I just want the x-axis to show the "variable" categories, with a line going from condition to condition based on the value, and groupVar coloring the individual lines.

   variable              value groupVar
1  Ebola_D1   2.08831695477086   group1
2  Ebola_D3   2.54949105549377   group1
3  Ebola_D5   4.15035141230915   group1
4  Ebola_D1 -0.390323691887409   group2
5  Ebola_D3  -1.83541896004176   group2
6  Ebola_D5  -1.12565386663147   group2
7  Ebola_D1  -0.83608582623162   group3
8  Ebola_D3  -7.55858863601214   group3
9  Ebola_D5  -2.52864397283096   group3
10 Ebola_D1  0.457247980555584   group4
11 Ebola_D3  0.957424853791735   group4
12 Ebola_D5   1.17865891001209   group4

First, let's just try the exact same thing:

> ggplot(series_dat, aes(x = as.numeric(variable), y = value, colour = groupVar)) + geom_smooth(aes(x = as.numeric(variable), y = value), method = 'glm')
    Don't know how to automatically pick scale for object of type list. Defaulting to continuous.
    Don't know how to automatically pick scale for object of type list. Defaulting to continuous.
    Error: stat_smooth requires the following missing aesthetics: y
    In addition: There were 24 warnings (use warnings() to see them)

> warnings()
Warning messages:
1: In fun(x, ...) : NAs introduced by coercion
  .. . . . . 

Okay, so that doesn't work. Next I tried something simpler, just a line plot.

> ggplot(series_dat, aes(x=variable, y=value, group = groupVar)) + geom_line(color ="blue")
  Don't know how to automatically pick scale for object of type list. Defaulting to continuous.
  Don't know how to automatically pick scale for object of type list. Defaulting to continuous.
  Don't know how to automatically pick scale for object of type list. Defaulting to continuous.
  Error in order(data$PANEL, data$group, data$x) : 
    argument 3 is not a vector

I've tried lots of variations, but I can't figure out why this manually created data won't behave like the melted data. I suspect a type issue, but I checked the types of both and everything looks the same. I appreciate any insight anyone can offer. Thanks!

@joran suggested checking str(), so here it is.

This is for the melted one:

'data.frame':   918 obs. of  2 variables:
 $ variable: Factor w/ 3 levels "D1","D3","D5": 1 1 1 1 1 1 1 1 1 1 ...
 $ value   : num  5.12 7.13 6.57 9.27 6.25 ...

And here it is for the manually constructed one.

'data.frame':   12 obs. of  3 variables:
$ variable:List of 12
 ..$ : chr "Ebola_D1"
 ..$ : chr "Ebola_D3"
 ..$ : chr "Ebola_D5"
 ..$ : chr "Ebola_D1"
 ..$ : chr "Ebola_D3"
 ..$ : chr "Ebola_D5"
 ..$ : chr "Ebola_D1"
 ..$ : chr "Ebola_D3"
 ..$ : chr "Ebola_D5"
 ..$ : chr "Ebola_D1"
 ..$ : chr "Ebola_D3"
 ..$ : chr "Ebola_D5"
$ value   :List of 12
 ..$ : chr "2.08831695477086"
 ..$ : chr "2.54949105549377"
 ..$ : chr "4.15035141230915"
 ..$ : chr "-0.390323691887409"
 ..$ : chr "-1.83541896004176"
 ..$ : chr "-1.12565386663147"
 ..$ : chr "-0.83608582623162"
 ..$ : chr "-7.55858863601214"
 ..$ : chr "-2.52864397283096"
 ..$ : chr "0.457247980555584"
 ..$ : chr "0.957424853791735"
 ..$ : chr "1.17865891001209"
$ groupVar:List of 12
 ..$ : chr "group1"
 ..$ : chr "group1"
 ..$ : chr "group1"
 ..$ : chr "group2"
 ..$ : chr "group2"
 ..$ : chr "group2"
 ..$ : chr "group3"
 ..$ : chr "group3"
 ..$ : chr "group3"
 ..$ : chr "group4"
 ..$ : chr "group4"
 ..$ : chr "group4"

So this is helpful, but I'm still not quite sure what to do with this.

r ggplot2 melt

Answers

answered 3 months ago joran #1

Be careful using cbind if you expect/want the result to be a data frame. Except under very specific circumstances, cbind() will tend to produce a matrix and hence will convert everything to a single type.
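
As a minimal sketch of that coercion (the vectors below are made-up stand-ins for one pass of the loop in the question, not the asker's actual data), binding a character vector and a numeric vector with cbind() gives a character matrix:

```r
# Hypothetical values standing in for one iteration of the loop in the question
nameSet  <- c("Ebola_D1", "Ebola_D3", "Ebola_D5")
meanSet  <- c(2.09, 2.55, 4.15)
groupVar <- rep("group1", length(nameSet))

# cbind() on plain vectors builds a matrix, and a matrix holds a single type
cBound <- cbind(nameSet, meanSet, groupVar)

is.matrix(cBound)  # TRUE
typeof(cBound)     # "character": the numeric means have become strings
str(cBound)
```

Once the numbers are stored as strings, wrapping the result in data.frame() afterwards does not turn them back into numerics, which is consistent with the chr entries in the str() output above.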

The safest way to create a data frame from individual vectors is to simply use data.frame().
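
One possible way to do that for the loop in the question is sketched below. It assumes result_dfs is a list of all-numeric data frames (the small one here is invented for illustration), and it swaps the row-by-row rbind() inside the loop for lapply() plus a single do.call(rbind, ...), which is a substitution on my part rather than anything the asker wrote:

```r
# Hypothetical stand-in for result_dfs from the question: a list of numeric data frames
result_dfs <- list(
  data.frame(Ebola_D1 = c(1, 3),  Ebola_D3 = c(2, 4),   Ebola_D5 = c(4, 6)),
  data.frame(Ebola_D1 = c(-1, 0), Ebola_D3 = c(-2, -1), Ebola_D5 = c(-1, 1))
)

# Build one small data frame per group; data.frame() keeps each column's own type
pieces <- lapply(seq_along(result_dfs), function(i) {
  data.frame(
    variable = factor(colnames(result_dfs[[i]])),
    value    = unname(colMeans(result_dfs[[i]])),
    groupVar = paste0("group", i)
  )
})

# Stack the per-group pieces into one long, "pseudo-melted" data frame
mean_dat <- do.call(rbind, pieces)

str(mean_dat)
```

With variable stored as a factor and value as a numeric column, a call along the lines of ggplot(mean_dat, aes(x = variable, y = value, colour = groupVar, group = groupVar)) + geom_line() should then have ordinary vectors to work with instead of lists.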
