Ordered bar graphs using ggplot2 and facet

user2117258

I have a data.frame that looks something like this:

                 HSP90AA1      SSH2      ACTB TotalTranscripts
ESC_11_TTCGCCAAATCC  8.053308 12.038484 10.557234         33367.23
ESC_10_TTGAGCTGCACT  9.430003 10.687959 10.437068         30285.41
ESC_11_GCCGCGTTATAA  7.953726  9.918988 10.078192         30133.94
ESC_11_GCATTCTGGCTC 11.184402 11.056144  8.316846         24857.07
ESC_11_GTTACATTTCAC 11.943733 11.004500  9.240883         23629.00
ESC_11_CCGTTGCCCCTC  7.441695  9.774733  7.566619         22792.18

The TotalTranscripts column is sorted in descending order. What I'd like to do is generate three bar graphs using ggplot2, one for each column of the data.frame except TotalTranscripts. I'd like the bars in each graph to be ordered by TotalTranscripts, just as in the data.frame. It would be ideal to have these bar graphs on one plot using a facet wrap.

Any help would be greatly appreciated! Thank you!

EDIT: Here is my current code using barplot().

cells = "ESC"
genes = c("HSP90AA1", "SSH2", "ACTB")
g = data[genes,grep(cells, colnames(data))]
g = data.frame(t(g), colSums(data)[grep(cells, colnames(data))])
colnames(g)[ncol(g)] = "TotalTranscripts"
g = g[order(g$TotalTranscripts, decreasing=T), , drop=F]

barplot(as.matrix(g[1]), beside=TRUE, names.arg=paste(rownames(g)," (",g$TotalTranscripts,")",sep=""), las=2, col="light blue", cex.names=0.3, main=paste(colnames(g)[1], "\nCells sorted by total number of transcripts (colSums)", sep=""))

This will generate a plot that looks like this.

Again, the problem I seem to be having here is how to have multiple of these plots on the same image. I would like to add 20+ columns to this data.frame but I've cut this down to 3 for the sake of simplicity.

EDIT: Current code incorporating the answer below

cells = "ESC"
genes = rownames(data[x,])[1:8]
# genes = c("HSP90AA1", "SSH2", "ACTB")
g = data[genes,grep(cells, colnames(data))]
g = data.frame(t(g), colSums(data)[grep(cells, colnames(data))])
colnames(g)[ncol(g)] = "TotalTranscripts"
g = g[order(g$TotalTranscripts, decreasing=T), , drop=F]
g$rowz <- row.names(g)
g$Cells <- reorder(g$rowz, rev(g$TotalTranscripts))
df1 <- melt(g, id.vars = c("Cells", "TotalTranscripts"), measure.vars=genes)
# theme_bw() replaces the whole theme, so it has to come before any theme() tweaks
ggplot(df1, aes(x = Cells, y = value)) + geom_bar(stat = "identity") +
  facet_wrap(~ variable, scales = "free") + 
  theme_bw() + 
  theme(axis.title.x = element_blank(), axis.text.x = element_text(angle = 90))

Answers

answered 2 years ago oshun #1

Here is the example data for anybody else:

df <- structure(list(
  HSP90AA1 = c(8.053308, 9.430003, 7.953726, 11.184402, 11.943733, 7.441695),
  SSH2 = c(12.038484, 10.687959, 9.918988, 11.056144, 11.0045, 9.774733),
  ACTB = c(10.557234, 10.437068, 10.078192, 8.316846, 9.240883, 7.566619),
  TotalTranscripts = c(33367.23, 30285.41, 30133.94, 24857.07, 23629, 22792.18)),
  .Names = c("HSP90AA1", "SSH2", "ACTB", "TotalTranscripts"),
  class = "data.frame",
  row.names = c("ESC_11_TTCGCCAAATCC", "ESC_10_TTGAGCTGCACT", "ESC_11_GCCGCGTTATAA",
                "ESC_11_GCATTCTGGCTC", "ESC_11_GTTACATTTCAC", "ESC_11_CCGTTGCCCCTC"))

And here is a solution:

#New column for row names so they can be used as x-axis elements
df$rowz <- row.names(df)
#Explicitly order the levels by descending TotalTranscripts (see the Kohske link).
#Negating the totals works even if the rows are not already sorted.
df$rowz1 <- reorder(df$rowz, -df$TotalTranscripts)

library(reshape2)
#Melt the data from wide to long
df1 <- melt(df, id.vars = c("rowz1", "TotalTranscripts"), 
                measure.vars = c("HSP90AA1", "SSH2", "ACTB"))

library(ggplot2)
gp <- ggplot(df1, aes(x = rowz1, y = value)) + geom_bar(stat = "identity") + 
  facet_wrap(~ variable, scales = "free") + 
  theme_bw() 
gp + theme(axis.text.x = element_text(angle = 90))

[Figure: ordered bar graphs in ggplot2 facets]

This example by Kohske is a constant reference for me on ordering elements in ggplot2.
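
As a rough illustration of why the ordering step matters: ggplot2 draws a discrete axis in the order of the factor levels, and a plain factor sorts its levels alphabetically. The toy vectors below (`cells`, `totals`) are hypothetical and not part of the question's data; this is only a minimal sketch of how reorder() changes the levels.

```r
# Hypothetical toy data: three cells with made-up totals
cells  <- c("cell_a", "cell_b", "cell_c")
totals <- c(10, 30, 20)

# A plain factor sorts its levels alphabetically
default_levels <- levels(factor(cells))
print(default_levels)

# reorder() sorts the levels by ascending value of its second argument,
# so negating the totals puts the largest total first
ordered_cells <- reorder(cells, -totals)
print(levels(ordered_cells))
```

Mapping `ordered_cells` to the x aesthetic would then draw the bars from the largest total to the smallest.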

If you have many columns but the same six ESC cells, you can switch the groupings, i.e. x = variable and facet_wrap(~ rowz1), but this fundamentally changes how you are visualizing and comparing your data. Also consider facet_grid(row ~ column) if you can organize the columns by two components ("columns" here being the data that are melted into 'variable' and 'value').
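
A minimal sketch of the switched grouping, assuming the same example data as above (rebuilt here with data.frame() so the snippet runs on its own). The name `gp_swapped` is just for illustration.

```r
library(reshape2)
library(ggplot2)

# Example data from the question
df <- data.frame(
  HSP90AA1 = c(8.053308, 9.430003, 7.953726, 11.184402, 11.943733, 7.441695),
  SSH2 = c(12.038484, 10.687959, 9.918988, 11.056144, 11.0045, 9.774733),
  ACTB = c(10.557234, 10.437068, 10.078192, 8.316846, 9.240883, 7.566619),
  TotalTranscripts = c(33367.23, 30285.41, 30133.94, 24857.07, 23629, 22792.18),
  row.names = c("ESC_11_TTCGCCAAATCC", "ESC_10_TTGAGCTGCACT", "ESC_11_GCCGCGTTATAA",
                "ESC_11_GCATTCTGGCTC", "ESC_11_GTTACATTTCAC", "ESC_11_CCGTTGCCCCTC"))

# Order the cells by descending TotalTranscripts, then melt to long format
df$rowz1 <- reorder(row.names(df), -df$TotalTranscripts)
df1 <- melt(df, id.vars = c("rowz1", "TotalTranscripts"),
            measure.vars = c("HSP90AA1", "SSH2", "ACTB"))

# One panel per cell (panels follow the level order), genes on the x-axis
gp_swapped <- ggplot(df1, aes(x = variable, y = value)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ rowz1) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90))
gp_swapped
```

With 20+ genes and only a handful of cells this layout stays readable, but it compares genes within a cell rather than cells within a gene.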

And this additional SO solution isn't related to your question, but it is an elegant way to reorder elements in each facet by their values (for future reference).

Finally, the method that will give you the finest control is to plot each graph separately and combine the grobs. Packages like Baptiste's gridExtra, along with gtable, are useful for these tasks.
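
One possible way to do this with gridExtra is sketched below, assuming the same example data. The column name `cell` and the list `plots` are names chosen for this illustration only.

```r
library(ggplot2)
library(gridExtra)

# Example data from the question
df <- data.frame(
  HSP90AA1 = c(8.053308, 9.430003, 7.953726, 11.184402, 11.943733, 7.441695),
  SSH2 = c(12.038484, 10.687959, 9.918988, 11.056144, 11.0045, 9.774733),
  ACTB = c(10.557234, 10.437068, 10.078192, 8.316846, 9.240883, 7.566619),
  TotalTranscripts = c(33367.23, 30285.41, 30133.94, 24857.07, 23629, 22792.18),
  row.names = c("ESC_11_TTCGCCAAATCC", "ESC_10_TTGAGCTGCACT", "ESC_11_GCCGCGTTATAA",
                "ESC_11_GCATTCTGGCTC", "ESC_11_GTTACATTTCAC", "ESC_11_CCGTTGCCCCTC"))

# Cells ordered by descending TotalTranscripts ('cell' is an illustrative name)
df$cell <- reorder(row.names(df), -df$TotalTranscripts)
genes <- c("HSP90AA1", "SSH2", "ACTB")

# Build one independent plot per gene; each could be styled separately
plots <- lapply(genes, function(gene) {
  ggplot(df, aes(x = cell, y = .data[[gene]])) +
    geom_bar(stat = "identity") +
    labs(title = gene, x = NULL, y = "value") +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 90))
})

# Combine the plots side by side on one page
grid.arrange(grobs = plots, nrow = 1)
```

No melting is needed here because each plot reads its own column from the wide data.frame.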

**EDIT in response to new information from OP**

The OP has subsequently asked how to visualize the data, especially when there are more ESC categorical variables (up to 600+).

Here are some examples, with the big caveat that with many categorical variables, they should be grouped or converted to a continuous variable somehow.

#Plot colour to a few discrete, categorical variables
gp + aes(fill = rowz1) + 
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) + 
  labs(x = NULL, fill = "Cell", title = "Discrete categorical variables")

#Plot colour on a continuous scale.
#Ultimately, not appropriate for this example! (but shown for reference)
#More appropriate: fill = TotalTranscripts
gp + aes(fill = as.numeric(rowz1)) + 
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) + 
  labs(x = NULL, title = "Continuous variables (legend won't work for many values)") +
  #One break and one label per factor level, so the breaks are not duplicated
  scale_fill_gradient2(name = "Cell",
                       breaks = seq_along(levels(df1$rowz1)), 
                       labels = levels(df1$rowz1), 
                       midpoint = median(as.numeric(df1$rowz1)))

#x is continuous, colour plotted to the categorical variable.  
#Same caveats as earlier.
gp1 <- ggplot(df1, aes(x = TotalTranscripts/1000, y = value, colour = rowz1)) + 
  geom_point(size=3) + facet_wrap(~ variable, scales = "free") + 
  labs(title = "X is an actual continuous variable") +
  theme_bw() + labs(x = bquote("Total Transcripts,"~10^3), colour = "Cell") 
gp1

[Figures: discrete categorical colour variables; continuous colour variables; continuous x-axis with discrete colours]
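
The comment in the second example mentions fill = TotalTranscripts as the more appropriate continuous mapping. A minimal sketch of that variant follows, assuming the same example data; the name `gp_total` and the division by 1000 for the legend are choices made for this illustration.

```r
library(reshape2)
library(ggplot2)

# Example data from the question
df <- data.frame(
  HSP90AA1 = c(8.053308, 9.430003, 7.953726, 11.184402, 11.943733, 7.441695),
  SSH2 = c(12.038484, 10.687959, 9.918988, 11.056144, 11.0045, 9.774733),
  ACTB = c(10.557234, 10.437068, 10.078192, 8.316846, 9.240883, 7.566619),
  TotalTranscripts = c(33367.23, 30285.41, 30133.94, 24857.07, 23629, 22792.18),
  row.names = c("ESC_11_TTCGCCAAATCC", "ESC_10_TTGAGCTGCACT", "ESC_11_GCCGCGTTATAA",
                "ESC_11_GCATTCTGGCTC", "ESC_11_GTTACATTTCAC", "ESC_11_CCGTTGCCCCTC"))

# Order the cells by descending TotalTranscripts, then melt to long format
df$rowz1 <- reorder(row.names(df), -df$TotalTranscripts)
df1 <- melt(df, id.vars = c("rowz1", "TotalTranscripts"),
            measure.vars = c("HSP90AA1", "SSH2", "ACTB"))

# Bars keep their order; the fill encodes the actual total per cell
gp_total <- ggplot(df1, aes(x = rowz1, y = value, fill = TotalTranscripts / 1000)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ variable, scales = "free") +
  theme_bw() +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
  labs(x = NULL, fill = "Total transcripts (thousands)")
gp_total
```

Because the legend is a single colour bar, it stays the same size whether there are six cells or several hundred.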
