I have a data.frame that looks something like this:
                         HSP90AA1      SSH2      ACTB TotalTranscripts
    ESC_11_TTCGCCAAATCC  8.053308 12.038484 10.557234         33367.23
    ESC_10_TTGAGCTGCACT  9.430003 10.687959 10.437068         30285.41
    ESC_11_GCCGCGTTATAA  7.953726  9.918988 10.078192         30133.94
    ESC_11_GCATTCTGGCTC 11.184402 11.056144  8.316846         24857.07
    ESC_11_GTTACATTTCAC 11.943733 11.004500  9.240883         23629.00
    ESC_11_CCGTTGCCCCTC  7.441695  9.774733  7.566619         22792.18
The TotalTranscripts column is sorted in descending order. What I'd like to do is generate three bar graphs using ggplot2, one for each column of the data.frame except TotalTranscripts. I'd like the bars in each graph to be ordered by TotalTranscripts, just as in the data.frame. It would be ideal to have these bar graphs on one plot using a facet wrap.
Any help would be greatly appreciated! Thank you!
EDIT: Here is my current code using barplot().
    cells = "ESC"
    genes = c("HSP90AA1", "SSH2", "ACTB")
    g = data[genes, grep(cells, colnames(data))]
    g = data.frame(t(g), colSums(data)[grep(cells, colnames(data))])
    colnames(g)[ncol(g)] = "TotalTranscripts"
    g = g[order(g$TotalTranscripts, decreasing=T), , drop=F]
    barplot(as.matrix(g), beside=TRUE,
            names.arg=paste(rownames(g), " (", g$TotalTranscripts, ")", sep=""),
            las=2, col="light blue", cex.names=0.3,
            main=paste(colnames(g), "\nCells sorted by total number of transcripts (colSums)", sep=""))
This will generate a plot that looks like this.
Again, the problem I seem to be having here is how to have multiple of these plots on the same image. I would like to add 20+ columns to this data.frame but I've cut this down to 3 for the sake of simplicity.
EDIT: Current code incorporating the answer below
    cells = "ESC"
    genes = rownames(data[x,])[1:8]
    # genes = c("HSP90AA1", "SSH2", "ACTB")
    g = data[genes, grep(cells, colnames(data))]
    g = data.frame(t(g), colSums(data)[grep(cells, colnames(data))])
    colnames(g)[ncol(g)] = "TotalTranscripts"
    g = g[order(g$TotalTranscripts, decreasing=T), , drop=F]
    g$rowz <- row.names(g)
    g$Cells <- reorder(g$rowz, rev(g$TotalTranscripts))
    df1 <- melt(g, id.vars = c("Cells", "TotalTranscripts"), measure.vars=genes)
    ggplot(df1, aes(x = Cells, y = value)) +
      geom_bar(stat = "identity") +
      theme(axis.title.x=element_blank(), axis.text.x = element_blank()) +
      facet_wrap(~ variable, scales = "free") +
      theme_bw() +
      theme(axis.text.x = element_text(angle = 90))
Here is the example data for anybody else:
    df <- structure(list(
      HSP90AA1 = c(8.053308, 9.430003, 7.953726, 11.184402, 11.943733, 7.441695),
      SSH2 = c(12.038484, 10.687959, 9.918988, 11.056144, 11.0045, 9.774733),
      ACTB = c(10.557234, 10.437068, 10.078192, 8.316846, 9.240883, 7.566619),
      TotalTranscripts = c(33367.23, 30285.41, 30133.94, 24857.07, 23629, 22792.18)),
      .Names = c("HSP90AA1", "SSH2", "ACTB", "TotalTranscripts"),
      class = "data.frame",
      row.names = c("ESC_11_TTCGCCAAATCC", "ESC_10_TTGAGCTGCACT",
                    "ESC_11_GCCGCGTTATAA", "ESC_11_GCATTCTGGCTC",
                    "ESC_11_GTTACATTTCAC", "ESC_11_CCGTTGCCCCTC"))
And here is a solution:
    #New column for row names so they can be used as x-axis elements
    df$rowz <- row.names(df)

    #Explicitly order the rows (see the Kohske link)
    df$rowz1 <- reorder(df$rowz, rev(df$TotalTranscripts))

    library(reshape2)
    #Melt the data from wide to long
    df1 <- melt(df, id.vars = c("rowz1", "TotalTranscripts"),
                measure.vars = c("HSP90AA1", "SSH2", "ACTB"))

    library(ggplot2)
    gp <- ggplot(df1, aes(x = rowz1, y = value)) +
      geom_bar(stat = "identity") +
      facet_wrap(~ variable, scales = "free") +
      theme_bw()
    gp + theme(axis.text.x = element_text(angle = 90))
This example by Kohske is a constant reference for me on ordering elements in ggplot2.
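For reference, reorder() sorts the factor levels by a summary (the mean, by default) of a second vector, in ascending order. The rev() call in the solution gives the right order only because the rows are already sorted by TotalTranscripts. A minimal sketch of a variant that does not depend on the row order, assuming a three-row subset of the example data, negates the totals instead:

```r
# Sketch: order factor levels by descending TotalTranscripts,
# regardless of the current row order. Values are from the example data.
cells  <- c("ESC_11_TTCGCCAAATCC", "ESC_10_TTGAGCTGCACT", "ESC_11_GCCGCGTTATAA")
totals <- c(33367.23, 30285.41, 30133.94)

# Shuffle the rows on purpose to show the result does not depend on their order
idx <- c(2, 3, 1)

# reorder() sorts levels by ascending score, so negate for descending totals
f <- reorder(cells[idx], -totals[idx])

levels(f)
```

The levels come out with the highest-total cell first, which is the order ggplot2 then uses along the x-axis.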
If you have many columns, but the same six ESC complexes, you can switch the groupings, i.e. x = variable and facet_wrap(~ rowz1), but this fundamentally changes how you are visualizing/comparing your data. Also, consider facet_grid(row ~ column) if you can organize the columns by two components (the columns being the data that are melted into 'variable' and 'value').
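A rough sketch of the switched grouping, assuming a three-row subset of the example data and the same melt step as in the solution (gp_swapped is just an illustrative name):

```r
library(reshape2)
library(ggplot2)

# Three rows of the example data, to keep the sketch short
df <- data.frame(
  HSP90AA1 = c(8.053308, 9.430003, 7.953726),
  SSH2 = c(12.038484, 10.687959, 9.918988),
  ACTB = c(10.557234, 10.437068, 10.078192),
  TotalTranscripts = c(33367.23, 30285.41, 30133.94),
  row.names = c("ESC_11_TTCGCCAAATCC", "ESC_10_TTGAGCTGCACT",
                "ESC_11_GCCGCGTTATAA")
)

# Cells as a factor ordered by descending TotalTranscripts
df$rowz1 <- reorder(row.names(df), -df$TotalTranscripts)

# Melt from wide to long: one row per cell/gene pair
df1 <- melt(df, id.vars = c("rowz1", "TotalTranscripts"),
            measure.vars = c("HSP90AA1", "SSH2", "ACTB"))

# Genes on the x-axis, one facet per cell
gp_swapped <- ggplot(df1, aes(x = variable, y = value)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ rowz1) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90))
gp_swapped
```

With this layout each panel compares genes within one cell, rather than cells within one gene, so it answers a different question than the original plot.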
And this additional SO solution isn't related to your question, but it is an elegant way to reorder elements in each facet by their values (for future reference).
Finally, the method that will give you the finest control is to plot each graph separately and combine the grobs. Baptiste's packages like gridExtra and gtable are useful for these tasks.
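As a minimal sketch of that approach, assuming a three-row subset of the example data: build one ggplot object per gene and let grid.arrange() from gridExtra lay them out. The names plots and Cells are only illustrative.

```r
library(ggplot2)
library(gridExtra)

# Three rows of the example data, to keep the sketch short
df <- data.frame(
  HSP90AA1 = c(8.053308, 9.430003, 7.953726),
  SSH2 = c(12.038484, 10.687959, 9.918988),
  ACTB = c(10.557234, 10.437068, 10.078192),
  TotalTranscripts = c(33367.23, 30285.41, 30133.94),
  row.names = c("ESC_11_TTCGCCAAATCC", "ESC_10_TTGAGCTGCACT",
                "ESC_11_GCCGCGTTATAA")
)

# Cells as a factor ordered by descending TotalTranscripts
df$Cells <- reorder(row.names(df), -df$TotalTranscripts)
genes <- c("HSP90AA1", "SSH2", "ACTB")

# One independent ggplot object per gene; each can be themed separately
plots <- lapply(genes, function(gene) {
  ggplot(df, aes(x = Cells, y = .data[[gene]])) +
    geom_bar(stat = "identity") +
    labs(title = gene, x = NULL, y = NULL) +
    theme_bw() +
    theme(axis.text.x = element_text(angle = 90))
})

# Combine the individual plots into a single figure
grid.arrange(grobs = plots, ncol = 3)
```

Because every panel is its own object, titles, scales, and themes can differ per gene, at the cost of losing the shared strip labels and axes that facet_wrap() provides.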
The OP has subsequently asked how to visualize the data, especially when there are more ESC categorical variables (up to 600+).
Here are some examples, with the big caveat that with many categorical variables, they should be grouped or converted to a continuous variable somehow.
    #Plot colour to a few discrete, categorical variables
    gp + aes(fill = rowz1) +
      theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
      labs(x = NULL, fill = "Cell", title = "Discrete categorical variables")

    #Plot colour on a continuous scale.
    #Ultimately, not appropriate for this example! (but shown for reference)
    #More appropriate: fill = TotalTranscripts
    gp + aes(fill = as.numeric(rowz1)) +
      theme(axis.text.x = element_blank(), axis.ticks.x = element_blank()) +
      labs(x = NULL, title = "Continuous variables (legend won't work for many values)") +
      scale_fill_gradient2(name = "Cell",
                           breaks = as.numeric(df1$rowz1),
                           labels = df1$rowz1,
                           midpoint = median(as.numeric(df1$rowz1)))

    #x is continuous, colour plotted to the categorical variable.
    #Same caveats as earlier.
    gp1 <- ggplot(df1, aes(x = TotalTranscripts/1000, y = value, colour = rowz1)) +
      geom_point(size = 3) +
      facet_wrap(~ variable, scales = "free") +
      labs(title = "X is an actual continuous variable") +
      theme_bw() +
      labs(x = bquote("Total Transcripts,"~10^3), colour = "Cell")
    gp1