Defining plot colors based on a variable
On Mon, Feb 2, 2009 at 8:56 AM, Andrew Singleton <singleta at mail.nih.gov> wrote:
Hi, I have been trying unsuccessfully to plot data using different colors
based on a variable within a subset of an imported file. The file I am
reading is about 20000 lines long and has a column (in the example called
FILE) that contains approximately 100 unique entries. I would like to plot a
subset of the data from the file and key the color from the FILE column.
This is what my file looks like:
CHR SNP BP NMISS BETA SE R2 T P REGION FILE RANDOM
1 rs17035189 10519610 135 0.3518 1.928 0.0002501 0.1824 0.8555 TCTX 4730341 0.284627081
6 rs3763311 32484154 109 -2.05 1.624 0.01467 -1.262 0.2096 TCTX 670603 0.083147673
6 rs3892710 32790839 106 0.5695 4.743 0.0001386 0.1201 0.9047 TCTX 7150403 0.549192815
6 rs3864300 32379785 102 9.208 6.416 0.02018 1.435 0.1544 TCTX 7210017 0.837265988
6 rs6912002 32873245 13 -1.295 5.043 0.005963 -0.2569 0.802 TCTX 2710441 0.170566699
5 rs4024109 35955374 9 26.19 31.01 0.09245 0.8444 0.4263 TCTX 2650653 0.298573497
6 rs3129719 32769757 16 10.35 7.44 0.1215 1.391 0.1859 TCTX 2900504 0.378538235
6 rs476885 32402690 109 -0.09378 1.552 3.411e-05 -0.06041 0.9519 TCTX 670603 0.017970964
10 rs12570766 5602540 139 0.6182 6.66 6.289e-05 0.09283 0.9262 TCTX 4560767 0.004973939
etc
And this is the code that I have:
assoc_data <- read.table("master.out", header =TRUE)
par(fig=c(0, 10, 0, 10 )/10, mar=c(10,8,2,8),xpd=NA, cex.axis=2)
attach(assoc_data)
#these criteria change based on input from another file
curr_assoc <- assoc_data[CHR == 1 & BP > 500000 & BP < 1000000, ]
#count the number of transcripts
transcripts <- length(unique(curr_assoc$FILE))
#generate that number of unique 'FILE' entries in my subset of data
my_colors <- rainbow(transcripts)
plot(curr_assoc$BP, log10(curr_assoc$P)*-1, pch=20,
col=c(my_colors)[curr_assoc$FILE], ylim=c(-15, 15),xaxs="i", xlab=NA,
cex=0.7, cex.lab=2)
detach(assoc_data)
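One possible reason the colours do not come out as intended: if FILE is read in as a numeric ID (as the sample rows suggest), then my_colors[curr_assoc$FILE] uses values such as 4730341 as positions in a vector that is only `transcripts` long, which returns NA. A minimal base-R sketch of a workaround, using made-up toy data in place of curr_assoc (the names toy, file_codes and point_cols are illustrative only), is to turn FILE into a factor and index the palette by its integer codes:

```r
# Toy data standing in for curr_assoc; the values are illustrative only
toy <- data.frame(
  BP   = c(510000, 620000, 730000, 840000),
  P    = c(0.8555, 0.2096, 0.9047, 0.1544),
  FILE = c(4730341, 670603, 4730341, 7210017)
)

# factor() assigns each unique FILE value an integer code 1..k,
# which is a valid index into a palette of k colours
file_codes <- as.integer(factor(toy$FILE))
my_colors  <- rainbow(length(unique(toy$FILE)))
point_cols <- my_colors[file_codes]

plot(toy$BP, -log10(toy$P), pch = 20, col = point_cols,
     xlab = "BP", ylab = "-log10(P)")
```

Calling factor() on the subset itself, rather than reusing a factor built on the whole file, keeps the codes within the length of the palette.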
You might find it easier to use ggplot2:
install.packages("ggplot2")
library(ggplot2)
qplot(BP, P, data = curr_assoc, colour = FILE, log="y")
To ensure that you always have the same colours, you can set the
limits for the colour scale (in an analogous way to setting the limits
for the x axis):
qplot(BP, P, data = curr_assoc, colour = FILE, log="y") +
scale_colour_hue(limits = c(2, 7, 12, 34, 60, 64, 65, 70, 71))
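For comparison, the same idea of a fixed colour mapping can be sketched in base R, assuming the full set of FILE values is known up front. The data frame full and the names palette_by_file, sub_a and sub_b below are hypothetical, chosen only for illustration: the palette is built once from all FILE values and looked up by name, so a given FILE gets the same colour in every subset.

```r
# Hypothetical full data set; in the thread this would be assoc_data
full <- data.frame(
  BP   = c(510000, 620000, 730000, 840000, 950000),
  P    = c(0.8555, 0.2096, 0.9047, 0.1544, 0.802),
  FILE = c(4730341, 670603, 7150403, 670603, 4730341)
)

# One colour per FILE value across the whole file, named by that value
all_files       <- sort(unique(full$FILE))
palette_by_file <- setNames(rainbow(length(all_files)), all_files)

# Two different subsets look their colours up by name, not by position
sub_a  <- full[full$BP < 700000, ]
sub_b  <- full[full$BP >= 700000, ]
cols_a <- palette_by_file[as.character(sub_a$FILE)]
cols_b <- palette_by_file[as.character(sub_b$FILE)]

plot(sub_a$BP, -log10(sub_a$P), pch = 20, col = cols_a,
     xlab = "BP", ylab = "-log10(P)")
```

Looking colours up by name means a subset that contains only a few of the FILE values still draws each one in its usual colour.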
Hadley