Message-ID: <1351792203.96551.YahooMailNeo@web142601.mail.bf1.yahoo.com>
Date: 2012-11-01T17:50:03Z
From: arun
Subject: subset a defined row plus the aforegoing
In-Reply-To: <CAKyZeBtbJdYp48zJ8P+fsx7_dZb0hNu4Xw4=1TgjO+H6156Upg@mail.gmail.com>
Hello,
A bit confusing:
" I would like to extract
all rows (so-called *defined rows*) with type==Expression - subset (df,
type==Expression) - and the aforegoing type==DNase HS (which is not
necessarily row n-1, assuming that the defined row is n)"
In the dataset, the value in column "type" is spelled "Expresssion" (three s), not "Expression". If you want to subset all the rows having type "Expresssion" or "DNase HS":
res<- subset(df,type=="Expresssion"|type=="DNase HS")
head(res)
#  start.ens fc.trans        type end.ens peak end.grcm38 dpeak
#1   9191942   0.9379 Expresssion      NA   NA         NA    NA
#2   9191942   0.9741 Expresssion      NA   NA         NA    NA
#3   9191942   0.9748 Expresssion      NA   NA         NA    NA
#4   9195570       NA    DNase HS      NA   NA    9195792   109
#5   9579854       NA    DNase HS      NA   NA    9580110   131
#7  11113787       NA    DNase HS      NA   NA   11114262   279
If you don't want those rows:
subset(df,type!="Expresssion"&type!="DNase HS")
#  start.ens fc.trans type  end.ens peak end.grcm38 dpeak
#6  11088023       NA p300 11088523    7         NA    NA
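If instead you want every "Expresssion" row together with the nearest "DNase HS" row above it (that is only my reading of "the aforegoing", so treat it as an assumption), something along these lines might work. It is a minimal sketch on a reduced copy of your data with just the start.ens and type columns; the helper names is_expr, is_dnase, last_dnase, prev and keep are mine and purely illustrative.

```r
# Reduced copy of the posted data: only the two columns needed here.
df <- data.frame(
  start.ens = c(9191942L, 9191942L, 9191942L, 9195570L, 9579854L,
                11088023L, 11113787L, 11114744L, 11114744L, 11114850L,
                11455056L, 11461513L, 11462408L, 11462408L, 11489266L),
  type = c("Expresssion", "Expresssion", "Expresssion", "DNase HS",
           "DNase HS", "p300", "DNase HS", "Expresssion", "Expresssion",
           "DNase HS", "DNase HS", "DNase HS", "Expresssion",
           "Expresssion", "Expresssion")
)

is_expr  <- df$type == "Expresssion"
is_dnase <- df$type == "DNase HS"

# For every position, the row index of the most recent "DNase HS" row
# seen so far (0 if none has occurred yet).
last_dnase <- cummax(ifelse(is_dnase, seq_len(nrow(df)), 0L))

# Nearest preceding "DNase HS" row for each "Expresssion" row.
prev <- last_dnase[is_expr]

# Keep the "Expresssion" rows plus those preceding "DNase HS" rows,
# dropping the 0 placeholders and duplicates, in the original order.
keep <- sort(unique(c(which(is_expr), prev[prev > 0])))
res2 <- df[keep, ]
rownames(res2)
# "1" "2" "3" "7" "8" "9" "12" "13" "14" "15"
```

Rows 1 to 3 have no "DNase HS" row above them, so nothing is added for them. The same keep index can be used on the full data frame, since it only depends on the type column and the row order.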
A.K.
----- Original Message -----
From: Hermann Norpois <hnorpois at googlemail.com>
To: r-help at r-project.org
Cc:
Sent: Thursday, November 1, 2012 1:28 PM
Subject: [R] subset a defined row plus the aforegoing
Hello,
my data is sorted by start.ens (see below). Now I would like to extract
all rows (so-called *defined rows*) with type==Expression - subset (df,
type==Expression) - and the aforegoing type==DNase HS row (which is not
necessarily row n-1, assuming that the defined row is n). I don't know how
to add this to my subset command.
Is that possible?
Thanks Hermann
> df
   start.ens fc.trans        type  end.ens peak end.grcm38 dpeak
1    9191942   0.9379 Expresssion       NA   NA         NA    NA
2    9191942   0.9741 Expresssion       NA   NA         NA    NA
3    9191942   0.9748 Expresssion       NA   NA         NA    NA
4    9195570       NA    DNase HS       NA   NA    9195792   109
5    9579854       NA    DNase HS       NA   NA    9580110   131
6   11088023       NA        p300 11088523    7         NA    NA
7   11113787       NA    DNase HS       NA   NA   11114262   279
8   11114744   0.9803 Expresssion       NA   NA         NA    NA
9   11114744   0.9904 Expresssion       NA   NA         NA    NA
10  11114850       NA    DNase HS       NA   NA   11115400   210
11  11455056       NA    DNase HS       NA   NA   11455381   175
12  11461513       NA    DNase HS       NA   NA   11462571   508
13  11462408   1.0129 Expresssion       NA   NA         NA    NA
14  11462408   1.0074 Expresssion       NA   NA         NA    NA
15  11489266   1.0019 Expresssion       NA   NA         NA    NA
My (test)data:
> dput (df)
structure(list(start.ens = c(9191942L, 9191942L, 9191942L, 9195570L,
9579854L, 11088023L, 11113787L, 11114744L, 11114744L, 11114850L,
11455056L, 11461513L, 11462408L, 11462408L, 11489266L), fc.trans =
c(0.9379,
0.9741, 0.9748, NA, NA, NA, NA, 0.9803, 0.9904, NA, NA, NA, 1.0129,
1.0074, 1.0019), type = structure(c(2L, 2L, 2L, 1L, 1L, 3L, 1L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("DNase HS", "Expresssion",
"p300"), class = "factor"), end.ens = c(NA, NA, NA, NA, NA, 11088523L,
NA, NA, NA, NA, NA, NA, NA, NA, NA), peak = c(NA, NA, NA, NA,
NA, 7L, NA, NA, NA, NA, NA, NA, NA, NA, NA), end.grcm38 = c(NA,
NA, NA, 9195792L, 9580110L, NA, 11114262L, NA, NA, 11115400L,
11455381L, 11462571L, NA, NA, NA), dpeak = c(NA, NA, NA, 109L,
131L, NA, 279L, NA, NA, 210L, 175L, 508L, NA, NA, NA)), .Names =
c("start.ens",
"fc.trans", "type", "end.ens", "peak", "end.grcm38", "dpeak"), row.names =
c(NA,
-15L), class = "data.frame")