subset a defined row plus the aforegoing

Hello,

A bit confusing:
"I would like to extract
all rows (so-called *defined rows*) with type==Expression - subset (df,
type==Expression) - and the aforegoing type==DNase HS (which is not
necessarily row n-1, assuming that the defined row is n)"


In the dataset, the value in column "type" is spelled "Expresssion". If you want to subset all the rows having "Expresssion" or "DNase HS":

res<- subset(df,type=="Expresssion"|type=="DNase HS")
head(res)
#  start.ens fc.trans        type end.ens peak end.grcm38 dpeak
#1   9191942   0.9379 Expresssion      NA   NA         NA    NA
#2   9191942   0.9741 Expresssion      NA   NA         NA    NA
#3   9191942   0.9748 Expresssion      NA   NA         NA    NA
#4   9195570       NA    DNase HS      NA   NA    9195792   109
#5   9579854       NA    DNase HS      NA   NA    9580110   131
#7  11113787       NA    DNase HS      NA   NA   11114262   279


If you don't want those rows:
subset(df,type!="Expresssion"&type!="DNase HS")
#  start.ens fc.trans type  end.ens peak end.grcm38 dpeak
#6  11088023       NA p300 11088523    7         NA    NA
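
If what is wanted is instead every "Expresssion" row plus the nearest "DNase HS" row above it (which need not be row n-1), here is a minimal sketch of one way to do it. It assumes the data are already sorted by start.ens, as stated in the question. The small df below is a stand-in with only the two columns the logic needs, and the helper names is.expr, is.dnase, last.dnase and keep are made up for illustration.

```r
# Stand-in for the posted data: only start.ens and type are needed here.
df <- data.frame(
  start.ens = c(9191942L, 9191942L, 9191942L, 9195570L, 9579854L, 11088023L,
                11113787L, 11114744L, 11114744L, 11114850L, 11455056L,
                11461513L, 11462408L, 11462408L, 11489266L),
  type = c("Expresssion", "Expresssion", "Expresssion", "DNase HS", "DNase HS",
           "p300", "DNase HS", "Expresssion", "Expresssion", "DNase HS",
           "DNase HS", "DNase HS", "Expresssion", "Expresssion", "Expresssion")
)

is.expr  <- df$type == "Expresssion"
is.dnase <- df$type == "DNase HS"

# For every row: index of the most recent DNase HS row at or above it,
# or 0 if no DNase HS row has occurred yet.
last.dnase <- cummax(ifelse(is.dnase, seq_len(nrow(df)), 0L))

# Keep the Expresssion rows plus the DNase HS row preceding each of them;
# drop the 0 placeholder for Expresssion rows with no DNase HS row above.
keep <- sort(unique(c(which(is.expr), last.dnase[is.expr])))
keep <- keep[keep > 0]

res <- df[keep, ]
res
```

On the posted data this keeps rows 1, 2, 3, 7, 8, 9, 12, 13, 14 and 15. Rows 1 to 3 have no "DNase HS" row above them, so nothing is added for them. The cummax() step is what lets the preceding "DNase HS" row sit any number of rows above the "Expresssion" row, for example with a "p300" row in between.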
A.K.




----- Original Message -----
From: Hermann Norpois <hnorpois at googlemail.com>
To: r-help at r-project.org
Cc: 
Sent: Thursday, November 1, 2012 1:28 PM
Subject: [R] subset a defined row plus the aforegoing

Hello,

my data is sorted by start.ens (see below). Now I would like to extract
all rows (so-called *defined rows*) with type==Expression - subset (df,
type==Expression) - and the aforegoing type==DNase HS (which is not
necessarily row n-1, assuming that the defined row is n). I don't know how
to add this to my subset command.

Is that possible?
Thanks Hermann
   start.ens fc.trans        type  end.ens peak end.grcm38 dpeak
1    9191942   0.9379 Expresssion       NA   NA         NA    NA
2    9191942   0.9741 Expresssion       NA   NA         NA    NA
3    9191942   0.9748 Expresssion       NA   NA         NA    NA
4    9195570       NA    DNase HS       NA   NA    9195792   109
5    9579854       NA    DNase HS       NA   NA    9580110   131
6   11088023       NA        p300 11088523    7         NA    NA
7   11113787       NA    DNase HS       NA   NA   11114262   279
8   11114744   0.9803 Expresssion       NA   NA         NA    NA
9   11114744   0.9904 Expresssion       NA   NA         NA    NA
10  11114850       NA    DNase HS       NA   NA   11115400   210
11  11455056       NA    DNase HS       NA   NA   11455381   175
12  11461513       NA    DNase HS       NA   NA   11462571   508
13  11462408   1.0129 Expresssion       NA   NA         NA    NA
14  11462408   1.0074 Expresssion       NA   NA         NA    NA
15  11489266   1.0019 Expresssion       NA   NA         NA    NA

My (test)data:
structure(list(start.ens = c(9191942L, 9191942L, 9191942L, 9195570L,
9579854L, 11088023L, 11113787L, 11114744L, 11114744L, 11114850L,
11455056L, 11461513L, 11462408L, 11462408L, 11489266L), fc.trans =
c(0.9379,
0.9741, 0.9748, NA, NA, NA, NA, 0.9803, 0.9904, NA, NA, NA, 1.0129,
1.0074, 1.0019), type = structure(c(2L, 2L, 2L, 1L, 1L, 3L, 1L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("DNase HS", "Expresssion",
"p300"), class = "factor"), end.ens = c(NA, NA, NA, NA, NA, 11088523L,
NA, NA, NA, NA, NA, NA, NA, NA, NA), peak = c(NA, NA, NA, NA,
NA, 7L, NA, NA, NA, NA, NA, NA, NA, NA, NA), end.grcm38 = c(NA,
NA, NA, 9195792L, 9580110L, NA, 11114262L, NA, NA, 11115400L,
11455381L, 11462571L, NA, NA, NA), dpeak = c(NA, NA, NA, 109L,
131L, NA, 279L, NA, NA, 210L, 175L, 508L, NA, NA, NA)), .Names =
c("start.ens",
"fc.trans", "type", "end.ens", "peak", "end.grcm38", "dpeak"), row.names =
c(NA,
-15L), class = "data.frame")
