Hi all, I have a file with about 3,000,000 lines. In most lines the first column starts with "rs", for example rs10000056, rs10000076, and so on. I would like to extract the lines that do not start with "rs". Your help is highly appreciated. Regards, Greg
Lines that do not start with "rs"
7 messages · greg holly, rsherry8, Rui Barradas +1 more
Greg,
I am assuming that your data is in a text file. R is a good tool, but it is not
the tool I would use for this job. The tool I would
use is grep. The following command should get you what you want:
grep -v "^rs" <data file name>
Bob
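For readers who would rather stay inside R, the same line-level filter can be sketched with base functions. This is only an illustration under assumptions: the temporary file and its contents are invented stand-ins for the real data file, and reading all the lines at once assumes they fit in memory.

```r
# Hypothetical stand-in for the real data file: a few lines in a temp file
infile <- tempfile(fileext = ".txt")
writeLines(c("rs10000056 0.12 1",
             "ab1234567 0.40 2",
             "rs10000076 0.33 3",
             "cd7654321 0.05 4"), infile)

# Read every line as plain text (assumes the whole file fits in memory)
lines <- readLines(infile)

# Keep the lines that do NOT start with "rs", like grep -v "^rs"
kept <- lines[!grepl("^rs", lines)]
print(kept)

# Write the surviving lines to a new (again hypothetical) file
outfile <- tempfile(fileext = ".txt")
writeLines(kept, outfile)
```

Because the filter works on whole lines of text, it does not matter whether the columns after the first are numeric or not.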
On 1/30/2017 9:23 AM, greg holly wrote:
Hi all, I have a file with about 3,000,000 lines. In most lines the first column starts with "rs", for example rs10000056, rs10000076, and so on. I would like to extract the lines that do not start with "rs". Your help is highly appreciated. Regards, Greg
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Hi Robert, I do appreciate your advice. Only the first column of the data is text. The remaining columns are numeric. Regards, Greg
Then my solution should work. Bob
Hello,
Try to study the following example.
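# Toy data: an ID column and one numeric column
A <- c("rs10000056", "rs10000076", "ab1234567")
x <- 1:3
dat <- data.frame(A, x)
# TRUE for the rows whose ID starts with "rs"
inx <- grepl("^rs", dat$A)
# Keep only the rows that do not start with "rs"
dat[!inx, ]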
Hope this helps,
Rui Barradas
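The same indexing idea carries over to data read from a file. A rough sketch, assuming a whitespace-delimited file with no header row; the temporary file, its contents, and the default column names V1, V2, V3 assigned by read.table() are illustrative, not the poster's actual data.

```r
# Hypothetical input file standing in for the real data
infile <- tempfile(fileext = ".txt")
writeLines(c("rs10000056 0.12 1",
             "ab1234567 0.40 2",
             "rs10000076 0.33 3"), infile)

# First column is text, the remaining columns are numeric
dat <- read.table(infile, header = FALSE, stringsAsFactors = FALSE)

# Logical index of the rows whose first column does NOT start with "rs"
keep <- !grepl("^rs", dat$V1)
res <- dat[keep, ]
print(res)

# Save the filtered rows without quotes, row names or a header line
outfile <- tempfile(fileext = ".txt")
write.table(res, outfile, quote = FALSE, row.names = FALSE, col.names = FALSE)
```

Setting stringsAsFactors = FALSE keeps the ID column as character, which avoids surprises in R versions where strings were converted to factors by default.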
Rui, et al.:
**IF** the data set can be read into R (3e6 lines x ? bytes/line ??), then I think that for a completely specified regular pattern such as that described by the OP, grepl() would be a bit inefficient. If x is a vector of strings, and you wish to remove all those that begin with "rs" (keeping those that don't), then:
x[!substring(x,1,2) == "rs"]
took about 1/2 the time on my computer as the grepl() version for a vector, x, of length 1e6.
To be fair, I suspect this may be a negligible difference, as most of the time would probably be taken in extracting and replacing rows from the data frame. Nevertheless, it seems worthwhile to highlight the use of simple, efficient, albeit limited, tools when they *can* be used.
All, of course, assuming I have understood the query correctly.
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
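Anyone who wants to repeat this kind of timing on their own machine could start from a sketch like the one below. The synthetic IDs and the 90/10 split between prefixes are made up for illustration, and startsWith() (base R 3.3.0 and later) is an extra candidate that was not part of the comparison described above. Timings vary by machine and R version, so none are quoted here.

```r
set.seed(1)
n <- 1e6

# Synthetic IDs: mostly "rs" prefixes, some "ab" (invented proportions)
prefix <- sample(c("rs", "ab"), n, replace = TRUE, prob = c(0.9, 0.1))
x <- paste0(prefix, sample.int(1e8, n, replace = TRUE))

# Time three ways of dropping the elements that start with "rs"
t_grepl  <- system.time(r1 <- x[!grepl("^rs", x)])
t_substr <- system.time(r2 <- x[substring(x, 1, 2) != "rs"])
t_starts <- system.time(r3 <- x[!startsWith(x, "rs")])

# Elapsed seconds for each approach
print(c(grepl      = t_grepl[["elapsed"]],
        substring  = t_substr[["elapsed"]],
        startsWith = t_starts[["elapsed"]]))

# All three approaches should select exactly the same elements
stopifnot(identical(r1, r2), identical(r2, r3))
```

A single system.time() call is a coarse measurement; repeating each call several times gives a steadier picture.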
... heh, heh, and even simpler (but maybe not much faster):
x[substring(x,1,2) != "rs"]
(DUHHH!)
-- Bert
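The spelling with a leading `!` and the spelling with `!=` select the same elements because in R `!` has lower precedence than `==`, so `!a == b` is parsed as `!(a == b)`. A tiny check on made-up IDs:

```r
# Invented example IDs
x <- c("rs10000056", "ab1234567", "rs10000076", "cd7654321")

# Negate the result of the comparison
a <- x[!substring(x, 1, 2) == "rs"]

# Use the not-equal operator directly
b <- x[substring(x, 1, 2) != "rs"]

print(identical(a, b))
print(b)
```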