Greetings, Amigos:
I have been trying without success to convert a character string,
repeated.measures.columns
[1] "3,6,10"
into c(3,6,10) for subsequent use.
as.numeric(repeated.measures.columns) doesn't work (likely because of the
commas)
[1] NA
Warning message:
NAs introduced by coercion
I've tried many things including
strsplit(repeated.measures.columns, split = ",")
which produces a list with only one element, viz:
[[1]]
[1] "3" "6" "10"
as.numeric() doesn't like that either.
Clearly: 1) I cannot be the first person to attempt this, and 2) I've made
this WAY harder than it is.
Would some kind soul please instruct me (and perhaps subsequent searchers)
how to convert the elements of a string into numbers?
Thank you.
Charles Annis, P.E.
Charles.Annis at StatisticalEngineering.com
phone: 561-352-9699
eFax: 614-455-3265
http://www.StatisticalEngineering.com
"Charles Annis, P.E." <Charles.Annis at statisticalengineering.com> writes:
Clearly: 1) I cannot be the first person to attempt this, and 2) I've made
this WAY harder than it is.
Would some kind soul please instruct me (and perhaps subsequent searchers)
how to convert the elements of a string into numbers?
3) you're almost there, just not realizing it:
x <- "3,6,10"
as.numeric(strsplit(x,split = ",")[[1]])
[1] 3 6 10
or for that matter
scan(textConnection(x), sep=",")
Read 3 items
[1] 3 6 10
although that leaves you with a dangling open connection.
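A minimal sketch of both suggestions, assuming the same example string. It also uses scan()'s `text` argument, which was added in later R versions (it is not in the 2006-era releases discussed here) and reads straight from a character string, so no connection has to be created or closed by hand.

```r
x <- "3,6,10"

# strsplit() returns a list with one component per input string;
# [[1]] extracts the character vector for the first (and only) string
v1 <- as.numeric(strsplit(x, split = ",")[[1]])

# scan() parses the text itself; quiet = TRUE suppresses "Read 3 items".
# text = avoids building a textConnection explicitly.
v2 <- scan(text = x, sep = ",", quiet = TRUE)

print(v1)  # [1]  3  6 10
stopifnot(identical(v1, v2))
```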
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
as.numeric() doesn't like that either.
repeated.measures.columns is a character vector, which may have more than one element. Consider:
repeated.measures.columns <- c("3,6,10", "5,4,9")
lst <- strsplit(repeated.measures.columns, split = ",")
lapply(lst, as.numeric)
That is why strsplit() returns a list, with one component for each
element of repeated.measures.columns. Just pick off the one you want with
[[ ]]:
as.numeric(strsplit(repeated.measures.columns, split = ",")[[1]])
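A short sketch of Roger's point with the two-element vector above. The `vapply()` step is an addition for illustration and assumes every string holds the same number of values (three here); that assumption is not part of the original question.

```r
repeated.measures.columns <- c("3,6,10", "5,4,9")

# strsplit() returns one list component per element of the input vector
lst <- strsplit(repeated.measures.columns, split = ",")

# Convert every component; the result is still a list
nums <- lapply(lst, as.numeric)

# Pick off a single component with [[ ]]
first <- as.numeric(lst[[1]])

# Only when every string holds the same number of values can
# vapply() collect them as the columns of a numeric matrix
m <- vapply(lst, as.numeric, numeric(3))
print(m)
```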
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no
On Sat, 2006-08-19 at 07:58 -0400, Charles Annis, P.E. wrote:
as.numeric(unlist(strsplit(repeated.measures.columns, ",")))
[1] 3 6 10
Use unlist() to take the output of strsplit() and convert it to a
vector, before coercing to numeric.
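For a single string, the unlist() form and the [[1]] form give the same result; they differ once the input has more than one element. A sketch with a hypothetical two-element input:

```r
x <- c("3,6,10", "5,4,9")

# unlist() flattens ALL list components into one vector,
# so the values from every input string are concatenated
flat <- as.numeric(unlist(strsplit(x, ",")))

# [[1]] keeps only the values from the first string
first <- as.numeric(strsplit(x, ",")[[1]])

print(flat)
print(first)
```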
Or, more simply, use [[1]] as in
as.numeric(strsplit(repeated.measures.columns, ",")[[1]])
Also,
eval(parse(text=paste("c(", repeated.measures.columns, ")")))
looks competitive, and is quite a bit more general (e.g. allows spaces,
works with complex numbers), or you can use scan() from an anonymous file
or a textConnection.
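A minimal sketch of the eval(parse()) idea and the generality mentioned above; the helper name `to_vector` is hypothetical. Because parse() and eval() will run any R code contained in the string, this approach is best kept to text you trust.

```r
# Hypothetical helper: wrap the text in c( ... ) and let R's parser
# turn it into a vector
to_vector <- function(s) eval(parse(text = paste("c(", s, ")")))

a <- to_vector("3,6,10")     # numeric vector
b <- to_vector("3, 6,  10")  # embedded spaces are harmless
z <- to_vector("1+2i, 3")    # complex numbers parse too

# With more than one string, parse() yields several expressions and
# eval() returns only the value of the last one
last <- to_vector(c("1,2", "3,4"))

print(a)
print(z)
print(last)
```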
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
On Sat, 2006-08-19 at 13:30 +0100, Prof Brian Ripley wrote:
On Sat, 19 Aug 2006, Marc Schwartz wrote:
On Sat, 2006-08-19 at 07:58 -0400, Charles Annis, P.E. wrote:
Also,
eval(parse(text=paste("c(", repeated.measures.columns, ")")))
looks competitive, and is quite a bit more general (e.g. allows spaces,
works with complex numbers), or you can use scan() from an anonymous file
or a textConnection.
I would say more than competitive:
repeated.measures.columns <- paste(1:100000, collapse = ",")
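Marc's system.time() calls and timings were not preserved in this copy of the thread. A minimal sketch of that kind of comparison, assuming the same 100,000-value input. The names `t_split` and `t_parse` are illustrative, and no timing result is claimed here because it will vary by machine and R version.

```r
# One string holding 100,000 comma-separated numbers
repeated.measures.columns <- paste(1:100000, collapse = ",")

t_split <- system.time(
  v_split <- as.numeric(strsplit(repeated.measures.columns, ",")[[1]])
)
t_parse <- system.time(
  v_parse <- eval(parse(text = paste("c(", repeated.measures.columns, ")")))
)

# Both approaches should produce exactly the same values
stopifnot(identical(v_split, v_parse))

cat("strsplit elapsed:   ", t_split[["elapsed"]], "s\n")
cat("eval(parse) elapsed:", t_parse[["elapsed"]], "s\n")
```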
Much gratitude to Professor Ripley, Peter Dalgaard, Marc Schwartz, and Roger
Bivand.
__________________
Roger Bivand wrote that ... strsplit() returns a list - one list component
for each repeated.measures.columns element. Just pick off the one you want
with
[[]]:
as.numeric(strsplit(repeated.measures.columns, split = ",")[[1]])
which had stumped me, since that syntax fails without the [[1]]
specification.
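A small sketch of why the [[1]] step matters (the variable names are illustrative). The exact wording of the error message differs between R versions, so the example captures the message rather than quoting it.

```r
x <- "3,6,10"
parts <- strsplit(x, split = ",")   # a list of length 1

# Coercing the list itself fails, because its single component is a
# length-3 character vector rather than a single value
res <- tryCatch(as.numeric(parts), error = function(e) conditionMessage(e))
print(res)

# Extracting the component with [[1]] gives a plain character vector,
# which as.numeric() converts without complaint
ok <- as.numeric(parts[[1]])
print(ok)
```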
__________________
Peter Dalgaard, who also suggested the [[1]] specification, pointed out that
scan(textConnection(x), sep=",")
will work, although that leaves you with a dangling open connection.
__________________
Marc Schwartz advised to ...
Use unlist() to take the output of strsplit() and convert it to a vector,
before coercing to numeric.
as.numeric(unlist(strsplit(repeated.measures.columns, ",")))
____________________________________
Brian D. Ripley suggested that the following looks competitive, and is quite
a bit more general (e.g. allows spaces, works with complex numbers)
eval(parse(text=paste("c(", repeated.measures.columns, ")")))
and Marc Schwartz showed, with some system.time() trials, that Professor
Ripley's suggestion is much faster than the alternatives.
____________________________________
Many thanks to all.
Charles Annis, P.E.
Charles.Annis at StatisticalEngineering.com
phone: 561-352-9699
eFax: 614-455-3265
http://www.StatisticalEngineering.com
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Charles Annis, P.E.
Sent: Saturday, August 19, 2006 7:59 AM
To: r-help at stat.math.ethz.ch
Subject: [R] string-to-number
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
<Charles.Annis at statisticalengineering.com> wrote:
Peter Dalgaard, who also suggested the [[1]] specification, pointed out that
scan(textConnection(x), sep=",")
will work, although that leaves you with a dangling open connection.
To avoid that, you can do this:
scan(textConnection(x), sep = ",")
closeAllConnections()
Now the following shows that none are open:
showConnections()
Alternatively, you could close it explicitly:
scan(con <- textConnection(x), sep = ",")
close(con)
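A sketch of a third option: wrap the call in a function that uses on.exit(), so only the connection it opened is closed, even if scan() fails. Unlike closeAllConnections(), this leaves any other open connections alone. The name `scan_numbers` is hypothetical.

```r
# Hypothetical helper: read comma-separated numbers from a string via a
# textConnection, guaranteeing that this one connection gets closed
scan_numbers <- function(s) {
  con <- textConnection(s)
  on.exit(close(con))            # runs on normal exit and on error
  scan(con, sep = ",", quiet = TRUE)
}

n_before <- nrow(showConnections())
v <- scan_numbers("3,6,10")
n_after <- nrow(showConnections())

print(v)
```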
Wow. New respect for parse/eval.
Do you think this is a special case of a more general principle? I
suppose the cost is memory, but from time to time a speedup like this
would be very beneficial.
Any hints about how R programmers could recognize such cases would, I
am sure, be of value to the list in general.
Many thanks for your efforts, Marc!
Regards,
Mike
On 8/19/06, Marc Schwartz <MSchwartz at mn.rr.com> wrote:
I would say more than competitive:
repeated.measures.columns <- paste(1:100000, collapse = ",")
On Sat, 2006-08-19 at 10:25 -0600, Mike Nielsen wrote:
Mike,
I think that one needs to consider where the time is being spent and
then adjust accordingly. Once you understand that, you can develop some
insight into what may be a more efficient approach. R provides good
profiling tools that facilitate this process.
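A minimal sketch of producing a profile like the one below with Rprof() and summaryRprof(). The workload (20 repetitions of the eval(parse()) conversion) is an assumption chosen so that the sampling profiler records enough samples; it is not Marc's original code.

```r
# Hypothetical workload: repeatedly convert a long comma-separated string
s <- paste(1:100000, collapse = ",")

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)                  # start the sampling profiler
for (i in 1:20) {
  v <- eval(parse(text = paste("c(", s, ")")))
}
Rprof(NULL)                       # stop profiling

prof <- summaryRprof(prof_file)
print(prof$by.self)               # time spent in each function itself
unlink(prof_file)
```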
In this case, almost all of the time in the first two examples using
strsplit() is spent in that function. With eval(parse()), most of the
time is spent in parse():
$by.self
        self.time self.pct total.time total.pct
"parse"      0.42     87.5       0.42      87.5
"eval"       0.06     12.5       0.48     100.0
$by.total
        total.time total.pct self.time self.pct
"eval"        0.48     100.0      0.06     12.5
"parse"       0.42      87.5      0.42     87.5
$sampling.time
[1] 0.48
To some extent, one could argue that my initial timing examples are
contrived, in that they specifically demonstrate a worst-case scenario
for strsplit(). Real-world examples may or may not show such gains.
For example, in Charles' initial query the vector was rather
short:
> repeated.measures.columns
[1] "3,6,10"
So if this were a one-time conversion, we would not see such significant
gains.
However, what if we had a long list of shorter entries:
[1] TRUE
We do see a notable reduction in time with strsplit() and a notable
increase in time with eval(parse()), even though we are converting the
same net number of values (100,000).
Much of the increase with eval(parse()) is, of course, due to the overhead
of sapply() and of navigating the list.
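Marc's code for this case was not preserved. A minimal sketch under assumed sizes (10,000 strings of 10 random values each, chosen only for illustration) showing the structural difference: strsplit() is vectorised and handles every string in one call, while eval(parse()) needs one parse per string, hence the sapply() loop. No timing result is claimed.

```r
set.seed(1)
# Hypothetical input: many short comma-separated strings
x <- vapply(1:10000,
            function(i) paste(sample(100, 10), collapse = ","),
            character(1))

# strsplit() is vectorised: a single call splits every string
t_split <- system.time(
  r_split <- lapply(strsplit(x, ","), as.numeric)
)

# eval(parse()) needs one parse per string, hence the sapply() loop
t_parse <- system.time(
  r_parse <- sapply(x, function(s) eval(parse(text = paste("c(", s, ")"))),
                    simplify = FALSE, USE.NAMES = FALSE)
)

cat("strsplit elapsed:   ", t_split[["elapsed"]], "s\n")
cat("eval(parse) elapsed:", t_parse[["elapsed"]], "s\n")
```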
Let's increase the size of the list components to 1000:
eval(parse(text=paste("c(", x, ")"))))))
[1] 15.893 0.928 18.139 0.000 0.000
So we see here that as the size of the list components increases, there
continues to be an advantage to Prof. Ripley's approach over using
strsplit().
Again, one needs to develop an understanding of where the time is spent
in the processing by profiling, and then consider how to introduce
efficiencies; in some cases that may well require compiled C or FORTRAN
code if run times become too long.
HTH,
Marc Schwartz
Marc,
Thanks very much for this. I hadn't really looked at Rprof in the
past; now I have a new toy to play with!
I have formulated a hypothesis that the reason parse/eval is quicker
lies in the pattern-matching code: strsplit() uses regular
expressions, whereas perhaps parse() uses a cleverer (but
possibly less general) matching algorithm. It will be interesting to
inspect the source code to get to the bottom of it.
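One way to probe that hypothesis directly: strsplit() has a `fixed = TRUE` argument that matches the separator literally instead of as a regular expression. A sketch, assuming the same 100,000-value input; no particular timing outcome is claimed, since it will depend on the R version.

```r
s <- paste(1:100000, collapse = ",")

# Default: the split argument is treated as a regular expression
t_regex <- system.time(
  v_regex <- as.numeric(strsplit(s, ",")[[1]])
)

# fixed = TRUE: the separator is matched literally, bypassing the regex engine
t_fixed <- system.time(
  v_fixed <- as.numeric(strsplit(s, ",", fixed = TRUE)[[1]])
)

stopifnot(identical(v_regex, v_fixed))
cat("regex:", t_regex[["elapsed"]], "s   fixed:", t_fixed[["elapsed"]], "s\n")
```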
Thanks again for your interest and efforts in this, and for pointing out Rprof!
Regards,
Mike Nielsen
On 8/20/06, Marc Schwartz <MSchwartz at mn.rr.com> wrote: