string-to-number

Greetings, Amigos:

I have been trying without success to convert a character string,
repeated.measures.columns
[1] "3,6,10"

into c(3,6,10) for subsequent use.

as.numeric(repeated.measures.columns) doesn't work (likely because of the
commas)
[1] NA
Warning message:
NAs introduced by coercion

I've tried many things including 
strsplit(repeated.measures.columns, split = ",")

which produces a list with only one element, viz:
[[1]]
[1] "3"  "6"  "10"

as.numeric() doesn't like that either.

Clearly: 1) I cannot be the first person to attempt this, and 2) I've made
this WAY harder than it is.

Would some kind soul please instruct me (and perhaps subsequent searchers)
how to convert the elements of a string into numbers?

Thank you.

Charles Annis, P.E.

Charles.Annis at StatisticalEngineering.com
phone: 561-352-9699
eFax:  614-455-3265
http://www.StatisticalEngineering.com
One more step:
as.numeric(unlist(strsplit(repeated.measures.columns, ",")))
[1]  3  6 10

Use unlist() to take the output of strsplit() and convert it to a
vector, before coercing to numeric.

HTH,

Marc Schwartz
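
For later searchers, the one-liner above could be wrapped in a small helper. This is only a sketch: the name split_to_numeric and its sep argument are made up for illustration, and fixed = TRUE assumes the separator is a literal string rather than a regular expression.

```r
# Hypothetical helper (not from the thread): split one delimited string
# and coerce the pieces to numeric.
split_to_numeric <- function(x, sep = ",") {
  stopifnot(is.character(x), length(x) == 1L)
  # strsplit() returns a list with one component per element of x;
  # x has length one here, so [[1]] extracts the only component.
  pieces <- strsplit(x, split = sep, fixed = TRUE)[[1]]
  as.numeric(pieces)
}

repeated.measures.columns <- "3,6,10"
cols <- split_to_numeric(repeated.measures.columns)
print(cols)
```

The length check is a design choice for this sketch: it fails early if a longer character vector is passed, rather than silently using only the first element.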
"Charles Annis, P.E." <Charles.Annis at statisticalengineering.com> writes:
Clearly: 1) I cannot be the first person to attempt this, and 2) I've made
this WAY harder than it is.

3) you're almost there, just not realizing it:
x <- "3,6,10"
as.numeric(strsplit(x,split = ",")[[1]])
[1]  3  6 10

or for that matter
scan(textConnection(x), sep=",")
Read 3 items
[1]  3  6 10

although that leaves you with a dangling open connection.
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907
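
In current versions of R, scan() also accepts a text argument, which should sidestep the dangling-connection issue because no connection has to be opened by hand. A minimal sketch, assuming an R version recent enough to have that argument:

```r
x <- "3,6,10"

# text = hands the string to scan() directly, so there is no
# user-created connection left to close afterwards.
# quiet = TRUE suppresses the "Read 3 items" message.
vals <- scan(text = x, sep = ",", quiet = TRUE)
print(vals)
```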

repeated.measures.columns is a vector. Consider:

repeated.measures.columns <- c("3,6,10", "5,4,9")
lst <- strsplit(repeated.measures.columns, split = ",")
lapply(lst, as.numeric)

which is why strsplit() returns a list - one list component for each 
repeated.measures.columns element. Just pick off the one you want with 
[[]]:

as.numeric(strsplit(repeated.measures.columns, split = ",")[[1]])

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no

On Sat, 19 Aug 2006, Marc Schwartz wrote:
Use unlist() to take the output of strsplit() and convert it to a
vector, before coercing to numeric.
Or, more simply, use [[1]] as in

as.numeric(strsplit(repeated.measures.columns, ",")[[1]])

Also,

eval(parse(text=paste("c(", repeated.measures.columns, ")")))

looks competitive, and is quite a bit more general (e.g. allows spaces, 
works with complex numbers), or you can use scan() from an anonymous file 
or a textConnection.
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
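
To illustrate the "more general" point, here is a small sketch with a made-up input string containing spaces and a complex number. Since eval(parse()) evaluates whatever the string contains, this is best suited to input you trust.

```r
# Illustrative input (not from the thread): spaces and a complex value.
s <- "3, 6+2i, 10"

# paste() builds the text "c( 3, 6+2i, 10 )"; parse() turns it into an
# R expression and eval() evaluates it, so c() does the type coercion.
vals <- eval(parse(text = paste("c(", s, ")")))
print(vals)
```

Because one element is complex, c() promotes the whole result to a complex vector.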
I would say more than competitive:

repeated.measures.columns <- paste(1:100000, collapse = ",")

str(repeated.measures.columns)
 chr "1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,4"| __truncated__

system.time(res1 <- as.numeric(unlist(strsplit(repeated.measures.columns, ","))))
[1] 24.238  0.192 26.200  0.000  0.000

system.time(res2 <- as.numeric(strsplit(repeated.measures.columns, ",")[[1]]))
[1] 24.313  0.196 26.471  0.000  0.000

system.time(res3 <- eval(parse(text=paste("c(", repeated.measures.columns, ")"))))
[1] 0.328 0.004 0.395 0.000 0.000

str(res1)
 num [1:100000] 1 2 3 4 5 6 7 8 9 10 ...

str(res2)
 num [1:100000] 1 2 3 4 5 6 7 8 9 10 ...

str(res3)
 num [1:100000] 1 2 3 4 5 6 7 8 9 10 ...

all(res1 == res2)
[1] TRUE

all(res1 == res3)
[1] TRUE

Best regards,

Marc
Much gratitude to Professor Ripley, Peter Dalgaard, Marc Schwartz, and Roger
Bivand. 
__________________

Roger Bivand wrote that ... strsplit() returns a list - one list component
for each repeated.measures.columns element. Just pick off the one you want
with
[[]]:
as.numeric(strsplit(repeated.measures.columns, split = ",")[[1]])

which had stumped me, since that syntax fails without the [[1]]
specification.
__________________
Peter Dalgaard, who also suggested the [[1]] specification, pointed out that

scan(textConnection(x), sep=",")

will work, although that leaves you with a dangling open connection.
__________________
Marc Schwartz advised to ...
Use unlist() to take the output of strsplit() and convert it to a vector,
before coercing to numeric.

as.numeric(unlist(strsplit(repeated.measures.columns, ",")))
____________________________________
Brian D. Ripley suggested that the following looks competitive, and is quite
a bit more general (e.g. allows spaces, works with complex numbers)

eval(parse(text=paste("c(", repeated.measures.columns, ")")))

and Marc Schwartz showed that Professor Ripley's suggestion is much faster
than the competition with some system.time trials.
____________________________________

Many thanks to all.

Charles Annis, P.E.

Charles.Annis at StatisticalEngineering.com
phone: 561-352-9699
eFax:  614-455-3265
http://www.StatisticalEngineering.com

On 8/19/06, Charles Annis, P.E. wrote:
Peter Dalgaard, who also suggested the [[1]] specification, pointed out that

scan(textConnection(x), sep=",")

will work, although that leaves you with a dangling open connection.
You can do this:

scan(textConnection(x), sep = ",")
closeAllConnections()

Now the following shows that none are open:

showConnections()

Alternatively, you could close it explicitly:

scan(con <- textConnection(x), sep = ",")
close(con)
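
If this is done repeatedly, the explicit close can be tucked into a function. The sketch below is only one way to do it; the name scan_string is hypothetical. Unlike closeAllConnections(), it closes only the connection it opened.

```r
# Hypothetical helper: read delimited numbers from a string and always
# close the text connection, even if scan() stops with an error.
scan_string <- function(x, sep = ",") {
  con <- textConnection(x)
  on.exit(close(con))   # runs when the function exits, error or not
  scan(con, sep = sep, quiet = TRUE)
}

x <- "3,6,10"
vals <- scan_string(x)
print(vals)
```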

Wow.  New respect for parse/eval.

Do you think this is a special case of a more general principle?  I
suppose the cost is memory, but from time to time a speedup like this
would be very beneficial.

Any hints about how R programmers could recognize such cases would, I
am sure, be of value to the list in general.

Many thanks for your efforts, Marc!

Regards,

Mike
Mike,

I think that one needs to consider where the time is being spent and
then adjust accordingly. Once you understand that, you can develop some
insight into what may be a more efficient approach. R provides good
profiling tools that facilitate this process.

In this case, almost all of the time in the first two examples using
strsplit(), is in that function:
repeated.measures.columns <- paste(1:100000, collapse = ",")
library(utils)
Rprof(tmp <- tempfile())
res1 <- as.numeric(unlist(strsplit(repeated.measures.columns, ",")))
Rprof()
summaryRprof(tmp)
$by.self
                    self.time self.pct total.time total.pct
"strsplit"              23.68     99.7      23.68      99.7
"as.double.default"      0.06      0.3       0.06       0.3
"as.numeric"             0.00      0.0      23.74     100.0
"unlist"                 0.00      0.0      23.68      99.7

$by.total
                    total.time total.pct self.time self.pct
"as.numeric"             23.74     100.0      0.00      0.0
"strsplit"               23.68      99.7     23.68     99.7
"unlist"                 23.68      99.7      0.00      0.0
"as.double.default"       0.06       0.3      0.06      0.3

$sampling.time
[1] 23.74

Contrast that with Prof. Ripley's approach:
Rprof(tmp <- tempfile())
res3 <- eval(parse(text=paste("c(", repeated.measures.columns, ")")))
Rprof()
summaryRprof(tmp)
$by.self
        self.time self.pct total.time total.pct
"parse"      0.42     87.5       0.42      87.5
"eval"       0.06     12.5       0.48     100.0

$by.total
        total.time total.pct self.time self.pct
"eval"        0.48     100.0      0.06     12.5
"parse"       0.42      87.5      0.42     87.5

$sampling.time
[1] 0.48

To some extent, one could argue that my initial timing examples are
contrived, in that they specifically demonstrate a worst case scenario
using strsplit().  Real world examples may or may not show such gains.

For example with Charles' initial query, the initial vector was rather
short:

  > repeated.measures.columns
  [1] "3,6,10"

So if this was a one-time conversion, we would not see such significant
gains.

However, what if we had a long list of shorter entries:
repeated.measures.columns <- paste(1:10, collapse = ",")
repeated.measures.columns
[1] "1,2,3,4,5,6,7,8,9,10"
big.list <- replicate(10000, list(repeated.measures.columns))
head(big.list)
[[1]]
[1] "1,2,3,4,5,6,7,8,9,10"

[[2]]
[1] "1,2,3,4,5,6,7,8,9,10"

[[3]]
[1] "1,2,3,4,5,6,7,8,9,10"

[[4]]
[1] "1,2,3,4,5,6,7,8,9,10"

[[5]]
[1] "1,2,3,4,5,6,7,8,9,10"

[[6]]
[1] "1,2,3,4,5,6,7,8,9,10"
system.time(res1 <- t(sapply(big.list, function(x)
as.numeric(unlist(strsplit(x, ","))))))
[1] 1.972 0.044 2.411 0.000 0.000
str(res1)
num [1:10000, 1:10] 1 1 1 1 1 1 1 1 1 1 ...
head(res1)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    2    3    4    5    6    7    8    9    10
[2,]    1    2    3    4    5    6    7    8    9    10
[3,]    1    2    3    4    5    6    7    8    9    10
[4,]    1    2    3    4    5    6    7    8    9    10
[5,]    1    2    3    4    5    6    7    8    9    10
[6,]    1    2    3    4    5    6    7    8    9    10

Now use Prof. Ripley's approach:
system.time(res3 <- t(sapply(big.list, function(x)
eval(parse(text=paste("c(", x, ")"))))))
[1] 1.676 0.012 1.877 0.000 0.000
str(res3)
num [1:10000, 1:10] 1 1 1 1 1 1 1 1 1 1 ...
head(res3)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    2    3    4    5    6    7    8    9    10
[2,]    1    2    3    4    5    6    7    8    9    10
[3,]    1    2    3    4    5    6    7    8    9    10
[4,]    1    2    3    4    5    6    7    8    9    10
[5,]    1    2    3    4    5    6    7    8    9    10
[6,]    1    2    3    4    5    6    7    8    9    10
all(res1 == res3)
[1] TRUE

We see a notable reduction in time with strsplit() (from about 24 s to
about 2 s), but a notable increase with eval(parse()) (from about 0.3 s
to about 1.7 s), even though we are converting the same net number of
values (100,000).

Much of the increase with eval(parse()) is of course due to the overhead
of sapply() and navigating the list.

Let's increase the size of the list components to 1000:
repeated.measures.columns <- paste(1:1000, collapse = ",")
big.list <- replicate(10000, list(repeated.measures.columns))
system.time(res1 <- sapply(big.list, function(x)
as.numeric(unlist(strsplit(x, ",")))))
[1] 33.270  0.744 37.163  0.000  0.000
system.time(res3 <- t(sapply(big.list, function(x)
eval(parse(text=paste("c(", x, ")"))))))
[1] 15.893  0.928 18.139  0.000  0.000

So we see here that as the size of the list components increases, there
continues to be an advantage to Prof. Ripley's approach over using
strsplit().

Again, one needs to develop an understanding of where the time is spent
in the processing by profiling and then consider how to introduce
efficiencies, which in some cases may very well require the use of
compiled C/FORTRAN as may be appropriate if times become too long.

HTH,

Marc Schwartz
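
Regarding the sapply() overhead mentioned above: since strsplit() is vectorized over its input, one possible way to avoid the per-component function call is to split everything in a single call. This is an untimed sketch, not something measured in this thread; it assumes every entry has the same number of fields, and fixed = TRUE assumes the separator is a literal comma.

```r
entry <- paste(1:10, collapse = ",")
big.list <- replicate(10000, list(entry))

# One strsplit() call over the whole character vector instead of one
# call per list component; fixed = TRUE treats "," as a literal string.
pieces <- strsplit(unlist(big.list), ",", fixed = TRUE)

# Assumes every entry has exactly 10 fields, so the flattened values
# can be laid out row by row.
res <- matrix(as.numeric(unlist(pieces)), ncol = 10, byrow = TRUE)
str(res)
```

Whether this is actually faster on a given machine is something to check with system.time() or Rprof() as shown above.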
Marc,

Thanks very much for this.  I hadn't really looked at Rprof in the
past; now I have a new toy to play with!

I have formulated an hypothesis that the reason parse/eval is quicker
lies in the pattern-matching code:  strsplit is using regular
expressions, whereas perhaps parse is using some more clever (but
possibly less general) matching algorithm.  It will be interesting to
inspect the source code to get to the bottom of it.

Thanks again for your interest and efforts in this, and for pointing out Rprof!

Regards,

Mike Nielsen