
subsetting character vector into groups of numerics

Patrick Connolly <p.connolly at hortresearch.co.nz> writes:
Hmm... You seem to be telling us what the format is not. If you want
us to come up with something for the machine to do, it's not too
useful that things are "obvious to the eye"! 

If the format is consistently like the above with subgroups in (),
then you could start with using some of the deeper magic of gsub() to
turn the format into something which would be easier to split into
individual vectors, e.g.
[1] "12 78 23 9 76 43 2 15 41 81 92 5/92 12 /81 78 5 76 9 41 /23 2 15 43"

[What was that? Well, "(" is a special grouping operator in regular
expressions; it isn't part of the RE as such, but things inside (..)
can be referred to with backreferences like \1, which of course needs
to be entered as "\\1". \( is an actual left parenthesis, again
written with the doubled backslash. [^)]* is a sequence consisting of
any characters except right parentheses (a parenthesis is not a grouping
operator when it sits within square brackets). So we're finding the
bits of text delimited by ( and ) and replacing them with a / and the
content of the parentheses. Got it? Don't worry if you don't, I didn't
get it right till the 12th try either! The important thing is knowing
that this kind of stuff is possible if you stare at it long enough.]
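
The call that produced the string above is not shown, so here is a sketch
of what it could look like. The input x is an assumption, reconstructed
backwards from the output (subgroups in parentheses), and the names x and y
are made up for illustration.

```r
# Hypothetical input: reconstructed from the output above, not taken
# from the original data.
x <- "12 78 23 9 76 43 2 15 41 81 92 5(92 12 )(81 78 5 76 9 41 )(23 2 15 43)"

# \\(        a literal left parenthesis
# ([^)]*)    group 1: any run of characters other than ")"
# \\)        a literal right parenthesis
# "/\\1"     replace the whole match with "/" plus the content of group 1
y <- gsub("\\(([^)]*)\\)", "/\\1", x)
print(y)
```

If the real data has nested parentheses or other delimiters, the pattern
would need adjusting.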

Now that it is in an easier format we can use strsplit to get
individual parts:
[[1]]
[1] "12 78 23 9 76 43 2 15 41 81 92 5" "92 12 "                          
[3] "81 78 5 76 9 41 "                 "23 2 15 43"                      
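
A sketch of the splitting step, assuming the slash-delimited string is held
in a variable called y (a made-up name). strsplit() returns a list with one
character vector per input string, hence the [[1]] in the output.

```r
# The slash-delimited string from the previous step, entered directly
# so that this snippet runs on its own.
y <- "12 78 23 9 76 43 2 15 41 81 92 5/92 12 /81 78 5 76 9 41 /23 2 15 43"

# Split on "/"; the result is a list of length 1 holding four strings.
parts <- strsplit(y, "/")
print(parts)
```

Note that the trailing blanks inside the parentheses are kept, which is
harmless for the numeric conversion that follows.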

and once we have those we might use scan() on each string to get the
numbers. This requires the use of a text connection, like this:
Read 12 items
Read 2 items
Read 6 items
Read 4 items
[[1]]
 [1] 12 78 23  9 76 43  2 15 41 81 92  5

[[2]]
[1] 92 12

[[3]]
[1] 81 78  5 76  9 41

[[4]]
[1] 23  2 15 43

...
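
One way the scan() step could be written, as a sketch rather than the exact
command used above. The vector parts and the helper name read_numbers are
assumptions made for illustration.

```r
# The four pieces from the strsplit() step, entered directly so that
# this snippet runs on its own.
parts <- c("12 78 23 9 76 43 2 15 41 81 92 5", "92 12 ",
           "81 78 5 76 9 41 ", "23 2 15 43")

# Hypothetical helper: open a text connection on one string, let scan()
# read whitespace-separated numbers from it, and close the connection.
read_numbers <- function(s) {
  con <- textConnection(s)
  on.exit(close(con))
  scan(con)
}

# One numeric vector per piece; scan() reports "Read n items" each time.
nums <- lapply(parts, read_numbers)
print(nums)
```

In current versions of R, scan(text = s) does the same thing without an
explicit connection, and quiet = TRUE suppresses the "Read n items" lines.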

Your turn!