Generalizing a regex for retrieving numbers with and without scientific notation
2 messages · Morway, Eric, Marc Schwartz
On Feb 19, 2014, at 12:26 PM, Morway, Eric <emorway at usgs.gov> wrote:
I'm trying to extract all of the values from edm in the example below.
However, the first attempt only retrieves the final number in the sequence
since it is recorded using scientific notation. The second attempt
retrieves all of the numbers, but omits the scientific notation component
of the final number. How can I make the regular expression more general
such that I get every value AND its corresponding "E"-value (i.e.,
"...E-06"), where pertinent? I've spent time reading through ?regex, but
my attempts to use the "*" character, where the preceding item will be
matched zero or more times, have so far proven syntactically incorrect or
generally unsuccessful. Appreciate the help,
Eric
library(gsubfn)  # strapply() is provided by the gsubfn package
edm <- c("", "param_value", "6.301343", "6.366305", "6.431268",
         "6.496230", "6.561192", "6.626155", "9.091117E-06")
param_values <- strapply(edm, "\\d+\\.\\d+E[-+]?\\d+", as.numeric,
                         simplify = cbind)
param_values
#[1,] 9.091117e-06
param_values <- strapply(edm,"\\d+\\.\\d+", as.numeric, simplify=cbind)
param_values
#[1,] 6.301343 6.366305 6.431268 6.49623 6.561192 6.626155 9.091117
If the individual elements of the vector are either numeric or non-numeric, why not just use:
as.numeric(edm)
[1]           NA           NA 6.301343e+00 6.366305e+00 6.431268e+00
[6] 6.496230e+00 6.561192e+00 6.626155e+00 9.091117e-06
Warning message:
NAs introduced by coercion

The non-numeric elements are returned as NA's, which you can remove by using ?na.omit.

The only reason to use a regex would be if the individual elements themselves contained both numeric and non-numeric characters.

If you then want to explicitly format numeric output (which would yield a character vector), you can use ?sprintf or ?format. Keep in mind the difference between how R *PRINTS* a numeric value and how R *STORES* a numeric value internally.

Regards,
Marc Schwartz
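
The two cases discussed in this thread can be sketched side by side in base R. This is an illustration only, not code posted by either author. The mixed-text vector `mixed` and the name `num_pattern` are made up for the example, and regmatches()/gregexpr() are used in place of strapply() so that nothing beyond base R is needed. The pattern makes the exponent optional by wrapping it in a group followed by "?", which is one way to get every value together with its "E" part where present.

```r
# Input vector from the original post.
edm <- c("", "param_value", "6.301343", "6.366305", "6.431268",
         "6.496230", "6.561192", "6.626155", "9.091117E-06")

# Case 1: each element is either wholly numeric or wholly non-numeric.
# suppressWarnings() hides the "NAs introduced by coercion" warning;
# na.omit() drops the NAs and as.numeric() strips its extra attributes.
vals <- suppressWarnings(as.numeric(edm))
vals <- as.numeric(na.omit(vals))

# Case 2: numbers embedded in other text (hypothetical input).
# The exponent part sits in an optional non-capturing group, so plain
# decimals and scientific notation are both matched in full.
num_pattern <- "[-+]?\\d+\\.?\\d*(?:[eE][-+]?\\d+)?"
mixed <- c("k = 6.301343 m/d", "tol=9.091117E-06;", "no numbers here")
hits <- regmatches(mixed, gregexpr(num_pattern, mixed, perl = TRUE))
embedded <- as.numeric(unlist(hits))

# Formatting returns character strings; the stored doubles are unchanged.
as_text <- sprintf("%.6E", vals[7])
as_text2 <- format(vals[7], scientific = TRUE)
```

Here `vals` holds the seven numeric values from edm, and `embedded` holds the two numbers pulled out of the mixed strings, including the full scientific-notation value. The exact text produced by sprintf() and format() can vary slightly by platform and options, so it is not shown.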