extract all numbers from a string
Ooh, nice! Thanks!
Nick
On 6/16/13 8:42 PM, Gabor Grothendieck wrote:
On Sun, Jun 16, 2013 at 9:00 PM, Nick Matzke <matzke at berkeley.edu> wrote:
Thanks *VERY* much, this is great! I realized a few more cases, I think I've got something that covers all the possibilities now:

library(stringr)
tmpstr = "The first number is: 32. Another one is: 32.1. Here's a number in scientific format, 0.3523e10, and another, 0.3523e-10, and a negative, -313.1"

patternslist = NULL
p=0
patternslist[[(p=p+1)]] = "(\\d+)"               # positive integer
patternslist[[(p=p+1)]] = "(-\\d+)"              # negative integer
patternslist[[(p=p+1)]] = "(\\d+\\.\\d+)"        # positive float
patternslist[[(p=p+1)]] = "(\\d+\\.\\d+e\\d+)"   # positive float, scientific w. positive power
patternslist[[(p=p+1)]] = "(\\d+\\.\\d+e-\\d+)"  # positive float, scientific w. negative power
patternslist[[(p=p+1)]] = "(-\\d+\\.\\d+)"       # negative float
patternslist[[(p=p+1)]] = "(-\\d+\\.\\d+e\\d+)"  # negative float, scientific w. positive power
patternslist[[(p=p+1)]] = "(-\\d+\\.\\d+e-\\d+)" # negative float, scientific w. negative power
patternslist[[(p=p+1)]] = "(\\d+e\\d+)"          # positive int, scientific w. positive power
patternslist[[(p=p+1)]] = "(\\d+e-\\d+)"         # positive int, scientific w. negative power
patternslist[[(p=p+1)]] = "(-\\d+e\\d+)"         # negative int, scientific w. positive power
patternslist[[(p=p+1)]] = "(-\\d+e-\\d+)"        # negative int, scientific w. negative power

pattern = paste(patternslist, collapse="|", sep="")
pattern
as.numeric(str_extract_all(tmpstr,pattern)[[1]])

# A more complex string
tmpstr = "The first number is: 32. 342 342.1 -3234e-10 3234e-1 Another one is: 32.1. Here's a number in scientific format, 0.3523e10, and another, 0.3523e-10, and a negative, -313.1"
#pattern = "(\\d)+|(-\\d)+|(\\d+\\.\\d+)|(-\\d+\\.\\d+)|(\\d+.\\d+e\\d+)|(\\d+\\.\\d+e-\\d+)|(-\\d+.\\d+e\\d+)|(-\\d+\\.\\d+e-\\d+)"
as.numeric(str_extract_all(tmpstr,pattern)[[1]])
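One caveat on a list of alternatives like the one above: whether the longest alternative wins can depend on the regex engine. Engines that try alternatives left to right (PCRE, which base R uses with perl=TRUE, is one) stop at the first alternative that matches, so a leading "(\\d+)" can split "32.1" into two pieces. A minimal sketch of that effect, using base R only; the names `x`, `short_first` and `long_first` are made up for this illustration:

```r
x <- "32.1"

# Integer alternative listed first: PCRE takes "32", then finds "1" separately
short_first <- regmatches(x, gregexpr("\\d+|\\d+\\.\\d+", x, perl = TRUE))[[1]]

# Float alternative listed first: the whole number is matched in one piece
long_first <- regmatches(x, gregexpr("\\d+\\.\\d+|\\d+", x, perl = TRUE))[[1]]

print(short_first)
print(long_first)
```

If the list is ever run through such an engine, putting the most specific patterns (scientific notation, then floats) ahead of the plain integers should avoid the split.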
This much simpler single pattern may be good enough:
library(gsubfn)
pat <- "[-+.e0-9]*\\d"
strapplyc(tmpstr, pat)[[1]]
[1] "32" "342" "342.1" "-3234e-10" "3234e-1"
[6] "32.1" "0.3523e10" "0.3523e-10" "-313.1"
strapply(tmpstr, pat, as.numeric)[[1]]
[1] 3.200e+01 3.420e+02 3.421e+02 -3.234e-07 3.234e+02 3.210e+01 3.523e+09
[8] 3.523e-11 -3.131e+02

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
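For anyone without gsubfn installed, the same extraction can be sketched in base R with gregexpr() and regmatches(). The pattern below is not the one from the message above; it is a somewhat stricter variant that assumes each number is written as an optional sign, digits, an optional decimal part, and an optional exponent. The names `num_pat` and `nums` are chosen here for illustration only:

```r
tmpstr <- "The first number is: 32. 342 342.1 -3234e-10 3234e-1 Another one is: 32.1. Here's a number in scientific format, 0.3523e10, and another, 0.3523e-10, and a negative, -313.1"

# Assumed number format: optional sign, digits, optional ".digits",
# optional exponent such as e10 or e-10
num_pat <- "[-+]?[0-9]+(\\.[0-9]+)?([eE][-+]?[0-9]+)?"

# gregexpr() locates every match; regmatches() pulls out the matched text
nums <- regmatches(tmpstr, gregexpr(num_pat, tmpstr))[[1]]
print(nums)

# Convert the extracted strings to numeric values
print(as.numeric(nums))
```

Because the decimal part must be followed by digits, the period that ends a sentence ("32.") is left out of the match, as it is with the shorter pattern.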
====================================================
Nicholas J. Matzke
Ph.D. Candidate, Graduate Student Researcher
Huelsenbeck Lab
Center for Theoretical Evolutionary Genomics
4151 VLSB (Valley Life Sciences Building)
Department of Integrative Biology
University of California, Berkeley

Graduate Student Instructor, IB200B
Principles of Phylogenetics: Ecology and Evolution
http://ib.berkeley.edu/courses/ib200b/
http://phylo.wikidot.com/

Lab websites:
http://ib.berkeley.edu/people/lab_detail.php?lab=54
http://fisher.berkeley.edu/cteg/hlab.html
Dept. personal page: http://ib.berkeley.edu/people/students/person_detail.php?person=370
Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html
Lab phone: 510-643-6299
Dept. fax: 510-643-6264
Cell phone: 510-301-0179
Email: matzke at berkeley.edu

Mailing address:
Department of Integrative Biology
1005 Valley Life Sciences Building #3140
Berkeley, CA 94720-3140

-----------------------------------------------------
"[W]hen people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong. But if you think that thinking the earth is spherical is just as wrong as thinking the earth is flat, then your view is wronger than both of them put together."

Isaac Asimov (1989). "The Relativity of Wrong." The Skeptical Inquirer, 14(1), 35-44. Fall 1989.
http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm