extract all numbers from a string

6 messages · arun, Duncan Mackay, Gabor Grothendieck +1 more

#
Hi all,

I have been beating my head against this problem for a bit, 
but I can't figure it out.

I have a series of strings of variable length, and each will 
have one or more numbers, of varying format.  E.g., I might 
have:


tmpstr = "The first number is: 32.  Another one is: 32.1. 
Here's a number in scientific format, 0.3523e10, and 
another, 0.3523e-10, and a negative, -313.1"

How could I get R to just give me a list of numerics 
containing the numbers therein?

Thanks very much to the regexp wizards!

Cheers,
Nick
#
Hi,
One way would be:

library(stringr)
tmpstr = "The first number is: 32.  Another one is: 32.1.
Here's a number in scientific format, 0.3523e10, and
another, 0.3523e-10, and a negative, -313.1"
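# Most specific alternatives first, so an engine that tries them left to right matches each number whole
pattern <- "(\\d+\\.\\d+e-\\d+)|(\\d+\\.\\d+e\\d+)|(-\\d+\\.\\d+)|(\\d+\\.\\d+)|(\\d+)"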
str_extract_all(tmpstr,pattern)[[1]]
#[1] "32"         "32.1"       "0.3523e10"  "0.3523e-10" "-313.1"
as.numeric(str_extract_all(tmpstr,pattern)[[1]])
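
One thing to watch with a pattern built from alternatives: an engine that tries them left to right stops at the first alternative that matches. PCRE, reached through perl = TRUE in base R, works this way, and stringr's current engine is assumed here to behave the same. A bare \\d+ placed first would then split 32.1 into 32 and 1. A minimal base-R sketch of the effect (extract_all is a hypothetical helper, not part of any package):

```r
x <- "32.1 and 0.3523e-10"

# Hypothetical helper: extract every match of `pat` using base R's PCRE engine.
extract_all <- function(pat, s) {
  regmatches(s, gregexpr(pat, s, perl = TRUE))[[1]]
}

# Shortest alternative first: the integer branch wins and numbers are split up.
short_first <- extract_all("\\d+|\\d+\\.\\d+|\\d+\\.\\d+e-\\d+", x)

# Most specific alternative first: each number is matched whole.
long_first <- extract_all("\\d+\\.\\d+e-\\d+|\\d+\\.\\d+|\\d+", x)

print(short_first)
print(long_first)
```

Ordering the alternatives from most to least specific gives the same result whether the engine picks the first match or the longest one.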
A.K.



----- Original Message -----
From: Nick Matzke <matzke at berkeley.edu>
To: R-help at r-project.org
Cc: 
Sent: Sunday, June 16, 2013 1:06 AM
Subject: [R] extract all numbers from a string

#
Nick,

Try:

as.numeric(
  strsplit(gsub("[[:alpha:][:punct:][:space:]]{2,}",",",tmpstr),",")[[1]][-1]
)

See ?regexpr (and ?regex for the pattern syntax) for more information.
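
To see what the two steps do, here is a small sketch on a shorter, made-up string (the string and variable names are illustrative only). It also shows one limitation worth knowing about: a minus sign that sits next to other separator characters is absorbed into the separator.

```r
s <- "a: 32, b: 32.1"

# Step 1: collapse every run of two or more letters, punctuation marks or
# spaces into a single comma. A lone "." between digits is left alone.
collapsed <- gsub("[[:alpha:][:punct:][:space:]]{2,}", ",", s)

# Step 2: split on the commas. The leading separator leaves an empty first
# element, which is why it is dropped with [-1].
pieces <- strsplit(collapsed, ",")[[1]]
nums <- as.numeric(pieces[-1])

# Caveat: the "-" here is part of a longer separator run, so the sign is lost.
lost_sign <- gsub("[[:alpha:][:punct:][:space:]]{2,}", ",", "neg, -313.1")

print(collapsed)
print(nums)
print(lost_sign)
```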

HTH

Duncan


Duncan Mackay
Department of Agronomy and Soil Science
University of New England
Armidale NSW 2351
Email: home: mackay at northnet.com.au
At 15:06 16/06/2013, you wrote:
#
Thanks *VERY* much, this is great!

I realized there are a few more cases; I think I've got something 
that covers all the possibilities now:


library(stringr)
tmpstr = "The first number is: 32.  Another one is: 32.1. 
Here's a number in scientific format, 0.3523e10, and 
another, 0.3523e-10, and a negative, -313.1"

patternslist = NULL
p=0
# Most specific alternatives come first, so an engine that tries them
# left to right cannot stop early at a shorter match
patternslist[[(p=p+1)]] = "(-\\d+\\.\\d+e-\\d+)"  # negative float, scientific w. negative power
patternslist[[(p=p+1)]] = "(-\\d+\\.\\d+e\\d+)"   # negative float, scientific w. positive power
patternslist[[(p=p+1)]] = "(\\d+\\.\\d+e-\\d+)"   # positive float, scientific w. negative power
patternslist[[(p=p+1)]] = "(\\d+\\.\\d+e\\d+)"    # positive float, scientific w. positive power
patternslist[[(p=p+1)]] = "(-\\d+\\.\\d+)"        # negative float
patternslist[[(p=p+1)]] = "(\\d+\\.\\d+)"         # positive float

patternslist[[(p=p+1)]] = "(-\\d+e-\\d+)"         # negative int, scientific w. negative power
patternslist[[(p=p+1)]] = "(-\\d+e\\d+)"          # negative int, scientific w. positive power
patternslist[[(p=p+1)]] = "(\\d+e-\\d+)"          # positive int, scientific w. negative power
patternslist[[(p=p+1)]] = "(\\d+e\\d+)"           # positive int, scientific w. positive power
patternslist[[(p=p+1)]] = "(-\\d+)"               # negative integer
patternslist[[(p=p+1)]] = "(\\d+)"                # positive integer

pattern = paste(patternslist, collapse="|", sep="")
pattern
as.numeric(str_extract_all(tmpstr,pattern)[[1]])

# A more complex string
tmpstr = "The first number is: 32.  342 342.1   -3234e-10 
3234e-1 Another one is: 32.1. Here's a number in scientific 
format, 0.3523e10, and another, 0.3523e-10, and a negative, 
-313.1"
#pattern = "(\\d)+|(-\\d)+|(\\d+\\.\\d+)|(-\\d+\\.\\d+)|(\\d+.\\d+e\\d+)|(\\d+\\.\\d+e-\\d+)|(-\\d+.\\d+e\\d+)|(-\\d+\\.\\d+e-\\d+)"
as.numeric(str_extract_all(tmpstr,pattern)[[1]])



Cheers!
Nick


PS: A function version:


# Extract numbers / get numbers / get all numbers from a text string
getnums <- function(tmpstr)
	{
	# Example string
	# tmpstr = "The first number is: 32.  342 342.1   -3234e-10 3234e-1 Another one is: 32.1. Here's a number in scientific format, 0.3523e10, and another, 0.3523e-10, and a negative, -313.1"
	
	library(stringr)
	
	# set up the pattern (most specific alternatives first, so an engine that
	# tries them left to right matches each number whole)
	pattern = "(-\\d+\\.\\d+e-\\d+)|(-\\d+\\.\\d+e\\d+)|(\\d+\\.\\d+e-\\d+)|(\\d+\\.\\d+e\\d+)|(-\\d+\\.\\d+)|(\\d+\\.\\d+)|(-\\d+e-\\d+)|(-\\d+e\\d+)|(\\d+e-\\d+)|(\\d+e\\d+)|(-\\d+)|(\\d+)"
	
	# Get the numbers
	nums_from_tmpstr = as.numeric(str_extract_all(tmpstr,pattern)[[1]])

	# Return them
	return(nums_from_tmpstr)
	}
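
getnums() takes one string at a time. For a whole series of strings, a vectorised variant might look like the sketch below. The name getnums_each and its pattern argument are hypothetical, the example strings are made up, and base R's regmatches()/gregexpr() stand in for str_extract_all() so that the sketch needs no extra package.

```r
# Hypothetical vectorised variant: returns one numeric vector per input string.
getnums_each <- function(strs, pattern) {
  # gregexpr() locates all matches in each string; regmatches() extracts them
  # as a list of character vectors, one list element per input string.
  found <- regmatches(strs, gregexpr(pattern, strs, perl = TRUE))
  lapply(found, as.numeric)
}

# A shortened alternation, most specific alternative first.
pattern <- "-\\d+\\.\\d+|\\d+\\.\\d+|-\\d+|\\d+"

strs <- c("a: 32 and 32.1", "none here", "-313.1 and -7")
res <- getnums_each(strs, pattern)
print(res)
```

A string without any numbers yields a zero-length numeric vector rather than an error, so the result always has one element per input string.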
On 6/15/13 10:46 PM, arun wrote:

  
    
#
On Sun, Jun 16, 2013 at 9:00 PM, Nick Matzke <matzke at berkeley.edu> wrote:
This much simpler single pattern may be good enough:
[1] "32"         "342"        "342.1"      "-3234e-10"  "3234e-1"
[6] "32.1"       "0.3523e10"  "0.3523e-10" "-313.1"
[1]  3.200e+01  3.420e+02  3.421e+02 -3.234e-07  3.234e+02  3.210e+01  3.523e+09
[8]  3.523e-11 -3.131e+02
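
The output above can be reproduced with one pattern in which the minus sign, the fractional part and the exponent are each optional. The pattern below is an assumption, not necessarily the one used in this reply, and it is written with base R's regmatches() and gregexpr() rather than any particular package:

```r
tmpstr <- "The first number is: 32.  342 342.1   -3234e-10 3234e-1 Another one is: 32.1. Here's a number in scientific format, 0.3523e10, and another, 0.3523e-10, and a negative, -313.1"

# Assumed pattern: optional minus sign, digits, optional fraction,
# optional exponent with an optional minus sign.
pattern <- "-?[0-9]+(\\.[0-9]+)?(e-?[0-9]+)?"

# gregexpr() finds every match; regmatches() pulls out the matched substrings.
num_strings <- regmatches(tmpstr, gregexpr(pattern, tmpstr, perl = TRUE))[[1]]
nums <- as.numeric(num_strings)

print(num_strings)
print(nums)
```

Because the optional parts are greedy, each number is matched as far as it extends, so no ordering of alternatives has to be maintained.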

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
#
Ooh, nice! Thanks!
Nick
On 6/16/13 8:42 PM, Gabor Grothendieck wrote: