Differentiate numbers from reference for rows

6 messages · M.Ribeiro, Dennis Murphy, David Winsemius +1 more

#
I have a tricky reference file to extract information from.

The format of the file is

x   1 + 4 * 3 + 5 + 6 + 11 * 0.5

The elements that are not being multiplied (1, 5 and 6) and the elements
before the multiplication sign (4 and 11) are actually references to the
rows of a matrix from which I need to extract the elements.

The numbers after the multiplication sign are regular numbers.
Example matrix:
      [,1]
 [1,]   20
 [2,]   21
 [3,]   22
 [4,]   23
 [5,]   24
 [6,]   25
 [7,]   26
 [8,]   27
 [9,]   28
[10,]   29
[11,]   30
[12,]   31
[13,]   32
[14,]   33
[15,]   34
[16,]   35

I would like to read rows 1, 4, 5, 6 and 11 and sum them. However, the
values in rows 4 and 11 are multiplied by 3 and 0.5, respectively.

So it would be
20 + 23 * 3 + 24 + 25 + 30 * 0.5.

I have this format in many different files, so I can't do it all by hand.
Can anybody help me with a script that can differentiate these cases?
Thanks
#
On Oct 29, 2010, at 11:16 PM, Dennis Murphy wrote:

            
I saw the beginning of this task as parsing to extract the numbers from
a character string (possibly decimal in the case of the third
and seventh positions) delimited by <space>+<space> and <space>*<space>:

library(gsubfn)
x <- "1 + 4 * 3 + 5 + 6 + 11 * 0.5"

xin <- readLines(textConnection(x))
# one capture group per number; groups 3 and 7 may be decimals
xp <- strapply(xin, "^(\\d+) \\+ (\\d+) \\* (\\d+\\.*\\d*) \\+ (\\d+) \\+ (\\d+) \\+ (\\d+) \\* (\\d+\\.*\\d*)", c)
sapply(xp, as.numeric)
      [,1]
[1,]  1.0
[2,]  4.0
[3,]  3.0
[4,]  5.0
[5,]  6.0
[6,] 11.0
[7,]  0.5
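
The pattern above hard-codes seven positions. As a hedged alternative, here is a minimal base R sketch (no gsubfn) that pulls out every number in order, however many terms the line has. The variable name nums is only illustrative, and the sketch assumes the line contains nothing but non-negative integers and decimals separated by + and *.

```r
x <- "1 + 4 * 3 + 5 + 6 + 11 * 0.5"

# gregexpr() locates every integer or decimal; regmatches() extracts them
nums <- as.numeric(regmatches(x, gregexpr("[0-9]+(\\.[0-9]+)?", x))[[1]])
nums
```

This only recovers the numbers; it does not yet say which of them are row references and which are multipliers.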
#
On Fri, Oct 29, 2010 at 6:54 PM, M.Ribeiro <mresendeufv at yahoo.com.br> wrote:
I assume that every number, except for the second number in the pattern
number * number, is to be replaced by the value in that row of x.  Try this.
We define a regular expression that matches the first number ([0-9]+)
of each potential pair, optionally (?) followed by spaces ( *), a star
(\\*), more spaces ( *) and digits ([0-9.]+).  gsubfn passes the first and
second backreferences (the matches to the parenthesized portions of the
regular expression) to f and inserts the output of f where the matches
had been.

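library(gsubfn)
# Assumed setup (not shown in the archived message): x is the matrix from
# the question and s holds two copies of the input line.
x <- matrix(20:35); s <- rep("1 + 4 * 3 + 5 + 6 + 11 * 0.5", 2)
f <- function(a, b) paste(x[as.numeric(a)], b)
s2 <- gsubfn("([0-9]+)( *\\* *[0-9.]+)?", f, s)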

If the objective is then to perform the calculation that each string
represents, try this:
sapply(s2, function(x) eval(parse(text = x)))

For example,
[1] "20  + 23  * 3 + 24  + 25  + 30  * 0.5" "20  + 23  * 3 + 24  + 25  + 30  * 0.5"
20  + 23  * 3 + 24  + 25  + 30  * 0.5 20  + 23  * 3 + 24  + 25  + 30  * 0.5
                                  153                                   153

For more see the gsubfn home page at http://gsubfn.googlecode.com
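
For readers who prefer not to evaluate generated text, the same result can be sketched in base R by splitting each line on + and then on *. This is only an illustration under stated assumptions: the helper name eval_line is hypothetical, x is taken to be the 16 x 1 matrix from the question, and each term is assumed to be either a row number or a row number * multiplier.

```r
x <- matrix(20:35, ncol = 1)   # assumed: the example matrix from the question
s <- "1 + 4 * 3 + 5 + 6 + 11 * 0.5"

# Hypothetical helper: evaluate one line against the matrix x
eval_line <- function(line, x) {
  terms <- strsplit(line, "+", fixed = TRUE)[[1]]
  vals <- vapply(terms, function(term) {
    parts <- as.numeric(trimws(strsplit(term, "*", fixed = TRUE)[[1]]))
    # parts[1] is a row index; parts[2], if present, is a plain multiplier
    mult <- if (length(parts) > 1) parts[2] else 1
    x[parts[1], 1] * mult
  }, numeric(1))
  sum(vals)
}

eval_line(s, x)
```

For several lines, something like sapply(lines, eval_line, x = x) should give one total per line.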
#
On Oct 30, 2010, at 8:42 AM, Gabor Grothendieck wrote:

            
I am scratching my head regarding the gsubfn workings. It appears that,
as gsubfn moves across the input strings, it will either match
just "[0-9]+" or it will match "[0-9]+ *\\* *[0-9.]+".

In either case the match will do a lookup in x[] for the first match  
using the "a" index, and if there is a match for the second position  
assigned to "*b" then that x[a] will be followed by "*b"  and is  
therefore destined to be multiplied by "b". I cannot quite figure out  
how the NULL value gets not-matched to the second back-reference and  
then doesn't screw up the f() function by only providing one argument  
to a two argument function. Maybe it's due to this? (So can you  
comment on how optional back-references return values?)

 > paste("a", NULL)
[1] "a "

Furthermore, somehow (and this is further functional magic I am
missing) these results are concatenated in a string, and then
evaluated, a step which I do get.
#
On Sat, Oct 30, 2010 at 9:43 AM, David Winsemius <dwinsemius at comcast.net> wrote:
In the regular expression

   "([0-9]+)( *\\* *[0-9.]+)?"

it matches the first (...) and then the (...)?  part.  The ? means 0 or 1
occurrences of the preceding (...) expression, so (...)? matches the
content if it can and otherwise succeeds by matching the empty string.
The function is therefore always called with two arguments.  Try this:
[1] "<a='1'><b=''> + <a='4'><b=' * 3'> + <a='5'><b=''> + <a='6'><b=''> + <a='11'><b=' * 0.5'>"
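
The same behaviour can be checked without gsubfn. The following is a minimal base R sketch, not the code that produced the output above; the names pat, m, a and b are only illustrative. It extracts each match and then reads the two backreferences with sub(), where a group that did not take part in the match comes back as the empty string.

```r
s <- "1 + 4 * 3 + 5 + 6 + 11 * 0.5"
pat <- "([0-9]+)( *\\* *[0-9.]+)?"

# every full match of the pattern, left to right
m <- regmatches(s, gregexpr(pat, s))[[1]]

# first and second backreference for each match
a <- sub(pat, "\\1", m)
b <- sub(pat, "\\2", m)

data.frame(match = m, a = a, b = b)
```

So b is always a character string, either something like " * 3" or "", which is why a two-argument function always receives both arguments.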