help wrapping findInterval into a function

10 messages · Steve E., David Winsemius, William Dunlap +1 more

#
Dear R Community,

I hope you might be able to assist with a small problem creating a function. 
I am working with water-quality data sets that contain the concentration of
many different elements in water samples.  I need to assign quality-control
flags to values that fall into various concentration ranges.  Rather than a
web of nested if statements, I am employing the findInterval function to
identify values that need to be flagged and to assign the appropriate flag. 
The data consist of a sample identifier, the analysis, and corresponding
value.  The findInterval function works well; however, I would like to
incorporate it into a function so that I can run multiple findInterval
functions for many different water-quality analyses (and I have to do this
for many datasets), but it seems to fall apart when incorporated into a
function.

Run directly, the findInterval function works as desired, e.g. below,
creating the new CalciumFlag column with the appropriate flag for, in this
case, levels of calcium in the water:

WQdata$CalciumFlag <- with(WQdata, ifelse(analysis == "Calcium", (flags <-
c("Y", 'Q', "", "A") [findInterval(WQdata$value, c(-Inf, 0.027, 0.1, 100,
Inf))]),""))

However, it does not work when wrapped in a function (no error messages
are thrown; it simply does not seem to do anything):

WQfunc <- function() {
	WQdata$CalciumFlag <- with(WQdata, ifelse(analysis == "Calcium", (flags <-
c("Y", 'Q', "", "A") [findInterval(WQdata$value, c(-Inf, 0.027, 0.1, 100,
Inf))]),""))
	}

Calling the function WQfunc() does not produce an error but also does not
produce the expected CalciumFlag column; it does not seem to do anything.

Ultimately, what I need to get to is something like below where multiple
findInterval functions for different analyses are included in a single
function, then I can concatenate the results into a single column containing
all flags for all analyses, e.g.:

WQfunc <- function() {
	WQdata$CalciumFlag <- with(WQdata, ifelse(analysis == "Calcium", (flags <-
c("Y", 'Q', "", "A") [findInterval(WQdata$value, c(-Inf, 0.027, 0.1, 100,
Inf))]),""))

	WQdata$SodiumFlag <- with(WQdata, ifelse(analysis == "Sodium", (flags <-
c("Y", 'Q', "", "A") [findInterval(WQdata$value, c(-Inf, 0.050, 0.125, 125,
Inf))]),""))

	WQdata$MagnesiumFlag <- with(WQdata, ifelse(analysis == "Magnesium", (flags
<- c("Y", 'Q', "", "A") [findInterval(WQdata$value, c(-Inf, 0.065, 0.15, 75,
Inf))]),""))
	
.....etc for additional water-quality analyses...

	}

As an aside, I started working with the findInterval tool from an example
that I found online but am not clear as to how the multi-component
configuration incorporating brackets actually works, can anyone suggest a
good resource that explains this?
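
For readers with the same question about the bracket construct, here is a minimal sketch of what it does, split into two steps. The sample concentrations are made up for illustration; the breakpoints and flags are the calcium ones from the code above.

```r
# Breakpoints split the number line into four left-closed intervals:
# [-Inf, 0.027), [0.027, 0.1), [0.1, 100), [100, Inf)
breaks <- c(-Inf, 0.027, 0.1, 100, Inf)
flags  <- c("Y", "Q", "", "A")   # one flag per interval, in order

# Made-up concentrations, for illustration only
value <- c(0.0101, 0.044, 0.1316, 80.13, 250)

# Step 1: findInterval() returns, for each value, the index of the
# interval it falls into
idx <- findInterval(value, breaks)
idx          # 1 2 3 3 4

# Step 2: indexing the flag vector with those integers looks up one
# flag per value
flags[idx]   # "Y" "Q" ""  ""  "A"
```

So c("Y", 'Q', "", "A")[findInterval(...)] is ordinary vector indexing: the integer vector returned by findInterval() picks one element of the flag vector per observation.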


I thank you very much for any assistance you may be able to provide.


Regards,
Steve

--
View this message in context: http://r.789695.n4.nabble.com/help-wrapping-findInterval-into-a-function-tp4165464p4165464.html
Sent from the R help mailing list archive at Nabble.com.
#
On Dec 6, 2011, at 11:43 AM, Steve E. wrote:

            
It's probably your use of "with" rather than findInterval. The 'with'  
function sets up an environment and is used mostly in interactive  
sessions. If you have not passed a dataframe into the function then you  
should use the name of the dataframe and '['.
David Winsemius, MD
West Hartford, CT
#
No, the problem is not with "with" but is
that the OP's function did not return the modified
data.frame.  He didn't show how the function
was called, but I suspect the usage was like
  f0 <- function() globalDataFrame$newCol <- ...
  f0()
where it should have been
  f1 <- function(dataFrame) {
    dataFrame$newCol <- ...
    dataFrame # return modified dataFrame
  }
  globalDataFrame <- f1(globalDataFrame)
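
A runnable version of the two patterns above, as a sketch only: the one-column data frame and the doubling of 'value' are placeholders standing in for the elided "...".

```r
# Placeholder data; the real data frame has more columns
globalDataFrame <- data.frame(value = c(0.01, 0.05, 50))

# Pattern f0: the assignment inside the function modifies a local copy,
# which is discarded when the function exits
f0 <- function() {
  globalDataFrame$newCol <- globalDataFrame$value * 2
}
f0()
after_f0 <- names(globalDataFrame)   # still just "value"

# Pattern f1: take the data frame as an argument, modify it, return it
f1 <- function(dataFrame) {
  dataFrame$newCol <- dataFrame$value * 2
  dataFrame   # return the modified data frame
}
globalDataFrame <- f1(globalDataFrame)
after_f1 <- names(globalDataFrame)   # "value" "newCol"
```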

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
#
Bill (and David),

Thank you very much for taking the time to respond to my query.

You were right, I was creating and calling the function exactly as you had
predicted.  I revised the structure based on your suggestion.  It runs but
the output is an array of the flags that are not attached to the data frame,
not a new column in the data frame as was my intention.

So, the new configuration I tried was like this (where DataFrame is not a
real data frame but just the word "DataFrame"):

WQFlags <- function(DataFrame) {DataFrame$CalciumFlag <- with(DataFrame,
ifelse(variable == "CaD_ICP", (dataqualifier <- c("Y", 'Q', "", "A")
[findInterval(DataFrame$value, c(-Inf, 0.027, 0.1, 100, Inf))]),""))
}

I called it using:

WaterQualityData <- WQFlags(WaterQualityData)

Again, the output is simply an array of the flags, unattached to a data
frame.  Can you suggest a way to modify this to make it work as desired, or,
in the worst case, can I attach the resulting array of flag values?
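
A likely explanation, sketched below with a made-up three-row data frame: the value of an R function is the value of its last expression, and here the last expression is an assignment, whose value is the assigned flag vector. Ending the function with the data frame itself should return the whole frame. The helper flag_calcium and the two function names are hypothetical, introduced only for this illustration.

```r
# Made-up sample in the same shape as the poster's data
WaterQualityData <- data.frame(
  variable = c("CaD_ICP", "CaD_ICP", "NaD_ICP"),
  value    = c(0.044, 0.0101, 5)
)

# Hypothetical helper wrapping the calcium lookup from the post
flag_calcium <- function(x) {
  c("Y", "Q", "", "A")[findInterval(x, c(-Inf, 0.027, 0.1, 100, Inf))]
}

# Last expression is the assignment, so the function's value is the
# flag vector, not the data frame
WQFlags_vector <- function(DataFrame) {
  DataFrame$CalciumFlag <- ifelse(DataFrame$variable == "CaD_ICP",
                                  flag_calcium(DataFrame$value), "")
}

# Same body, but the modified data frame is the last expression
WQFlags_df <- function(DataFrame) {
  DataFrame$CalciumFlag <- ifelse(DataFrame$variable == "CaD_ICP",
                                  flag_calcium(DataFrame$value), "")
  DataFrame
}

res_vector <- WQFlags_vector(WaterQualityData)
res_df     <- WQFlags_df(WaterQualityData)

class(res_vector)   # "character"
class(res_df)       # "data.frame"
```

If only the vector is available, it can also be added afterwards with an ordinary column assignment such as WaterQualityData$CalciumFlag <- res_vector, provided the rows are in the same order.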


Thank you again!

#
On Dec 6, 2011, at 5:53 PM, Steve E. wrote:

            
Unless you provide either the original data or an unambiguous (at the  
level the R interpreter would see, not at the level of what you see  
when you print a dataframe) description of your data you will get at  
the very best educated guesses. Use str() or dput().
Do you mean "attach" in the sense of using the R function `attach`? If  
so, then please do not. (And please ignore any advice or the examples  
concerning that issue you might get from reading Crawley's text.)
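
As a small illustration of the str()/dput() suggestion, the sketch below uses a made-up two-row data frame: str() shows the structure as R sees it, and the text printed by dput() can be pasted into a message and evaluated by a reader to rebuild the object.

```r
# Made-up sample, for illustration only
df <- data.frame(
  variable = factor(c("CaD_ICP", "NaD_ICP")),
  value    = c(0.044, 80.13)
)

# Compact description of column types, as the interpreter sees them
str(df)

# dput() prints R code that recreates the object; capture it as text here
txt <- paste(capture.output(dput(df)), collapse = "\n")

# A reader can evaluate that text to rebuild the data frame
rebuilt <- eval(parse(text = txt))
all.equal(df, rebuilt)
```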


--

Good night.

David Winsemius, MD
West Hartford, CT
#
Can't this be fixed by switching with to within? E.g.,

x = data.frame(a = 1:3, b = 4:6)

AddRowSums <- function(df) within(df, d <- a + b)

x <- AddRowSums(x)

print(x)
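
To make the difference concrete, here is a short side-by-side sketch using the same toy data frame: with() returns the value of the expression, while within() returns a modified copy of the data frame.

```r
x <- data.frame(a = 1:3, b = 4:6)

# with(): evaluates the expression and returns its value (a plain vector)
w <- with(x, a + b)
w                      # 5 7 9

# within(): evaluates the expression and returns a copy of x with the
# newly created variable added as a column
wi <- within(x, d <- a + b)
names(wi)              # "a" "b" "d"

# The original x is unchanged until the result is assigned back
names(x)               # "a" "b"
```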

Michael
On Wed, Dec 7, 2011 at 12:07 AM, David Winsemius <dwinsemius at comcast.net> wrote:
#
Thanks to everyone for continued assistance with this problem.  I realize
that I had not included enough information; hopefully I have done so here.
I attached a dput output of a sample of the data titled 'WaterData' (the str
output is below).  Below are dput outputs of the function I am trying to get
working and of the resulting array when I run it.  Unfortunately, Michael,
changing 'with' to 'within' did not solve the problem, as running the
function in that case produced no discernible output or result.  What I
meant by the function now producing an array of values that are not attached
to the data frame is this: the values are the result I am looking for, but
they show up separately in a result window (in a format similar to what you
get from dput()) and are not at all associated with the data frame.  Again,
thanks so much!

function (dataframe) 
{
    dataframe$CalcFlag <- with(dataframe, ifelse(variable == 
        "CaD_ICP", (dataqualifier <- c("Y", "Q", "", "A")[findInterval(dataframe$value, 
        c(-Inf, 0.027, 0.1, 100, Inf))]), ""))
}

'data.frame':	126 obs. of  5 variables:
 $ Site          : Factor w/ 6 levels "BV","CB","KP",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ Time          : Factor w/ 84 levels "0:00:00","0:00:52",..: 1 1 1 1 2 5 16 16 19 20 ...
 $ DateCorrectFmt: Factor w/ 9 levels "2010-08-17","2010-08-21",..: 4 8 1 3 8 5 5 8 8 8 ...
 $ variable      : Factor w/ 3 levels "CaD_ICP","NaD_ICP",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ value         : num  0.044 0.1316 0.0101 0.0114 80.13 ...

Below is the output I get if I run WQFunc as:
flagged <- WQFunc(WaterData)
c("Q", "", "Y", "Y", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""
)
Again, though, 'flagged' is an array of those values in an output window and
is not 'attached' to WaterData.

#
So I don't know what you are trying to do with your function (and I
might have incidentally messed it up), but within certainly does add
the desired column:

WaterData <- structure(list(Site = structure(c(3L, 3L, 3L, 3L, 3L,
3L), .Label = c("BV",
"CB", "KP", "LA", "MR", "PIE"), class = "factor"), Time = structure(c(1L,
1L, 1L, 1L, 2L, 5L), .Label = c("0:00:00", "0:00:52", "0:01:00",
"0:06:00", "0:07:00", "0:11:00", "0:16:00", "0:21:00", "0:26:00",
"0:36:00", "0:41:00", "0:51:00", "0:56:00", "1:01:00", "1:06:00",
"1:07:00", "1:10:00", "1:11:00", "1:22:00", "1:37:00", "1:52:00",
"10:22:00", "10:37:00", "10:52:00", "11:22:00", "15:55:00", "16:00:00",
"17:25:00", "17:30:00", "18:00:00", "18:05:00", "18:20:00", "18:35:00",
"18:50:00", "19:05:00", "19:20:00", "19:25:00", "19:35:00", "2:07:00",
"2:14:00", "2:16:00", "2:21:00", "2:22:00", "2:26:00", "2:31:00",
"2:36:00", "2:37:00", "2:38:00", "2:44:00", "2:46:00", "2:52:00",
"20:20:00", "20:32:00", "20:35:00", "21:05:00", "21:17:00", "21:20:00",
"21:32:00", "22:05:00", "22:20:00", "22:21:00", "22:32:00", "22:37:00",
"22:42:00", "22:47:00", "22:50:00", "23:16:00", "23:26:00", "23:31:00",
"23:36:00", "23:41:00", "23:46:00", "23:51:00", "23:56:00", "3:07:00",
"3:22:00", "3:52:00", "4:22:00", "4:52:00", "5:22:00", "6:37:00",
"6:57:00", "7:52:00", "9:22:00"), class = "factor"), DateCorrectFmt =
structure(c(4L,
8L, 1L, 3L, 8L, 5L), .Label = c("2010-08-17", "2010-08-21", "2010-08-22",
"2010-08-28", "2010-08-29", "2010-09-21", "2010-09-22", "2010-10-21",
"2011-02-19"), class = "factor"), variable = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("CaD_ICP", "NaD_ICP", "ZnD_ICP"
), class = "factor"), value = c(0.044, 0.1316, 0.0101, 0.0114,
80.13, 15.42)), .Names = c("Site", "Time", "DateCorrectFmt",
"variable", "value"), row.names = c(1L, 2L, 6L, 8L, 11L, 14L), class =
"data.frame")

WQFunc <-  function(df){
      within(df, CalcFlag <- ifelse(variable =="CaD_ICP",
(dataqualifier <- c("Y", "Q", "","A")[findInterval(value,c(-Inf,
0.027, 0.1, 100, Inf))]), ""))
}

WQ <- WQFunc(WaterData)

print(WaterData)
print(WQ)
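
One side effect worth checking in the version above: because within() keeps every variable created while the expression is evaluated, the inner assignment to dataqualifier should also end up as a column. The sketch below, on a made-up two-row data frame, compares that with a variant that drops the inner assignment.

```r
# Made-up sample, for illustration only
df <- data.frame(variable = c("CaD_ICP", "NaD_ICP"), value = c(0.044, 5))

# As posted: the inner assignment creates 'dataqualifier' inside within()
with_inner <- within(df, CalcFlag <- ifelse(variable == "CaD_ICP",
  (dataqualifier <- c("Y", "Q", "", "A")[findInterval(value,
    c(-Inf, 0.027, 0.1, 100, Inf))]), ""))

# Variant without the inner assignment: only CalcFlag is added
without_inner <- within(df, CalcFlag <- ifelse(variable == "CaD_ICP",
  c("Y", "Q", "", "A")[findInterval(value,
    c(-Inf, 0.027, 0.1, 100, Inf))], ""))

names(with_inner)      # includes both CalcFlag and dataqualifier
names(without_inner)   # "variable" "value" "CalcFlag"
```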
On Wed, Dec 7, 2011 at 5:01 PM, Steve E. <searl at vt.edu> wrote:
#
Michael (and others) - Right, 'within' did work, I had placed it in the wrong
location previously, which your example code made clear.  I wrapped several
of these functions within a function to address all the desired flags in a
single pass (probably horribly inefficient but it works).  Thanks again for
your most generous assistance.  Steve
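
For anyone landing on this thread with the same task, one possible alternative to one findInterval call per analyte is a lookup table of breakpoints, sketched below. This is not the code Steve ended up with. The function name add_flags, the 'limits' list, and the sample data are hypothetical; the calcium and sodium breakpoints come from the first message, and pairing the sodium breakpoints with the code "NaD_ICP" is an assumption.

```r
# Breakpoints per analyte code (pairing with "NaD_ICP" is an assumption)
limits <- list(
  CaD_ICP = c(-Inf, 0.027, 0.1, 100, Inf),
  NaD_ICP = c(-Inf, 0.050, 0.125, 125, Inf)
)
flag_labels <- c("Y", "Q", "", "A")

# Hypothetical helper: one Flag column covering all analytes in 'limits'
add_flags <- function(df, limits, labels) {
  df$Flag <- ""                      # default: no flag
  for (analyte in names(limits)) {
    # which() drops NA comparisons; as.character() handles factor columns
    rows <- which(as.character(df$variable) == analyte)
    df$Flag[rows] <- labels[findInterval(df$value[rows], limits[[analyte]])]
  }
  df                                 # return the modified data frame
}

# Made-up sample, for illustration only
WaterData <- data.frame(
  variable = factor(c("CaD_ICP", "NaD_ICP", "ZnD_ICP", "CaD_ICP")),
  value    = c(0.0101, 0.06, 1, 150)
)
flagged <- add_flags(WaterData, limits, flag_labels)
flagged$Flag   # "Y" "Q" ""  "A"
```

Analytes without an entry in 'limits' (ZnD_ICP here) keep the empty flag, and adding a new analyte only requires a new entry in the list.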
