Dear R Community,
I hope you might be able to assist with a small problem creating a function.
I am working with water-quality data sets that contain the concentration of
many different elements in water samples. I need to assign quality-control
flags to values that fall into various concentration ranges. Rather than a
web of nested if statements, I am employing the findInterval function to
identify values that need to be flagged and to assign the appropriate flag.
The data consist of a sample identifier, the analysis, and corresponding
value. The findInterval function works well; however, I would like to
incorporate it into a function so that I can run multiple findInterval
functions for many different water-quality analyses (and I have to do this
for many datasets), but it seems to fall apart when incorporated into a
function.
Run on its own, the findInterval call works as desired, e.g. below,
creating the new CalciumFlag column with the appropriate flag for, in this
case, levels of calcium in the water:
WQdata$CalciumFlag <- with(WQdata, ifelse(analysis == "Calcium", (flags <-
c("Y", 'Q', "", "A") [findInterval(WQdata$value, c(-Inf, 0.027, 0.1, 100,
Inf))]),""))
However, it does not work when wrapped in a function (no error messages
are thrown; it simply does not seem to do anything):
WQfunc <- function() {
WQdata$CalciumFlag <- with(WQdata, ifelse(analysis == "Calcium", (flags <-
c("Y", 'Q', "", "A") [findInterval(WQdata$value, c(-Inf, 0.027, 0.1, 100,
Inf))]),""))
}
Calling the function WQfunc() does not produce an error but also does not
produce the expected CalciumFlag; it seems to do nothing.
Ultimately, what I need to get to is something like below where multiple
findInterval functions for different analyses are included in a single
function, then I can concatenate the results into a single column containing
all flags for all analyses, e.g.:
WQfunc <- function() {
WQdata$CalciumFlag <- with(WQdata, ifelse(analysis == "Calcium", (flags <-
c("Y", 'Q', "", "A") [findInterval(WQdata$value, c(-Inf, 0.027, 0.1, 100,
Inf))]),""))
WQdata$SodiumFlag <- with(WQdata, ifelse(analysis == "Sodium", (flags <-
c("Y", 'Q', "", "A") [findInterval(WQdata$value, c(-Inf, 0.050, 0.125, 125,
Inf))]),""))
WQdata$MagnesiumFlag <- with(WQdata, ifelse(analysis == "Magnesium", (flags
<- c("Y", 'Q', "", "A") [findInterval(WQdata$value, c(-Inf, 0.065, 0.15, 75,
Inf))]),""))
.....etc for additional water-quality analyses...
}
As an aside, I started working with the findInterval tool from an example
that I found online but am not clear as to how the multi-component
configuration incorporating brackets actually works. Can anyone suggest a
good resource that explains this?
I thank you very much for any assistance you may be able to provide.
Regards,
Steve
--
View this message in context: http://r.789695.n4.nabble.com/help-wrapping-findInterval-into-a-function-tp4165464p4165464.html
Sent from the R help mailing list archive at Nabble.com.
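On the aside about the bracket construction: findInterval(x, breaks) returns, for each value, the number of the interval it falls in, and that integer vector is then used inside [ ] to pick the matching entry from the vector of flag labels. A minimal sketch, using the calcium breakpoints and labels from the post; the concentrations in `conc` are made up for illustration and are not from the poster's data:

```r
# Hypothetical concentrations; breakpoints and labels are those from the post.
conc   <- c(0.01, 0.05, 5, 250)
breaks <- c(-Inf, 0.027, 0.1, 100, Inf)
labels <- c("Y", "Q", "", "A")

# Step 1: findInterval() gives the interval number for each value.
idx <- findInterval(conc, breaks)
print(idx)    # 1 2 3 4

# Step 2: an integer vector inside [ ] selects one label per value.
flags <- labels[idx]
print(flags)  # "Y" "Q" ""  "A"
```

Written in one line, this is the c("Y", "Q", "", "A")[findInterval(...)] form used in the post; the extra `flags <-` assignment wrapped around it there is not needed for the indexing to work.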
help wrapping findInterval into a function
10 messages · Steve E., David Winsemius, William Dunlap +1 more
On Dec 6, 2011, at 11:43 AM, Steve E. wrote:
Dear R Community, I hope you might be able to assist with a small problem creating a function. I am working with water-quality data sets that contain the concentration of many different elements in water samples. I need to assign quality- control flags to values that fall into various concentration ranges. Rather than a web of nested if statements, I am employing the findInterval function to identify values that need to be flagged and to assign the appropriate flag.
It's probably your use of "with" rather than findInterval. The 'with' function sets up an environment and is used mostly in interactive session. If you have not passed a dataframe into the function then you should use the name of the dataframe and '['.
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD West Hartford, CT
No, the problem is not with "with" but is
that the OP's function did not return the modified
data.frame. He didn't show how the function
was called, but I suspect the usage was like
f0 <- function() globalDataFrame$newCol <- ...
f0()
where it should have been
f1 <- function(dataFrame) {
    dataFrame$newCol <- ...
    dataFrame # return modified dataFrame
}
globalDataFrame <- f1(globalDataFrame)
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
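The pattern Bill describes can be tried on a toy example. This is only a sketch: the three-row data frame is made up, and f0 and f1 are filled in with the calcium rule from the original post so that both versions run.

```r
# Toy data frame standing in for the poster's data (values are made up).
WQdata <- data.frame(analysis = c("Calcium", "Calcium", "Sodium"),
                     value    = c(0.01, 0.05, 3))

# Modifies a copy that is local to the function; the global WQdata is untouched.
f0 <- function() {
  WQdata$CalciumFlag <- "x"
}
f0()
print(names(WQdata))  # still only "analysis" and "value"

# Takes the data frame as an argument, modifies the copy, and returns it.
f1 <- function(dataFrame) {
  dataFrame$CalciumFlag <- ifelse(
    dataFrame$analysis == "Calcium",
    c("Y", "Q", "", "A")[findInterval(dataFrame$value,
                                      c(-Inf, 0.027, 0.1, 100, Inf))],
    "")
  dataFrame  # return the modified data frame
}
WQdata <- f1(WQdata)
print(WQdata)
```

The change only becomes visible because the result of f1() is assigned back to WQdata; calling f1(WQdata) without the assignment would print the flagged copy and leave the original as it was.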
Bill (and David),
Thank you very much for taking the time to respond to my query.
You were right, I was creating and calling the function exactly as you had
predicted. I revised the structure based on your suggestion. It runs but
the output is an array of the flags that are not attached to the data frame,
not a new column in the data frame as was my intention.
So, the new configuration I tried was like this (where DataFrame is not a
real data frame but just the word "DataFrame"):
WQFlags <- function(DataFrame) {DataFrame$CalciumFlag <- with(DataFrame,
ifelse(variable == "CaD_ICP", (dataqualifier <- c("Y", 'Q', "", "A")
[findInterval(DataFrame$value, c(-Inf, 0.027, 0.1, 100, Inf))]),""))
}
I called it using:
WaterQualityData <- WQFlags(WaterQualityData)
Again, the output is simply an array of the flags, unattached to a data
frame. Can you suggest a way to modify this to make it work as desired, or,
in the worst case, can I attach the resulting array of flag values?
Thank you again!
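A likely reason for the bare vector of flags: the value of an R function is the value of its last evaluated expression, and when that expression is an assignment the value is the right-hand side, returned invisibly. A minimal sketch under that reading, with made-up data and a condensed copy of the posted function; the name WQFlags_asPosted is hypothetical and used only to keep the two versions apart:

```r
# Toy data (made up), using the column names from the post.
WaterQualityData <- data.frame(variable = c("CaD_ICP", "CaD_ICP", "NaD_ICP"),
                               value    = c(0.01, 0.05, 3))

# Condensed copy of the posted function: the last expression is the
# assignment, so the function's value is the flag vector, not the data frame.
WQFlags_asPosted <- function(DataFrame) {
  DataFrame$CalciumFlag <- with(DataFrame,
    ifelse(variable == "CaD_ICP",
           c("Y", "Q", "", "A")[findInterval(value,
                                             c(-Inf, 0.027, 0.1, 100, Inf))],
           ""))
}
res <- WQFlags_asPosted(WaterQualityData)
print(class(res))  # "character"

# One extra line makes the modified data frame the returned value.
WQFlags <- function(DataFrame) {
  DataFrame$CalciumFlag <- with(DataFrame,
    ifelse(variable == "CaD_ICP",
           c("Y", "Q", "", "A")[findInterval(value,
                                             c(-Inf, 0.027, 0.1, 100, Inf))],
           ""))
  DataFrame
}
out <- WQFlags(WaterQualityData)
print(out)
```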
On Dec 6, 2011, at 5:53 PM, Steve E. wrote:
I called it using:
WaterQualityData <- WQFlags(WaterQualityData)
Unless you provide either the original data or an unambiguous description of your data (at the level the R interpreter would see, not at the level of what you see when you print a dataframe), you will get, at the very best, educated guesses. Use str() or dput().
Again, the output is simply an array of the flags, unattached to a data frame. Can you suggest a way to modify this to make it work as desired, or, in the worst case, can I attach the resulting array of flag values?
Do you mean "attach" in the sense of using the R function `attach`? If so, then please do not. (And please ignore any advice or the examples concerning that issue you might get from reading Crawley's text.)
--
Good night.
David Winsemius, MD
West Hartford, CT
Can't this be fixed by switching with to within? E.g.,
x = data.frame(a = 1:3, b = 4:6)
AddRowSums <- function(df) within(df, d <- a + b)
x <- AddRowSums(x)
print(x)
Michael
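The difference behind this suggestion is in what the two functions return: with() gives back the value of the expression, while within() gives back a modified copy of the data frame. A short sketch on the same toy data frame as in Michael's message (the names `s` and `y` are introduced here for illustration):

```r
x <- data.frame(a = 1:3, b = 4:6)

# with() evaluates the expression and returns its value (here a plain vector).
s <- with(x, a + b)
print(s)         # 5 7 9

# within() evaluates the expression and returns a modified copy of the data frame.
y <- within(x, d <- a + b)
print(names(y))  # "a" "b" "d"

# Neither call changes x itself.
print(names(x))  # "a" "b"
```

In both cases the result has to be assigned to something to be kept, which is why AddRowSums(x) is assigned back to x above.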
Thanks to everyone for continued assistance with this problem. I realize that I had not included enough information; hopefully I have done so here. I attached a dput output of a sample of the data titled 'WaterData' (and str output below). Below are dput outputs of the function I am trying to get working and the resulting array when I run it.
Unfortunately, Michael, changing 'with' to 'within' did not solve the problem, as running the function in that case produced no discernible output or result. What I meant by the function now producing an array of values (though they are the result I am looking for) that are not attached to the data frame is that they show up separately in a result window (in a format similar to what you get from dput()) and are not at all associated with the data frame.
Again, thanks so much!
dput(WQFunc)
function (dataframe)
{
dataframe$CalcFlag <- with(dataframe, ifelse(variable ==
"CaD_ICP", (dataqualifier <- c("Y", "Q", "",
"A")[findInterval(dataframe$value,
c(-Inf, 0.027, 0.1, 100, Inf))]), ""))
}
str(WaterData)
'data.frame': 126 obs. of 5 variables:
 $ Site          : Factor w/ 6 levels "BV","CB","KP",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ Time          : Factor w/ 84 levels "0:00:00","0:00:52",..: 1 1 1 1 2 5 16 16 19 20 ...
 $ DateCorrectFmt: Factor w/ 9 levels "2010-08-17","2010-08-21",..: 4 8 1 3 8 5 5 8 8 8 ...
 $ variable      : Factor w/ 3 levels "CaD_ICP","NaD_ICP",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ value         : num 0.044 0.1316 0.0101 0.0114 80.13 ...
Below is the output I get if I run the WQFunc as:
flagged <- WQFunc(WaterData)
dput(Flagged)
c("Q", "", "Y", "Y", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", ""
)
Again, though, 'Flagged' is an array of those values in an output window and is not 'attached' to WaterData.
Forgot to attach the data set: http://r.789695.n4.nabble.com/file/n4170695/WaterData.txt
So I don't know what you are trying to do with your function (and I
might have incidentally messed it up), but within certainly does add
the desired column:
WaterData <- structure(list(Site = structure(c(3L, 3L, 3L, 3L, 3L,
3L), .Label = c("BV",
"CB", "KP", "LA", "MR", "PIE"), class = "factor"), Time = structure(c(1L,
1L, 1L, 1L, 2L, 5L), .Label = c("0:00:00", "0:00:52", "0:01:00",
"0:06:00", "0:07:00", "0:11:00", "0:16:00", "0:21:00", "0:26:00",
"0:36:00", "0:41:00", "0:51:00", "0:56:00", "1:01:00", "1:06:00",
"1:07:00", "1:10:00", "1:11:00", "1:22:00", "1:37:00", "1:52:00",
"10:22:00", "10:37:00", "10:52:00", "11:22:00", "15:55:00", "16:00:00",
"17:25:00", "17:30:00", "18:00:00", "18:05:00", "18:20:00", "18:35:00",
"18:50:00", "19:05:00", "19:20:00", "19:25:00", "19:35:00", "2:07:00",
"2:14:00", "2:16:00", "2:21:00", "2:22:00", "2:26:00", "2:31:00",
"2:36:00", "2:37:00", "2:38:00", "2:44:00", "2:46:00", "2:52:00",
"20:20:00", "20:32:00", "20:35:00", "21:05:00", "21:17:00", "21:20:00",
"21:32:00", "22:05:00", "22:20:00", "22:21:00", "22:32:00", "22:37:00",
"22:42:00", "22:47:00", "22:50:00", "23:16:00", "23:26:00", "23:31:00",
"23:36:00", "23:41:00", "23:46:00", "23:51:00", "23:56:00", "3:07:00",
"3:22:00", "3:52:00", "4:22:00", "4:52:00", "5:22:00", "6:37:00",
"6:57:00", "7:52:00", "9:22:00"), class = "factor"), DateCorrectFmt =
structure(c(4L,
8L, 1L, 3L, 8L, 5L), .Label = c("2010-08-17", "2010-08-21", "2010-08-22",
"2010-08-28", "2010-08-29", "2010-09-21", "2010-09-22", "2010-10-21",
"2011-02-19"), class = "factor"), variable = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = c("CaD_ICP", "NaD_ICP", "ZnD_ICP"
), class = "factor"), value = c(0.044, 0.1316, 0.0101, 0.0114,
80.13, 15.42)), .Names = c("Site", "Time", "DateCorrectFmt",
"variable", "value"), row.names = c(1L, 2L, 6L, 8L, 11L, 14L), class =
"data.frame")
WQFunc <- function(df) {
  within(df, CalcFlag <- ifelse(variable == "CaD_ICP",
    (dataqualifier <- c("Y", "Q", "", "A")[findInterval(value, c(-Inf,
      0.027, 0.1, 100, Inf))]), ""))
}
WQ <- WQFunc(WaterData)
print(WaterData)
print(WQ)
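One side effect worth knowing about in this version: within() keeps every variable that is assigned while the expression is evaluated, so the inner `dataqualifier <-` assignment ends up as a column of its own next to CalcFlag. A hedged sketch of the same function without the inner assignment, run on a small made-up stand-in for WaterData (the name WQFunc_inner is hypothetical and only marks the version with the inner assignment):

```r
# Small made-up stand-in for WaterData (same column names as in the thread).
WaterData <- data.frame(variable = c("CaD_ICP", "CaD_ICP", "NaD_ICP"),
                        value    = c(0.044, 0.0101, 2))

# With the inner assignment, within() keeps both new variables as columns.
WQFunc_inner <- function(df) {
  within(df, CalcFlag <- ifelse(variable == "CaD_ICP",
    (dataqualifier <- c("Y", "Q", "", "A")[findInterval(value,
      c(-Inf, 0.027, 0.1, 100, Inf))]), ""))
}
withInner <- WQFunc_inner(WaterData)
print(names(withInner))  # includes both "CalcFlag" and "dataqualifier"

# Without it, only CalcFlag is added.
WQFunc <- function(df) {
  within(df, CalcFlag <- ifelse(variable == "CaD_ICP",
    c("Y", "Q", "", "A")[findInterval(value,
      c(-Inf, 0.027, 0.1, 100, Inf))], ""))
}
WQ <- WQFunc(WaterData)
print(WQ)
```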
Michael (and others) - Right, 'within' did work; I had placed it in the wrong location previously, which your example code made clear. I wrapped several of these functions within a function to address all the desired flags in a single pass (probably horribly inefficient, but it works). Thanks again for your most generous assistance.
Steve
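For the single-pass version, one possible alternative to repeating the within() call per analysis is to keep the breakpoints in a named list and loop over it. This is only a sketch, not what was actually used in the thread: the helper name addFlags, the single Flag column, and the sample rows are assumptions, while the breakpoints and labels are the ones given in the first message. It also assumes the analysis column has no missing values.

```r
# Breakpoints per analysis, taken from the first message in the thread.
breaks <- list(
  Calcium   = c(-Inf, 0.027, 0.1,   100, Inf),
  Sodium    = c(-Inf, 0.050, 0.125, 125, Inf),
  Magnesium = c(-Inf, 0.065, 0.15,   75, Inf)
)
labels <- c("Y", "Q", "", "A")

# Hypothetical helper: adds one Flag column covering every analysis in `breaks`.
addFlags <- function(df, breaks, labels) {
  df$Flag <- ""
  for (a in names(breaks)) {
    rows <- df$analysis == a  # rows belonging to this analysis
    df$Flag[rows] <- labels[findInterval(df$value[rows], breaks[[a]])]
  }
  df  # return the modified data frame
}

# Made-up sample rows.
WQdata <- data.frame(
  analysis = c("Calcium", "Sodium", "Magnesium", "Calcium"),
  value    = c(0.01, 0.1, 80, 5)
)
WQdata <- addFlags(WQdata, breaks, labels)
print(WQdata)
```

Adding another analysis then only means adding one entry to `breaks`, and the flags land directly in one column rather than needing to be concatenated afterwards.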