Odd behavior of a function within apply
Avi, that's great! Thanks
On Tue, Aug 9, 2022 at 12:56 PM <avi.e.gross at gmail.com> wrote:
Yes, David, the function as described seems to insist that its argument be of type integer or type character; given a double or any other type it may well fail, because y would never be initialized. The goal seems to be to count how many "missing" values are found, meaning NA for a numeric type or an empty string for character. But you can have some form of NA in all kinds of object types, including character, as in this construct:
x <- c("a", NA, "", "b", "NA)")
x
[1] "a" NA "" "b" "NA)"
The above has three useless elements if both NA and "" are considered empty. So logically the condition could be to count NA and, if it is of type character, also count "".
So rather than play games testing not just is.integer, is.double (or just is.numeric) as well as is.logical and is.raw, all of the above can be tested with is.na() first to add up how many NA they contain. If it is then of type character, you can add any blank strings. So the algorithm would initialize y to sum(is.na(vec)) and then, if the vec is character, add the count of empty strings.
Alternatively, the function should decide what it wants to do if any other type is encountered. You can internally convert many things to integer or character and then operate on them. Or you can return a zero or raise an alarm when given something else.
In this case, simply setting y to zero before using it would make it defined and avoid the error, albeit reporting nothing found for a double or Boolean vector even if it did contain NA.

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of David Carlson via R-help
Sent: Tuesday, August 9, 2022 11:33 AM
To: Erin Hodgess <erinm.hodgess at gmail.com>
Cc: r-help at r-project.org
Subject: Re: [R] Odd behavior of a function within apply

Could you have columns that are not character or integer so that y is never defined in the function?
count1a(1:5/3)
Error in count1a(1:5/3) : object 'y' not found
David Carlson
On Mon, Aug 8, 2022 at 1:35 PM Erin Hodgess <erinm.hodgess at gmail.com> wrote:
OK. I'm back again. So my test1.df is 236x390.
If I put in the following:
lapply(test1.df,count1a)
Error in FUN(X[[i]], ...) : object 'y' not found
lapply(test1.df,count1a)
Error in FUN(X[[i]], ...) : object 'y' not found
sapply(test1.df,count1a)
Error in FUN(X[[i]], ...) : object 'y' not found
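One way to see which columns trigger this error is to tabulate the storage type of every column. What follows is only a minimal sketch: the small data frame stands in for test1.df, and its column names and values are made up for illustration. A column that is neither integer nor character (for example a double column, or an all-NA column that was read in as logical) leaves y unassigned in count1a.

```r
# Hypothetical stand-in for test1.df; names and values are invented
df <- data.frame(
  zip  = c(48160L, NA),
  mon  = c("June", ""),
  rate = c(0.5, NA),          # a double column, which count1a() does not handle
  stringsAsFactors = FALSE
)

# Storage type of each column, and how often each type occurs
col_types <- sapply(df, typeof)
table(col_types)

# Columns for which count1a() never assigns y
bad_cols <- names(col_types)[!col_types %in% c("integer", "character")]
bad_cols
```

With the real data, replacing df by test1.df should list the columns responsible for "object 'y' not found".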
What am I doing wrong, please?
Thanks,
Erin
Erin Hodgess, PhD
mailto: erinm.hodgess at gmail.com
On Mon, Aug 8, 2022 at 1:41 PM Erin Hodgess <erinm.hodgess at gmail.com> wrote:
Awesome, thanks so much!!
Erin Hodgess, PhD
mailto: erinm.hodgess at gmail.com
On Mon, Aug 8, 2022 at 1:38 PM John Fox <jfox at mcmaster.ca> wrote:
Dear Erin,
The problem is that the data frame gets coerced to a character
matrix, and the only column with "" entries is the 9th (the second
one you supplied):
as.matrix(test1.df)
   X1_1_HZP1 X1_1_HBM1_mon X1_1_HBM1_yr
1  "48160"   "December"    "2014"
2  "48198"   "June"        "2018"
3  "80027"   "August"      "2016"
4  "48161"   ""            NA
5  NA        ""            NA
6  "48911"   "August"      "1985"
7  NA        "April"       "2019"
8  "48197"   "February"    "1993"
9  "48021"   ""            NA
10 "11355"   "December"    "1990"
(Here, test1.df only contains the three columns you provided.)
A solution is to use sapply:
> sapply(test1.df, count1a)
    X1_1_HZP1 X1_1_HBM1_mon  X1_1_HBM1_yr
            2             3             3
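The difference between apply and sapply here can be checked directly. The sketch below uses a made-up two-column data frame (the names id and month are hypothetical), not the original data. It shows that as.matrix() turns a mixed integer/character data frame into a character matrix, so typeof(x) == "integer" is never true inside apply, and that comparing a character NA with "" gives NA, which is why the integer columns reported NA.

```r
# Toy data frame (invented values): one integer and one character column
df <- data.frame(
  id    = c(1L, NA, 3L),
  month = c("June", "", "May"),
  stringsAsFactors = FALSE
)

# apply() operates on as.matrix(df); mixed column types become character
m <- as.matrix(df)
typeof(m)

# sapply() visits the columns one at a time, so each keeps its own type
types <- sapply(df, typeof)
types

# In the coerced matrix the missing id is a character NA, so == "" gives NA
na_sum <- sum(m[, "id"] == "")
na_sum

# A data frame holding only integer columns stays integer under as.matrix(),
# which is consistent with columns 8 and 10 alone giving the expected counts
int_only <- typeof(as.matrix(df["id"]))
int_only
```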
I hope this helps,
John
On 2022-08-08 1:22 p.m., Erin Hodgess wrote:
Hello!
I have the following data.frame
dput(test1.df[1:10,8:10])
structure(list(X1_1_HZP1 = c(48160L, 48198L, 80027L, 48161L, NA,
48911L, NA, 48197L, 48021L, 11355L), X1_1_HBM1_mon =
c("December", "June", "August", "", "", "August", "April",
"February", "", "December"), X1_1_HBM1_yr = c(2014L, 2018L,
2016L, NA, NA, 1985L, 2019L, 1993L, NA, 1990L)), row.names =
c(NA, 10L), class = "data.frame")
And the following function:
dput(count1a)
function (x)
{
    if (typeof(x) == "integer")
        y <- sum(is.na(x))
    if (typeof(x) == "character")
        y <- sum(x == "")
    return(y)
}
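A variant along the lines suggested in the replies in this thread (start from sum(is.na(x)), then add empty strings for character input) might look like the sketch below. The name count_missing is hypothetical and this is not the poster's code. Note the change in behaviour: for character vectors it counts NA as well as "", which may or may not be what is wanted, and factors are not treated as character.

```r
# Hypothetical alternative to count1a(): counts NA for any atomic vector and
# additionally counts empty strings when the vector is character.
count_missing <- function(x) {
  y <- sum(is.na(x))                      # always defined, whatever the type
  if (is.character(x)) {
    y <- y + sum(x == "", na.rm = TRUE)   # NA == "" is NA, so drop those
  }
  y
}

count_missing(c(48160L, NA, 48161L))   # 1
count_missing(c("June", "", NA, ""))   # 3
count_missing(1:5 / 3)                 # 0, and no error for doubles
```

Because y is assigned before any type test, the function returns a value for double, logical, and other atomic inputs instead of failing with "object 'y' not found".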
When I use the apply function with count1a, I get the following:
apply(test1.df[1:10,8:10],2,count1a)
    X1_1_HZP1 X1_1_HBM1_mon  X1_1_HBM1_yr
           NA             3            NA
However, when I do use columns 8 and 10, I get the correct response:
apply(test1.df[1:10,c(8,10)],2,count1a)
   X1_1_HZP1 X1_1_HBM1_yr
           2            3
I am really baffled. If I use count1a on a single column, it works fine.
Any suggestions much appreciated.
Thanks,
Sincerely,
Erin
Erin Hodgess, PhD
mailto: erinm.hodgess at gmail.com
[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://socialsciences.mcmaster.ca/jfox/