Error occurred during mean calculation of a column of a data frame, which apparently contains numeric data
7 messages · Aniruddha Mukherjee, Berend Hasselman, R. Michael Weylandt +2 more
On 29-02-2012, at 09:45, Aniruddha Mukherjee wrote:
Hello R people, how can I compute the mean of the "Pulse_rate" column of the data frame or matrix built from the following character object called "str_got"? It has 14 entries, and each entry has 8 comma-separated values. Please go through the following R commands to see how I tried to split and unlist the values to form a data frame.
str_got
[1] "bp,67,2011-12-09T19:59:44.044+05:30,9830576102,68.0,124.0,58.0,66.0"
"bp,67,2011-12-09T20:19:31.031+05:30,9830576102,72.0,133.0,93.0,40.0"
.....
matr <- matrix(unlist(strsplit(str_got, ",")), nrows, byrow = TRUE)
nrows? I assume this was set somewhere in your script and not shown. Is it length(str_got)?
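A minimal sketch of the splitting step, assuming (as Berend guesses) that `nrows` should be `length(str_got)`. Only the two records shown above are used, because the other twelve are elided in the post. Checking that every record splits into exactly 8 fields before calling `matrix()` guards against fields shifting into the wrong columns when a record has a missing or extra value.

```r
# The two records shown in the question (the remaining twelve are elided there)
str_got <- c("bp,67,2011-12-09T19:59:44.044+05:30,9830576102,68.0,124.0,58.0,66.0",
             "bp,67,2011-12-09T20:19:31.031+05:30,9830576102,72.0,133.0,93.0,40.0")

parts <- strsplit(str_got, ",")          # list: one character vector per record
stopifnot(all(lengths(parts) == 8))     # every record must have exactly 8 fields

nrows <- length(str_got)                 # assumed: one matrix row per record
matr <- matrix(unlist(parts), nrow = nrows, byrow = TRUE)
dim(matr)                                # 2 8 for these two records
```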
matr
     [,1] [,2] [,3]                            [,4]         [,5]   [,6] [,7] [,8]
[1,] "bp" "67" "2011-12-09T19:59:44.044+05:30" "9830576102" "68.0"
......
Note: column names must be inserted before computing the desired mean value.
matr1 <- as.data.frame(matr)
Use
matr1 <- as.data.frame(matr, stringsAsFactors = FALSE)
If you don't set stringsAsFactors = FALSE, the column will be a factor, and a factor is not equivalent to a numeric vector.
What's wrong with simply converting the column?
matr1$Pulse_rate <- as.numeric(matr1$Pulse_rate)
Then you can calculate the desired mean with
mean(matr1$Pulse_rate)
or
mean(matr1[, "Pulse_rate"])
Berend
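Berend's steps put together, as a sketch. The column names other than "Pulse_rate" are hypothetical (the post never shows which names were assigned), and treating the fifth field as the pulse rate is likewise an assumption; adjust both to the real layout.

```r
str_got <- c("bp,67,2011-12-09T19:59:44.044+05:30,9830576102,68.0,124.0,58.0,66.0",
             "bp,67,2011-12-09T20:19:31.031+05:30,9830576102,72.0,133.0,93.0,40.0")
matr <- matrix(unlist(strsplit(str_got, ",")), nrow = length(str_got), byrow = TRUE)

# Keep the columns as character, not factor
matr1 <- as.data.frame(matr, stringsAsFactors = FALSE)

# Hypothetical names; only "Pulse_rate" comes from the original post
colnames(matr1) <- c("type", "id", "timestamp", "phone",
                     "Pulse_rate", "v6", "v7", "v8")

# Convert the character column to numeric, then average it
matr1$Pulse_rate <- as.numeric(matr1$Pulse_rate)
mean(matr1$Pulse_rate)   # 70 for these two records
```

Since R 4.0.0 the default for stringsAsFactors is FALSE, so the explicit argument mainly matters on older versions such as the one in this 2012 thread; it does no harm on newer ones.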
On 29-02-2012, at 11:49, Aniruddha Mukherjee wrote:
Hello Berend. Many thanks for your prompt reply; it helped me a lot. One more thing, which I would be grateful if you could explain: why, in my case (i.e. when stringsAsFactors was TRUE by default), does
as.numeric(matr1$Pulse_rate)
display the following?
[1]  4  5  7  5  9  8  6 10  3  2  5  1 10 10
?factor
and play a little bit
as.factor(c("A","8.9"))
as.numeric(as.factor(c("A","8.9")))
str(as.factor(c("A","8.9")))
and read chapter 4 of the "An Introduction to R" manual.
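A short sketch of what that experiment reveals, using made-up pulse values: `as.numeric()` on a factor returns the internal level codes, not the numbers the labels spell out. Levels are sorted as strings, so "100.0" comes before "68.0", which is why such codes bear no relation to the readings. Converting through the labels gives the intended values.

```r
# Hypothetical pulse readings stored as a factor (as with stringsAsFactors = TRUE)
f <- factor(c("68.0", "100.0", "72.0"))

levels(f)                     # "100.0" "68.0" "72.0": sorted as strings
as.numeric(f)                 # 2 1 3: the level codes, not the readings

# Convert via the labels instead
as.numeric(as.character(f))   # 68 100 72
as.numeric(levels(f))[f]      # 68 100 72, parsing each distinct level only once
```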
Berend
Factors are internally stored as integers (enums if you have used other programming languages) with a special label set -- it's more memory efficient than storing the whole string over and over.
Michael
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
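A small sketch of Michael's description, with made-up labels: a factor is an integer vector that carries a `levels` attribute (the label set, stored once) and a `class` attribute.

```r
f <- factor(c("low", "high", "low", "low"))

typeof(f)        # "integer": the underlying storage type
unclass(f)       # the integer codes 2 1 2 2, with the levels attribute attached
levels(f)        # "high" "low": the label set
levels(f)[f]     # map codes back to labels: "low" "high" "low" "low"
```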
On 12-02-29 8:16 AM, R. Michael Weylandt wrote:
Factors are internally stored as integers (enums if you have used other programming languages) with a special label set -- it's more memory efficient than storing the whole string over and over.
That was one of the original justifications, but character vectors are
just as memory efficient these days.
The other justifications are still valid: sometimes you have a vector
which only takes on a subset of the possible values it could take, and
when you tabulate it, you'd like to see those zero counts. You may also
want to control the display order, and a factor allows that.
For example:
x <- c("a", "a", "b")
table(x)
x <- factor(x, levels=c("c", "b", "a"))
table(x)
Duncan Murdoch
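Duncan's example with its results checked explicitly, as a sketch: giving explicit `levels` makes the unobserved value "c" appear with a zero count and fixes the display order.

```r
x <- c("a", "a", "b")
t1 <- table(x)
t1                    # a: 2, b: 1 (only observed values, alphabetical order)

xf <- factor(x, levels = c("c", "b", "a"))
t2 <- table(xf)
t2                    # c: 0, b: 1, a: 2 (every level, in the order given)
```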
On 29/02/2012 13:41, Duncan Murdoch wrote:
On 12-02-29 8:16 AM, R. Michael Weylandt wrote:
Factors are internally stored as integers (enums if you have used other programming languages) with a special label set -- it's more memory efficient than storing the whole string over and over.
That was one of the original justifications, but character vectors are just as memory efficient these days.
No, not really. Character vectors (STRSXPs) store a pointer for each string entry, and factors store an integer. On most current systems pointers are twice the size of integers, so on a 64-bit system
> a <- rep(letters[1:10], each = 1000)
> object.size(a)
80520 bytes
> object.size(as.factor(a))
41008 bytes
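A rough sketch of the arithmetic behind these numbers: 10,000 elements cost one pointer each as a character vector (8 bytes on a 64-bit build) but one 4-byte integer code each as a factor, plus small fixed overheads (vector headers, the string data, and for the factor its levels and class). Exact `object.size()` results can differ between R versions, so this compares only the ratio, which should come out roughly 2 on a 64-bit build.

```r
# Same data as above: 10,000 strings drawn from 10 distinct values
a <- rep(letters[1:10], each = 1000)
f <- as.factor(a)
n <- length(a)

# Payload estimates: one pointer vs one 4-byte integer code per element
n * .Machine$sizeof.pointer   # 80000 on a 64-bit build
n * 4                         # 40000

# Measured sizes also include headers, string data, and the factor's attributes
size_chr <- as.numeric(object.size(a))
size_fct <- as.numeric(object.size(f))
size_chr / size_fct           # roughly 2 on a 64-bit build
```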
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595