Hello,
I've tried several times to learn R, but have never gotten past a particular gate. My data are organized by column in Excel, with column headers in the first row. The columns are of unequal lengths. I export them as CSV, then import the CSV file into R. I wish to summarize the data by column. R inserts NA for missing values, then refuses to operate on columns with NA. R is importing my data into a data frame, and I realize that is inappropriate for what I want to do.
How can I import my data so that I can work on columns of unequal length? The first thing I would like to do is generate a table containing mean, median, mode, standard deviation, min, max and count, all per column.
Thank you, Tom
Example data
Dat1 Dat2 Dat3
1 1 5 4
2 7 7 9
3 3 3 5
4 2 NA 5
5 9 NA NA
Unequal column lengths
4 messages · Tom Mosca, David Winsemius, Jim Lemon +1 more
On Apr 14, 2016, at 2:33 PM, Tom Mosca <tom at vims.edu> wrote:
Hello, I've tried several times to learn R, but have never gotten past a particular gate. My data are organized by column in Excel, with column headers in the first row. The columns are of unequal lengths. I export them as CSV, then import the CSV file into R. I wish to summarize the data by column. R inserts NA for missing values, then refuses to operate on columns with NA. R is importing my data into a data frame, and I realize that is inappropriate for what I want to do. How can I import my data so that I can work on columns of unequal length? The first thing I would like to do is generate a table containing mean, median, mode, standard deviation, min, max and count, all per column.
Most of the summary statistic functions have an na.rm option that you should set to TRUE.
Thank you, Tom
Example data
Dat1 Dat2 Dat3
1 1 5 4
2 7 7 9
3 3 3 5
4 2 NA 5
5 9 NA NA
Looks like you have an R data frame already, so I would try:
colMeans(data, na.rm=TRUE)
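As a rough sketch of how the na.rm approach could extend to most of the table Tom asked for, the example below applies a small helper to every column with sapply(). The helper name col_stats and the object name stats_table are made up for illustration, and the data frame is typed in by hand from the example data rather than read from a file.

```r
# Example data from the original post (columns padded with NA)
df <- data.frame(
  Dat1 = c(1, 7, 3, 2, 9),
  Dat2 = c(5, 7, 3, NA, NA),
  Dat3 = c(4, 9, 5, 5, NA)
)

# col_stats is a hypothetical helper: it returns one named vector of
# statistics for a single column, ignoring NA values throughout
col_stats <- function(x) {
  c(mean   = mean(x, na.rm = TRUE),
    median = median(x, na.rm = TRUE),
    sd     = sd(x, na.rm = TRUE),
    min    = min(x, na.rm = TRUE),
    max    = max(x, na.rm = TRUE),
    count  = sum(!is.na(x)))   # number of non-missing values
}

# sapply() calls col_stats on each column and binds the results into a
# matrix with one row per statistic and one column per variable
stats_table <- sapply(df, col_stats)
print(round(stats_table, 2))
```

The mode is left out here because base R has no built-in function for the statistical mode.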
[[alternative HTML version deleted]]
And do learn to configure your email client to post to r-help in plain text.
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
David Winsemius Alameda, CA, USA
Hi Tom,
What you want is a list rather than a data frame. So:

df<-read.table(text="
 Dat1 Dat2 Dat3
1 1 5 4
2 7 7 9
3 3 3 5
4 2 NA 5
5 9 NA NA",
header=TRUE)
dflist<-as.list(df)
na.remove<-function(x) return(x[!is.na(x)])
sapply(dflist,na.remove)

Jim
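Building on the list idea, here is a minimal sketch of how the unequal-length vectors might then be used to get the count and the mode per column. The helper stat_mode is hypothetical (base R's own mode() reports an object's storage mode, not the most frequent value), and when several values tie it returns all of them.

```r
# Example data from the original post
df <- data.frame(
  Dat1 = c(1, 7, 3, 2, 9),
  Dat2 = c(5, 7, 3, NA, NA),
  Dat3 = c(4, 9, 5, 5, NA)
)

# Drop the NAs column by column; the result is a list whose elements
# have different lengths, which a data frame cannot hold
dflist <- lapply(df, function(x) x[!is.na(x)])

# Number of values actually present in each column
n_values <- lengths(dflist)
print(n_values)

# stat_mode is a hypothetical helper: most frequent value(s) of a
# numeric vector, returning every tied value when there is no single mode
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

modes <- lapply(dflist, stat_mode)
print(modes)
```

With this example data only Dat3 has a single mode; in Dat1 and Dat2 every value occurs once, so all values are returned as ties.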
On Fri, Apr 15, 2016 at 7:33 AM, Tom Mosca <tom at vims.edu> wrote:
Hello,
I've tried several times to learn R, but have never gotten past a particular gate. My data are organized by column in Excel, with column headers in the first row. The columns are of unequal lengths. I export them as CSV, then import the CSV file into R. I wish to summarize the data by column. R inserts NA for missing values, then refuses to operate on columns with NA. R is importing my data into a data frame, and I realize that is inappropriate for what I want to do.
How can I import my data so that I can work on columns of unequal length? The first thing I would like to do is generate a table containing mean, median, mode, standard deviation, min, max and count, all per column.
Thank you, Tom
Example data
Dat1 Dat2 Dat3
1 1 5 4
2 7 7 9
3 3 3 5
4 2 NA 5
5 9 NA NA
Many basic summary statistics in R will not work (i.e., they usually return NA) if there are NAs in the data, unless you explicitly tell R to ignore them.
With your data set df
with(df, mean(Dat2, na.rm = TRUE))
[1] 5
This by the way is functionally the same as
mean(df$Dat2, na.rm = TRUE)
It's just easier to type the first one
In other cases R will not object to the NAs:
summary(df)
      Dat1          Dat2        Dat3
 Min.   :1.0   Min.   :3   Min.   :4.00
 1st Qu.:2.0   1st Qu.:4   1st Qu.:4.75
 Median :3.0   Median :5   Median :5.00
 Mean   :4.4   Mean   :5   Mean   :5.75
 3rd Qu.:7.0   3rd Qu.:6   3rd Qu.:6.00
 Max.   :9.0   Max.   :7   Max.   :9.00
               NA's   :2   NA's   :1
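To connect this back to the import step, the sketch below shows what a CSV exported from a sheet with unequal columns might look like and how read.csv() handles it. The CSV text is an assumption about what Excel would write for the example data (short columns leave empty fields); it is passed inline through the text argument so the example runs without a file.

```r
# Assumed CSV export of the example data: the shorter columns leave
# empty fields in the later rows
csv_text <- "Dat1,Dat2,Dat3
1,5,4
7,7,9
3,3,5
2,,5
9,,
"

# Empty fields in numeric columns are read as NA
df <- read.csv(text = csv_text)
print(df)

# Count of non-missing values per column, i.e. the real column lengths
n_present <- colSums(!is.na(df))
print(n_present)
```

The NA padding is therefore only a placeholder for the empty cells, and the data frame can stay as it is for summaries like the one above.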
John Kane
Kingston ON Canada
-----Original Message-----
From: tom at vims.edu
Sent: Thu, 14 Apr 2016 21:33:31 +0000
To: r-help at r-project.org
Subject: [R] Unequal column lengths