Message-ID: <1349889759.23388.YahooMailNeo@web142605.mail.bf1.yahoo.com>
Date: 2012-10-10T17:22:39Z
From: arun
Subject: Summary using by() returns character arrays in a list
In-Reply-To: <f3b45733c22f1700635b6b87629f66f4.squirrel@webmail.xs4all.nl>
Hi,
Maybe this helps.
Using the iris dataset:
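by.list<-by(iris, iris$Species, summary)
# strip the "Min.   :" style labels, keeping only the value part of each cell
dat1<-do.call(rbind,lapply(by.list,function(x) gsub(".*\\:","",x)))
# row names of the form "group:statistic", built from the first column of each table
row.names(dat1)<-paste(rep(unlist(dimnames(by.list),use.names=FALSE),each=6),unlist(lapply(lapply(by.list,`[`,1:6),function(x) gsub("\\:.*","",x)),use.names=FALSE),sep=":")
dat2<-data.frame(dat1)
colnames(dat2)<-colnames(dat1)
# convert the character cells to numeric
dat2[]<-sapply(dat2,function(x) as.numeric(as.character(x)))
head(dat2,8)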
#                    Sepal.Length Sepal.Width Petal.Length Petal.Width
#setosa:Min.                4.300       2.300        1.000       0.100
#setosa:1st Qu.             4.800       3.200        1.400       0.200
#setosa:Median              5.000       3.400        1.500       0.200
#setosa:Mean                5.006       3.428        1.462       0.246
#setosa:3rd Qu.             5.200       3.675        1.575       0.300
#setosa:Max.                5.800       4.400        1.900       0.600
#versicolor:Min.            4.900       2.000        3.000       1.000
#versicolor:1st Qu.         5.600       2.525        4.000       1.200
#                    Species
#setosa:Min.              50
#setosa:1st Qu.            0
#setosa:Median             0
#setosa:Mean              NA
#setosa:3rd Qu.           NA
#setosa:Max.              NA
#versicolor:Min.           0
#versicolor:1st Qu.       50
str(dat2)
#'data.frame':    18 obs. of  5 variables:
# $  Sepal.Length: num  4.3 4.8 5 5.01 5.2 ...
# $  Sepal.Width : num  2.3 3.2 3.4 3.43 3.67 ...
# $  Petal.Length: num  1 1.4 1.5 1.46 1.57 ...
# $  Petal.Width : num  0.1 0.2 0.2 0.246 0.3 ...
# $       Species: num  50 0 0 NA NA NA 0 50 0 NA ...
Not sure if you need the last column.
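If the Species column is not needed, a possible shortcut (a minimal sketch on iris, not tested on your data) is to pass only the numeric columns to by() and call summary() per column via sapply(). That should give a list of numeric matrices with the statistic names as row names. The names num_cols and by_num are just placeholders picked for this illustration:

```r
# Keep only the numeric columns so the factor column never enters the summary.
num_cols <- sapply(iris, is.numeric)

# sapply() over the columns of each group collects the six summary
# statistics into a numeric matrix (statistics in rows, variables in columns).
by_num <- by(iris[num_cols], iris$Species,
             function(d) sapply(d, summary))

by_num[["setosa"]]        # one numeric matrix per group
str(by_num[["setosa"]])   # num [1:6, 1:4] with dimnames
```

Each element can then be indexed by name, for example by_num[["setosa"]]["Mean", "Sepal.Length"].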
I agree that aggregate() or ddply() will be easier.
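For the aggregate() route, a rough sketch on iris (base R only; agg_mean and agg_all are illustrative names, and the column selection iris[1:4] is an assumption about which columns are numeric):

```r
# One statistic per call: a plain data frame with one row per group.
agg_mean <- aggregate(iris[1:4], by = list(Species = iris$Species), FUN = mean)
agg_mean

# With FUN = summary, each variable becomes a matrix column whose
# columns are the six statistics.
agg_all <- aggregate(iris[1:4], by = list(Species = iris$Species), FUN = summary)
agg_all$Sepal.Length
```

The second form keeps everything numeric, but the matrix columns can be awkward to print, so computing one statistic per call may be simpler to work with.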
A.K.
----- Original Message -----
From: Alex van der Spek <doorz at xs4all.nl>
To: r-help at r-project.org
Cc:
Sent: Wednesday, October 10, 2012 8:47 AM
Subject: [R] Summary using by() returns character arrays in a list
I use by() to generate summary statistics like so:
Lbys <- by(dat[Nidx], dat$LipTest, summary)
where Nidx is an index vector with names picking out the columns in the
data frame dat.
This returns a list of character arrays (see below for str() output) where
the columns are named correctly but the rownames are empty strings and the
values are strings prepended with the summary statistic's name (e.g.
"Min.", "Median ").
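The same behaviour can be reproduced with a built-in dataset. A minimal sketch, assuming iris as a stand-in for dat (the name s is only illustrative):

```r
# summary() on a data frame returns a character table of formatted cells.
s <- summary(iris[1:4])

is.character(s)   # TRUE: the cells are strings, not numbers
rownames(s)       # all empty strings
s[1, 1]           # the statistic's label and its value in one string
```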
I am reading the code of summary.data.frame() but can't figure out how to
change the behaviour of that function so that it returns a list of numeric
matrices whose rownames are the summary statistics' names ("Min.", "Max."
etc.) and whose values are the numeric values of the calculated statistics.
Any help much appreciated!
Regards,
Alex van der Spek
> str(Lbys)
List of 2
 $    : 'table' chr [1:6, 1:19] "Min.  :-0.190  " "1st Qu.: 9.297  "
"Median :10.373  " "Mean  :10.100  " ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6] "" "" "" "" ...
  .. ..$ : chr [1:19] "Cell_3_SOS....GVF." "Cell_3_SOSq..ms.ms."
"Cell_3_Airflow..cfm." "Cell_3_Float..in.." ...
 $ T38: 'table' chr [1:6, 1:19] "Min.  :8.648  " "1st Qu.:8.920  "
"Median :9.018  " "Mean  :9.027  " ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6] "" "" "" "" ...
  .. ..$ : chr [1:19] "Cell_3_SOS....GVF." "Cell_3_SOSq..ms.ms."
"Cell_3_Airflow..cfm." "Cell_3_Float..in.." ...
 - attr(*, "dim")= int 2
 - attr(*, "dimnames")=List of 1
  ..$ dat$LipTest: chr [1:2] "" "T38"
 - attr(*, "call")= language by.data.frame(data = dat[Nidx], INDICES =
dat$LipTest, FUN = summary)
 - attr(*, "class")= chr "by"