Calculated mean value based on another column bin from dataframe.

Dear list,

I have a dataframe with two column as fellow.
head(dat)
V1      V2
 0.15624 0.94567
 0.26039 0.66442
 0.16629 0.97822
 0.23474 0.72079
 0.11037 0.83760
 0.14969 0.91312

I want to get the column V2 mean value based on the bin of column of
V1. I write the code as fellow. It works, but I think this is not the
elegant way. Any suggestions?

dat<-read.table("dat.txt",head=F)
ran<-seq(0,0.5,0.05)
mm<-NULL
for (i in c(1:(length(ran)-1)))
{
    fil<- dat[,1] > ran[i] & dat[,1]<=ran[i+1]
    m<-mean(dat[fil,2])
    mm<-c(mm,m)
}
mm

Here is the first 20 lines of my data.
dput(head(dat,20))
structure(list(V1 = c(0.15624, 0.26039, 0.16629, 0.23474, 0.11037,
0.14969, 0.16166, 0.09785, 0.36417, 0.08005, 0.29597, 0.14856,
0.17307, 0.36718, 0.11621, 0.23281, 0.10415, 0.1025, 0.04238,
0.13525), V2 = c(0.94567, 0.66442, 0.97822, 0.72079, 0.8376,
0.91312, 0.88463, 0.82432, 0.55582, 0.9429, 0.78956, 0.93424,
0.87692, 0.83996, 0.74552, 0.9779, 0.9958, 0.9783, 0.92523, 0.99022
)), .Names = c("V1", "V2"), row.names = c(NA, 20L), class = "data.frame")
Try this:

fil <- sapply(ran, '<', e1 = dat[,1]) & sapply(ran[2:(length(ran) +
1)], '>=', e1 = dat[,1])
mm <- apply(fil, 2, function(idx)mean(dat[idx, 2]))
Dear list,

I have a dataframe with two column as fellow.

head(dat)
? ? ? V1 ? ? ?V2
?0.15624 0.94567
?0.26039 0.66442
?0.16629 0.97822
?0.23474 0.72079
?0.11037 0.83760
?0.14969 0.91312

I want to get the column V2 mean value based on the bin of column of
V1. I write the code as fellow. It works, but I think this is not the
elegant way. Any suggestions?

dat<-read.table("dat.txt",head=F)
ran<-seq(0,0.5,0.05)
mm<-NULL
for (i in c(1:(length(ran)-1)))
{
? ?fil<- dat[,1] > ran[i] & dat[,1]<=ran[i+1]
? ?m<-mean(dat[fil,2])
? ?mm<-c(mm,m)
}
mm

Here is the first 20 lines of my data.

dput(head(dat,20))
structure(list(V1 = c(0.15624, 0.26039, 0.16629, 0.23474, 0.11037,
0.14969, 0.16166, 0.09785, 0.36417, 0.08005, 0.29597, 0.14856,
0.17307, 0.36718, 0.11621, 0.23281, 0.10415, 0.1025, 0.04238,
0.13525), V2 = c(0.94567, 0.66442, 0.97822, 0.72079, 0.8376,
0.91312, 0.88463, 0.82432, 0.55582, 0.9429, 0.78956, 0.93424,
0.87692, 0.83996, 0.74552, 0.9779, 0.9958, 0.9783, 0.92523, 0.99022
)), .Names = c("V1", "V2"), row.names = c(NA, 20L), class = "data.frame")

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Henrique Dallazuanna
Curitiba-Paran?-Brasil
25? 25' 40" S 49? 16' 22" O
Dear Henrique Dallazuanna,

Thank you very much for your suggestion.

It is obvious that your method is better than me.

Is it possible to use cut, table,by etc? Whether there is some
aggregate function in R can do this?

Thanks.
Try this:

fil <- sapply(ran, '<', e1 = dat[,1]) & sapply(ran[2:(length(ran) +
1)], '>=', e1 = dat[,1])
mm <- apply(fil, 2, function(idx)mean(dat[idx, 2]))

On Wed, Apr 6, 2011 at 5:48 AM, Fabrice Tourre <fabrice.ciup at gmail.com> wrote:
Dear list,

I have a dataframe with two column as fellow.

head(dat)
? ? ? V1 ? ? ?V2
?0.15624 0.94567
?0.26039 0.66442
?0.16629 0.97822
?0.23474 0.72079
?0.11037 0.83760
?0.14969 0.91312

I want to get the column V2 mean value based on the bin of column of
V1. I write the code as fellow. It works, but I think this is not the
elegant way. Any suggestions?

dat<-read.table("dat.txt",head=F)
ran<-seq(0,0.5,0.05)
mm<-NULL
for (i in c(1:(length(ran)-1)))
{
? ?fil<- dat[,1] > ran[i] & dat[,1]<=ran[i+1]
? ?m<-mean(dat[fil,2])
? ?mm<-c(mm,m)
}
mm

Here is the first 20 lines of my data.

dput(head(dat,20))
structure(list(V1 = c(0.15624, 0.26039, 0.16629, 0.23474, 0.11037,
0.14969, 0.16166, 0.09785, 0.36417, 0.08005, 0.29597, 0.14856,
0.17307, 0.36718, 0.11621, 0.23281, 0.10415, 0.1025, 0.04238,
0.13525), V2 = c(0.94567, 0.66442, 0.97822, 0.72079, 0.8376,
0.91312, 0.88463, 0.82432, 0.55582, 0.9429, 0.78956, 0.93424,
0.87692, 0.83996, 0.74552, 0.9779, 0.9958, 0.9783, 0.92523, 0.99022
)), .Names = c("V1", "V2"), row.names = c(NA, 20L), class = "data.frame")

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Henrique Dallazuanna
Curitiba-Paran?-Brasil
25? 25' 40" S 49? 16' 22" O

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Hi

r-help-bounces at r-project.org napsal dne 06.04.2011 10:48:04:
Dear list,

I have a dataframe with two column as fellow.

head(dat)
       V1      V2
 0.15624 0.94567
 0.26039 0.66442
 0.16629 0.97822
 0.23474 0.72079
 0.11037 0.83760
 0.14969 0.91312

I want to get the column V2 mean value based on the bin of column of
V1. I write the code as fellow. It works, but I think this is not the
elegant way. Any suggestions?
Do you want something like that?

#make data
x<-runif(100) 
y<-runif(100)

#cut first column to bins (in your case dat[,1] and ran)
x.c<-cut(x, seq(0,1,.1))

#aggregate column 2 according to bins (in your case dat[,2])
aggregate(y,list(x.c), mean)
     Group.1         x
1    (0,0.1] 0.5868734
2  (0.1,0.2] 0.5436263
3  (0.2,0.3] 0.5099366
4  (0.3,0.4] 0.4815855
5  (0.4,0.5] 0.4137687
6  (0.5,0.6] 0.4698156
7  (0.6,0.7] 0.4687639
8  (0.7,0.8] 0.5661048
9  (0.8,0.9] 0.5489297
10   (0.9,1] 0.4812521

Regards
Petr
dat<-read.table("dat.txt",head=F)
ran<-seq(0,0.5,0.05)
mm<-NULL
for (i in c(1:(length(ran)-1)))
{
    fil<- dat[,1] > ran[i] & dat[,1]<=ran[i+1]
    m<-mean(dat[fil,2])
    mm<-c(mm,m)
}
mm

Here is the first 20 lines of my data.

dput(head(dat,20))
structure(list(V1 = c(0.15624, 0.26039, 0.16629, 0.23474, 0.11037,
0.14969, 0.16166, 0.09785, 0.36417, 0.08005, 0.29597, 0.14856,
0.17307, 0.36718, 0.11621, 0.23281, 0.10415, 0.1025, 0.04238,
0.13525), V2 = c(0.94567, 0.66442, 0.97822, 0.72079, 0.8376,
0.91312, 0.88463, 0.82432, 0.55582, 0.9429, 0.78956, 0.93424,
0.87692, 0.83996, 0.74552, 0.9779, 0.9958, 0.9783, 0.92523, 0.99022
)), .Names = c("V1", "V2"), row.names = c(NA, 20L), class = 
"data.frame")

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Dear Henrique Dallazuanna,

Thank you very much for your suggestion.

It is obvious that your method is better than me.

Is it possible to use cut, table,by etc? Whether there is some
aggregate function in R can do this?

Thanks.

On Wed, Apr 6, 2011 at 2:16 PM, Henrique Dallazuanna  
<wwwhsd at gmail.com> wrote:
Try this:

fil <- sapply(ran, '<', e1 = dat[,1]) & sapply(ran[2:(length(ran) +
1)], '>=', e1 = dat[,1])
mm <- apply(fil, 2, function(idx)mean(dat[idx, 2]))

On Wed, Apr 6, 2011 at 5:48 AM, Fabrice Tourre <fabrice.ciup at gmail.com 
wrote:
Dear list,

I have a dataframe with two column as fellow.

head(dat)
      V1      V2
 0.15624 0.94567
 0.26039 0.66442
 0.16629 0.97822
 0.23474 0.72079
 0.11037 0.83760
 0.14969 0.91312

I want to get the column V2 mean value based on the bin of column of
V1. I write the code as fellow. It works, but I think this is not  
the
elegant way. Any suggestions?

dat<-read.table("dat.txt",head=F)
ran<-seq(0,0.5,0.05)
mm<-NULL
for (i in c(1:(length(ran)-1)))
{
   fil<- dat[,1] > ran[i] & dat[,1]<=ran[i+1]
   m<-mean(dat[fil,2])
   mm<-c(mm,m)
}
mm

Here is the first 20 lines of my data.

dput(head(dat,20))
structure(list(V1 = c(0.15624, 0.26039, 0.16629, 0.23474, 0.11037,
0.14969, 0.16166, 0.09785, 0.36417, 0.08005, 0.29597, 0.14856,
0.17307, 0.36718, 0.11621, 0.23281, 0.10415, 0.1025, 0.04238,
0.13525), V2 = c(0.94567, 0.66442, 0.97822, 0.72079, 0.8376,
0.91312, 0.88463, 0.82432, 0.55582, 0.9429, 0.78956, 0.93424,
0.87692, 0.83996, 0.74552, 0.9779, 0.9958, 0.9783, 0.92523, 0.99022
)), .Names = c("V1", "V2"), row.names = c(NA, 20L), class =  
"data.frame")

______________________________________________
Here is how I would have done it with findInterval and tapply which is  
very similar to using a `cut` and `table` approach:

 > dat$grp <- findInterval(dat$V1, seq(0,0.5,0.05) )
 > tapply(dat$V2, dat$grp, mean)
         1         2         3         4         5         6         8
0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900 0.6978900
#####---------------

You do not get exactly the same form of the result as with Henrique's  
method. His yields:
 > mm
  [1] 0.9252300 0.8836100 0.9135429 0.9213600 0.8493450  
0.7269900       NaN
  [8] 0.6978900       NaN       NaN       NaN

####----------------

The cut approach would yield this, which is more informatively  
labeled. (I'm wasn't completely sure the second to last word in the  
prior sentence was a real word, but several dictionaries seem to think  
so.):

 > dat$grp2 <- cut(dat$V1 , breaks=ran)
 > tapply(dat$V2, dat$grp2, mean)
   (0,0.05] (0.05,0.1] (0.1,0.15] (0.15,0.2] (0.2,0.25] (0.25,0.3]
  0.9252300  0.8836100  0.9135429  0.9213600  0.8493450  0.7269900
(0.3,0.35] (0.35,0.4] (0.4,0.45] (0.45,0.5]
         NA  0.6978900         NA         NA

David Winsemius, MD
West Hartford, CT
This is extractly what I want. Thank you very much.
Hi

r-help-bounces at r-project.org napsal dne 06.04.2011 10:48:04:

Dear list,

I have a dataframe with two column as fellow.

head(dat)
? ? ? ?V1 ? ? ?V2
?0.15624 0.94567
?0.26039 0.66442
?0.16629 0.97822
?0.23474 0.72079
?0.11037 0.83760
?0.14969 0.91312

I want to get the column V2 mean value based on the bin of column of
V1. I write the code as fellow. It works, but I think this is not the
elegant way. Any suggestions?
Do you want something like that?

#make data
x<-runif(100)
y<-runif(100)

#cut first column to bins (in your case dat[,1] and ran)
x.c<-cut(x, seq(0,1,.1))

#aggregate column 2 according to bins (in your case dat[,2])
aggregate(y,list(x.c), mean)
? ? Group.1 ? ? ? ? x
1 ? ?(0,0.1] 0.5868734
2 ?(0.1,0.2] 0.5436263
3 ?(0.2,0.3] 0.5099366
4 ?(0.3,0.4] 0.4815855
5 ?(0.4,0.5] 0.4137687
6 ?(0.5,0.6] 0.4698156
7 ?(0.6,0.7] 0.4687639
8 ?(0.7,0.8] 0.5661048
9 ?(0.8,0.9] 0.5489297
10 ? (0.9,1] 0.4812521

Regards
Petr

dat<-read.table("dat.txt",head=F)
ran<-seq(0,0.5,0.05)
mm<-NULL
for (i in c(1:(length(ran)-1)))
{
? ? fil<- dat[,1] > ran[i] & dat[,1]<=ran[i+1]
? ? m<-mean(dat[fil,2])
? ? mm<-c(mm,m)
}
mm

Here is the first 20 lines of my data.

dput(head(dat,20))
structure(list(V1 = c(0.15624, 0.26039, 0.16629, 0.23474, 0.11037,
0.14969, 0.16166, 0.09785, 0.36417, 0.08005, 0.29597, 0.14856,
0.17307, 0.36718, 0.11621, 0.23281, 0.10415, 0.1025, 0.04238,
0.13525), V2 = c(0.94567, 0.66442, 0.97822, 0.72079, 0.8376,
0.91312, 0.88463, 0.82432, 0.55582, 0.9429, 0.78956, 0.93424,
0.87692, 0.83996, 0.74552, 0.9779, 0.9958, 0.9783, 0.92523, 0.99022
)), .Names = c("V1", "V2"), row.names = c(NA, 20L), class =
"data.frame")

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Thanks. Before I never used findInterval function. It seems very nice.
On Apr 6, 2011, at 9:46 AM, Fabrice Tourre wrote:

Dear Henrique Dallazuanna,

Thank you very much for your suggestion.

It is obvious that your method is better than me.

Is it possible to use cut, table,by etc? Whether there is some
aggregate function in R can do this?

Thanks.

On Wed, Apr 6, 2011 at 2:16 PM, Henrique Dallazuanna <wwwhsd at gmail.com>
wrote:
Try this:

fil <- sapply(ran, '<', e1 = dat[,1]) & sapply(ran[2:(length(ran) +
1)], '>=', e1 = dat[,1])
mm <- apply(fil, 2, function(idx)mean(dat[idx, 2]))

On Wed, Apr 6, 2011 at 5:48 AM, Fabrice Tourre <fabrice.ciup at gmail.com>
wrote:
Dear list,

I have a dataframe with two column as fellow.

head(dat)
? ? ?V1 ? ? ?V2
?0.15624 0.94567
?0.26039 0.66442
?0.16629 0.97822
?0.23474 0.72079
?0.11037 0.83760
?0.14969 0.91312

I want to get the column V2 mean value based on the bin of column of
V1. I write the code as fellow. It works, but I think this is not the
elegant way. Any suggestions?

dat<-read.table("dat.txt",head=F)
ran<-seq(0,0.5,0.05)
mm<-NULL
for (i in c(1:(length(ran)-1)))
{
? fil<- dat[,1] > ran[i] & dat[,1]<=ran[i+1]
? m<-mean(dat[fil,2])
? mm<-c(mm,m)
}
mm

Here is the first 20 lines of my data.

dput(head(dat,20))
structure(list(V1 = c(0.15624, 0.26039, 0.16629, 0.23474, 0.11037,
0.14969, 0.16166, 0.09785, 0.36417, 0.08005, 0.29597, 0.14856,
0.17307, 0.36718, 0.11621, 0.23281, 0.10415, 0.1025, 0.04238,
0.13525), V2 = c(0.94567, 0.66442, 0.97822, 0.72079, 0.8376,
0.91312, 0.88463, 0.82432, 0.55582, 0.9429, 0.78956, 0.93424,
0.87692, 0.83996, 0.74552, 0.9779, 0.9958, 0.9783, 0.92523, 0.99022
)), .Names = c("V1", "V2"), row.names = c(NA, 20L), class =
"data.frame")

______________________________________________
Here is how I would have done it with findInterval and tapply which is very
similar to using a `cut` and `table` approach:

dat$grp <- findInterval(dat$V1, seq(0,0.5,0.05) )
tapply(dat$V2, dat$grp, mean)
? ? ? ?1 ? ? ? ? 2 ? ? ? ? 3 ? ? ? ? 4 ? ? ? ? 5 ? ? ? ? 6 ? ? ? ? 8
0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900 0.6978900
#####---------------

You do not get exactly the same form of the result as with Henrique's
method. His yields:
mm
?[1] 0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900 ? ? ? NaN
?[8] 0.6978900 ? ? ? NaN ? ? ? NaN ? ? ? NaN

####----------------

The cut approach would yield this, which is more informatively labeled. (I'm
wasn't completely sure the second to last word in the prior sentence was a
real word, but several dictionaries seem to think so.):

dat$grp2 <- cut(dat$V1 , breaks=ran)
tapply(dat$V2, dat$grp2, mean)
?(0,0.05] (0.05,0.1] (0.1,0.15] (0.15,0.2] (0.2,0.25] (0.25,0.3]
?0.9252300 ?0.8836100 ?0.9135429 ?0.9213600 ?0.8493450 ?0.7269900
(0.3,0.35] (0.35,0.4] (0.4,0.45] (0.45,0.5]
? ? ? ?NA ?0.6978900 ? ? ? ? NA ? ? ? ? NA

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.