help with calculation from dataframe with multiple entries per sample

12 messages · Phil Spector, David Winsemius, Rui Barradas +4 more

#
Julie -
    Since the apply functions operate on one row at a time, they
can't do what you want.  I think the easiest way to solve your 
problem is to reshape the data set, and merge it back with the 
original:
> mydata <- data.frame(Sample=c(1,1,1,2,2,2,3,3,3),
+                      Time=c(1,2,3,1,2,3,1,2,3),
+                      Mass=c(3,3.1,3.4,4,4.3,4.4,3,3.2,3.5))
Sample Time Mass Gain2-3
1      1    1  3.0     0.3
2      1    2  3.1     0.3
3      1    3  3.4     0.3
4      2    1  4.0     0.1
5      2    2  4.3     0.1
6      2    3  4.4     0.1
7      3    1  3.0     0.3
8      3    2  3.2     0.3
9      3    3  3.5     0.3
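The reshape and merge steps of this reply did not survive in the archive. A minimal base-R sketch of the same idea, assuming the column names used throughout the thread, might look like this. The objects `wide` and `result` and the column name `Gain2_3` are illustrative choices, not Phil's original code.

```r
# Example data from the original question
mydata <- data.frame(Sample = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                     Time   = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                     Mass   = c(3, 3.1, 3.4, 4, 4.3, 4.4, 3, 3.2, 3.5))

# Reshape to one row per Sample, with columns Mass.1, Mass.2 and Mass.3
wide <- reshape(mydata, idvar = "Sample", timevar = "Time",
                v.names = "Mass", direction = "wide")

# Gain between Time 2 and Time 3, picked by column name rather than row position
wide$Gain2_3 <- wide$Mass.3 - wide$Mass.2

# Merge the per-sample gain back onto every row of the original data
result <- merge(mydata, wide[, c("Sample", "Gain2_3")], by = "Sample")
print(result)
```

Because the gain is looked up through the Time value, this version does not depend on the rows being sorted by Time within each sample.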

You may want to avoid using special characters like dashes in variable
names.
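A short sketch of why the dash is awkward: by default `data.frame()` passes column names through `make.names()`, and keeping the dash means backticks are needed every time the column is used. The objects `d1` and `d2` are hypothetical.

```r
# check.names = TRUE (the default) silently turns the dash into a dot
d1 <- data.frame(`Gain2-3` = 0.3)
print(names(d1))   # "Gain2.3"

# Keeping the dash requires check.names = FALSE ...
d2 <- data.frame(`Gain2-3` = 0.3, check.names = FALSE)
print(names(d2))   # "Gain2-3"

# ... and backticks whenever the column is referenced,
# because d2$Gain2-3 would be parsed as (d2$Gain2) - 3
print(d2$`Gain2-3`)
```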

Hope this helps.

 					- Phil Spector
 					 Statistical Computing Facility
 					 Department of Statistics
 					 UC Berkeley
 					 spector at stat.berkeley.edu
On Mon, 17 Sep 2012, Julie Lee-Yaw wrote:

            
#
On Sep 17, 2012, at 4:15 PM, Julie Lee-Yaw wrote:

            
Please tell me where you learned that as.data.frame(cbind(.)) construction.
mydata$gain2.3 <- with( mydata, ave( Mass , Time, FUN=function(x) diff(x[2],x[3]) ) )
Sample Time Mass gain2.3
1      1    1  3.0     0.3
2      1    2  3.1     0.3
3      1    3  3.4     0.3
4      2    1  4.0     0.1
5      2    2  4.3     0.1
6      2    3  4.4     0.1
7      3    1  3.0     0.3
8      3    2  3.2     0.3
9      3    3  3.5     0.3
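On the `as.data.frame(cbind(.))` question: when every column is numeric, as in this example, the construction happens to work. However, `cbind()` builds a matrix first, and a matrix holds only one type. The following sketch uses a hypothetical character column `Site` to show what goes wrong once the types are mixed:

```r
Sample <- c(1, 1, 2)
Site   <- c("a", "b", "c")   # hypothetical character column

# cbind() makes a character matrix, so Sample ends up as text
# (character in R >= 4.0, factor in older versions)
bad <- as.data.frame(cbind(Sample, Site))

# data.frame() keeps each column's own type
good <- data.frame(Sample, Site)

str(bad)
str(good)
```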

  
    
#
On Sep 17, 2012, at 5:00 PM, David Winsemius wrote:

            
Oops... the code above was a failed attempt.
... the code below should "work".
David Winsemius, MD
Alameda, CA, USA
#
Or diff(x[2:3])
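For reference, here is a sketch of an `ave()` call that runs on the example data. It groups by Sample rather than Time and uses `diff(x[2:3])`. The failed attempt called `diff(x[2], x[3])`, which passes `x[3]` as the `lag` argument. This is a reconstruction, not David's follow-up code, and it assumes rows are sorted by Time within each Sample.

```r
mydata <- data.frame(Sample = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                     Time   = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                     Mass   = c(3, 3.1, 3.4, 4, 4.3, 4.4, 3, 3.2, 3.5))

# diff(x[2:3]) is x[3] - x[2]: a single number per Sample,
# which ave() recycles over all rows of that Sample
mydata$gain2.3 <- with(mydata, ave(Mass, Sample, FUN = function(x) diff(x[2:3])))
print(mydata)
```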

Rui Barradas
On 18-09-2012 01:05, David Winsemius wrote:
#
HI,
Try this:
mydata$Gain<-rep(tapply(mydata$Mass,mydata$Sample,FUN=function(x) (x[3]-x[2])),each=length(unique(mydata$Sample)))
mydata
#  Sample Time Mass Gain
#1      1    1  3.0  0.3
#2      1    2  3.1  0.3
#3      1    3  3.4  0.3
#4      2    1  4.0  0.1
#5      2    2  4.3  0.1
#6      2    3  4.4  0.1
#7      3    1  3.0  0.3
#8      3    2  3.2  0.3
#9      3    3  3.5  0.3
A.K.




----- Original Message -----
From: Julie Lee-Yaw <julleeyaw at yahoo.ca>
To: "r-help at r-project.org" <r-help at r-project.org>
Cc: 
Sent: Monday, September 17, 2012 7:15 PM
Subject: [R] help with calculation from dataframe with multiple entries per sample

Hi,

I have a dataframe similar to:
  Sample Time Mass
1      1    1  3.0
2      1    2  3.1
3      1    3  3.4
4      2    1  4.0
5      2    2  4.3
6      2    3  4.4
7      3    1  3.0
8      3    2  3.2
9      3    3  3.5

where for each sample, I've measured mass at different points in time.

I now want to calculate the difference between Mass at Time 2 and 3 for each unique Sample and store this as a new variable called "Gain2-3". So in my example three values of 0.3, 0.1, 0.3 would be calculated for my three unique samples and these values would be repeated in the table according to Sample. I am thus expecting:
  Sample Time Mass Gain2-3
1      1    1  3.0     0.3
2      1    2  3.1     0.3
3      1    3  3.4     0.3
4      2    1  4.0     0.1
5      2    2  4.3     0.1
6      2    3  4.4     0.1
7      3    1  3.0     0.3
8      3    2  3.2     0.3
9      3    3  3.5     0.3

Does anyone have any suggestions as to how to do this? I've looked at the various apply functions but I can't seem to make anything work. I'm fairly new to R and would appreciate specific suggestions.

Thanks!


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
#
HI,
Modified version of my earlier solution:
res1<-tapply(mydata$Mass,mydata$Sample,FUN=function(x) (x[3]-x[2]))
res2<-data.frame(Sample=names(res1),Gain2_3=res1)
merge(mydata,res2)

#  Sample Time Mass Gain2_3
#1      1    1  3.0     0.3
#2      1    2  3.1     0.3
#3      1    3  3.4     0.3
#4      2    1  4.0     0.1
#5      2    2  4.3     0.1
#6      2    3  4.4     0.1
#7      3    1  3.0     0.3
#8      3    2  3.2     0.3
#9      3    3  3.5     0.3
A.K.



#
On Sep 17, 2012, at 7:28 PM, arun wrote:

            
That is going to fail as soon as there is an uneven number of rows for one value of Sample.
Error in `$<-.data.frame`(`*tmp*`, "Gain", value = c(0.3, 0.3, 0.3, 0.100000000000001,  : 
  replacement has 9 rows, data has 10
David Winsemius, MD
Alameda, CA, USA
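One way to avoid that length mismatch while keeping the `tapply()` step is to look up each row's gain by its Sample label instead of repeating the results with `rep()`. The sketch below adds a hypothetical tenth row (Time 4 for Sample 3) to stand in for the uneven case. It is still position-based (`x[3] - x[2]`), so it assumes rows are sorted by Time within each Sample.

```r
mydata <- data.frame(Sample = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
                     Time   = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4),
                     Mass   = c(3, 3.1, 3.4, 4, 4.3, 4.4, 3, 3.2, 3.5, 3.6))

# One gain per Sample; the result is named by the Sample labels "1", "2", "3"
res1 <- tapply(mydata$Mass, mydata$Sample, FUN = function(x) x[3] - x[2])

# Index the named result by each row's Sample: exactly one value per row
mydata$Gain <- as.vector(res1[as.character(mydata$Sample)])
print(mydata)
```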
#
The following works even when the input data frame has its rows
scrambled.  It does not currently check that there is exactly one entry
in each sample for Time==2 and Time==3.

within(mydata, 
            `Gain2-3` <- ave(seq_along(Sample),
                                          Sample,
                                          FUN=function(i) {
                                             L2 <- Time[i]==2
                                             L3 <- Time[i]==3
                                             Mass[i][L3] - Mass[i][L2] }))

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
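A sketch of the same `within()`/`ave()` idea with the missing check added. The data below has scrambled rows and a hypothetical Sample 4 with no Time 3 reading. Returning `NA` when a sample lacks exactly one reading at Time 2 and at Time 3 is an assumed policy, and `Gain2_3` avoids the dash in the name.

```r
# Rows deliberately scrambled; Sample 4 is hypothetical and has no Time 3 reading
mydata <- data.frame(Sample = c(3, 1, 2, 1, 3, 2, 1, 2, 3, 4, 4),
                     Time   = c(2, 3, 1, 1, 3, 3, 2, 2, 1, 1, 2),
                     Mass   = c(3.2, 3.4, 4.0, 3.0, 3.5, 4.4, 3.1, 4.3, 3.0, 5.0, 5.1))

mydata <- within(mydata,
  Gain2_3 <- ave(seq_along(Sample), Sample, FUN = function(i) {
    # i holds the row numbers belonging to one Sample
    m2 <- Mass[i][Time[i] == 2]
    m3 <- Mass[i][Time[i] == 3]
    # Return NA unless there is exactly one reading at each of Time 2 and 3
    if (length(m2) == 1L && length(m3) == 1L) m3 - m2 else NA_real_
  }))
print(mydata)
```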
#
Thanks everyone for the help! I pulled together a bunch of your suggestions
to get the result that I needed. I'm posting my final code below. Probably
not the most efficient way of doing things but gets the job done in a way
that a newbie can understand!

##Here again is the example dataset

Sample<-c(1,1,1,2,2,2,3,3,3)
Mass<-c(3,3.1,3.4,4,4.3,4.4,3,3.2,3.5)
Time<-c(1,2,3,1,2,3,1,2,3)
mydata<-as.data.frame(cbind(Sample,Time,Mass))

## I split the dataset by Sample and then calculate the difference between
## mass at time 3 and mass at time 2 for each Sample; then use the merge
## function to attach this data to my initial dataset

sp<-split(mydata,mydata$Sample)
y<-rbind(lapply(sp,function(x){Gain<-x$Mass[x$Time==3]-x$Mass[x$Time==2]}))

## note here that, as a modification to some of the suggestions posted, I
## wanted a way to specifically call "mass at time 3" etc. for each sample
## rather than relying on the position of such data within each split/Sample
## (hence allowing me to deal with samples that may have the Time/Mass data
## input in a different order)

# some massaging of the results
u<-t(y)
s<-data.frame(Sample=row.names(u),Gain2_3=u)
fulldata<-merge(mydata,s)

## as I wished to export the data in the end using write.csv, I had to
## convert "list" data into "numeric" in the final dataframe

fulldata$Gain<-as.numeric(fulldata$Gain2_3) 
fulldata$Gain2_3<-NULL
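The list column comes from `lapply()`, which always returns a list. Below is a sketch of the same split-and-merge approach using `sapply()` instead. It simplifies to a named numeric vector when every sample yields exactly one value, so the `as.numeric()` step is not needed. The names `gain` and `s` are illustrative. If any sample lacked a Time 2 or Time 3 reading, `sapply()` would fall back to returning a list.

```r
mydata <- data.frame(Sample = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                     Time   = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
                     Mass   = c(3, 3.1, 3.4, 4, 4.3, 4.4, 3, 3.2, 3.5))

sp <- split(mydata, mydata$Sample)

# sapply() simplifies the per-sample results to a named numeric vector
gain <- sapply(sp, function(x) x$Mass[x$Time == 3] - x$Mass[x$Time == 2])

# names(gain) are the Sample labels ("1", "2", "3"); convert them back to numbers
s <- data.frame(Sample = as.numeric(names(gain)), Gain2_3 = unname(gain))

fulldata <- merge(mydata, s, by = "Sample")
str(fulldata)   # Gain2_3 is already numeric, ready for write.csv()
```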

Thanks again everyone!




--
View this message in context: http://r.789695.n4.nabble.com/help-with-calculation-from-dataframe-with-multiple-entries-per-sample-tp4643434p4643581.html
Sent from the R help mailing list archive at Nabble.com.