Need help on dataframe

7 messages · Simonas Kecorius, John Kane, David L Carlson +2 more

#
Hi,

Maybe this helps:
dat1<-read.table(text="
ID  V1  V2  V3  V4
1    6   5   3   2
2    3   2   2   1
3    6   5   3   2
4   12  15   3   2
5    6   8   3   2
6    3   2   4   1
7    6   5   3   3
8   12  15   3   1
9    6   5   3   3
10   3   2   7   5
11   6   5   8   2
12  12  19   3   2
13   6   5   3   2
14   3   4   2   1
15   6   5   6   2
16  12  15   5   2
17   6   5   5   2
18   3   2   8   1
19   6   5   3   9
20  12  15   3  10
21   6   5   3   2
22   3   2   2  11
23   6   5   3   4
24  12  15   9   2
",sep="",header=TRUE,stringsAsFactors=FALSE) 

dat1$newID<-rep(1:(nrow(dat1)/12),each=12) #if nrow(dat1)/12 is integer

with(dat1,aggregate(cbind(V1,V2,V3,V4),by=list(newID),mean))
#  Group.1   V1       V2       V3       V4
#1       1 6.75 7.333333 3.750000 2.166667
#2       2 6.75 6.916667 4.333333 4.000000

#or
aggregate(.~newID,data=dat1[,-1],mean)
#  newID   V1       V2       V3       V4
#1     1 6.75 7.333333 3.750000 2.166667
#2     2 6.75 6.916667 4.333333 4.000000


A.K.
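
A note on the grouping step above: rep(1:(nrow(dat1)/12), each=12) only works when the row count is a multiple of 12, and aggregate() returns a data frame rather than the matrix asked for. The following is a minimal sketch, not from the thread, of one way to cope with a short last block and get a matrix back. The names block_means and toy are hypothetical and used only for illustration.

```r
# Sketch: means over consecutive blocks of `size` rows, returned as a matrix.
# `block_means` is a hypothetical helper name, not part of the thread.
block_means <- function(df, size = 12) {
  # Block index 1,1,...,2,2,...; the last block may be shorter than `size`
  grp <- (seq_len(nrow(df)) - 1) %/% size + 1
  m <- as.matrix(df)
  # rowsum() adds up the rows of each block; dividing by the block sizes
  # (recycled down the rows) turns the sums into means
  rowsum(m, grp) / as.vector(table(grp))
}

set.seed(1)
toy <- data.frame(V1 = 1:30, V2 = rnorm(30))  # made-up data, 30 rows
res <- block_means(toy, size = 12)
dim(res)   # 3 blocks (12, 12 and 6 rows) by 2 columns
```

With the thread's data this could be called as block_means(dat1[,-1]), assuming every column other than ID is numeric.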



----- Original Message -----
From: Simonas Kecorius <simolas2008 at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Saturday, January 5, 2013 8:33 AM
Subject: [R] Need help on dataframe

Dear R users, I ran into a problem taking means (or other summary
statistics) of a big data frame.

Suppose we have a data frame:

ID   V1   V2   V3   V4   ........................ V71
1    6    5    3    2    ........................ 3
2    3    2    2    1    ........................ 1
3    6    5    3    2    ........................ 3
4    12   15   3    2    ........................ 100
........................................................
........................................................
288  10   20   30   30   ........................ 499

I need to find a way to calculate the mean of every 12 lines to get:

V1                  V2            V3            V4 ........................... V71
mean from 1 to 7    same as V1    same as V1
mean from 8 to 14   same as V1    same as V1
etc.

I can do it column by column using:

y.ts <- ts(y$V1, frequency=12)
aggregate(y.ts, FUN=mean)

But this is laborious... Can anyone suggest a better way to process the whole
data frame at once and get the result as a matrix?

Thank you in advance!
#
Hi,

One more way:
dat1<-read.table(text="
ID  V1  V2  V3  V4
1    6   5   3   2
2    3   2   2   1
3    6   5   3   2
4   12  15   3   2
5    6   8   3   2
6    3   2   4   1
7    6   5   3   3
8   12  15   3   1
9    6   5   3   3
10   3   2   7   5
11   6   5   8   2
12  12  19   3   2
13   6   5   3   2
14   3   4   2   1
15   6   5   6   2
16  12  15   5   2
17   6   5   5   2
18   3   2   8   1
19   6   5   3   9
20  12  15   3  10
21   6   5   3   2
22   3   2   2  11
23   6   5   3   4
24  12  15   9   2
",sep="",header=TRUE,stringsAsFactors=FALSE)
res<-aggregate(.~1:nrow(dat1)%/%13,data=dat1[,-1],mean)
names(res)[1]<-"group"
res
#  group   V1       V2       V3       V4
#1     0 6.75 7.333333 3.750000 2.166667
#2     1 6.75 6.916667 4.333333 4.000000
A.K.



#
Well, a rather simple-minded, brute force approach would be to add a factor variable to the data frame and use aggregate on it.

I am sure there are better ways but this will work.

EXAMPLE
###
xx <- data.frame(aa = 1:24,
                 b = matrix(sample(c(1,2,3,4,5,6), 72, replace = TRUE), nrow = 24))
dd <- rep(c("a","b"), each = 12)   # grouping variable: 12 rows per group

xx <- cbind(dd, xx)

aggregate(xx[,3:5], list(xx$dd), mean)

################

By the way, a good way to supply sample data is the dput() function. Try ?dput for information.

John Kane
Kingston ON Canada
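
To illustrate the dput suggestion above, here is a minimal sketch. The data frame y below holds made-up values and is only an assumption for the example; the round trip through capture.output() stands in for copying the printed text into a message and pasting it back into R.

```r
# Small illustrative data frame (made-up values)
y <- data.frame(ID = 1:4, V1 = c(6, 3, 6, 12), V2 = c(5, 2, 5, 15))

# dput() prints R code that recreates the object;
# paste that output into the message instead of a formatted table
dput(y)

# A reader rebuilds the data by evaluating the pasted text.
# Here the copy-and-paste step is simulated with capture.output().
txt <- capture.output(dput(y))
y2 <- eval(parse(text = txt))
all.equal(y, y2)
```

Because the output is plain R code, column types and names survive the trip through e-mail, which a whitespace-aligned table does not guarantee.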
#
This is a slight modification of John's approach using 6 variables and 28
observations:

set.seed(42)
xx  <-  data.frame(aa = 1:28, matrix(sample(1:6, 6*28,  
    replace = TRUE), nrow= 28))
dd  <- ((1:nrow(xx)-1) %/% 7) +1
result <- aggregate(xx[,-1], by=list(dd), FUN=mean)[dd,-1]
result <- data.frame(aa=xx$aa, result)
row.names(result) <- row.names(xx)

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
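
David's code above repeats each group's mean on every row of that group by indexing the aggregated result with dd. As a hedged alternative sketch, not from the thread, ave() from base R can produce the same per-row group means directly. The name result2 is hypothetical.

```r
set.seed(42)
xx <- data.frame(aa = 1:28,
                 matrix(sample(1:6, 6 * 28, replace = TRUE), nrow = 28))
dd <- (seq_len(nrow(xx)) - 1) %/% 7 + 1   # blocks of 7 rows: 1,1,...,4,4

# ave() returns, for every row, the mean of that row's group.
# as.numeric() avoids relying on integer-to-double coercion inside ave().
result2 <- xx
result2[-1] <- lapply(xx[-1], function(col) ave(as.numeric(col), dd, FUN = mean))
head(result2, 8)
```

The first column (aa) is left untouched, so result2 has the same shape as xx with the six data columns replaced by their block means.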
#
Hi,
Sorry, there was a mistake, which I noticed after seeing David's post.
dat1<-read.table(text="
ID  V1  V2  V3  V4
1    6   5   3   2
2    3   2   2   1
3    6   5   3   2
4   12  15   3   2
5    6   8   3   2
6    3   2   4   1
7    6   5   3   3
8   12  15   3   1
9    6   5   3   3
10   3   2   7   5
11   6   5   8   2
12  12  19   3   2
13   6   5   3   2
14   3   4   2   1
15   6   5   6   2
16  12  15   5   2
17   6   5   5   2
18   3   2   8   1
19   6   5   3   9
20  12  15   3  10
21   6   5   3   2
22   3   2   2  11
23   6   5   3   4
24  12  15   9   2
25   6   5   3   2
26   3   2   2   1
27   6   5   3   2
28  12  15   3   2
29   6   8   3   2
30   3   2   4   1
31   6   5   3   3
32  12  15   3   1
33   6   5   3   3
34   3   2   7   5
35   6   5   8   2
36  12  19   3   2
37   6   5   3   2
38   3   4   2   1
",sep="",header=TRUE,stringsAsFactors=FALSE)
res<-aggregate(.~(1:nrow(dat1)-1)%/%12,data=dat1[,-1],mean)
names(res)[1]<-"group"
A.K.
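
To make the correction concrete, this short sketch compares the two index formulas on 38 rows, the size of the data above. The variable names old_grp and new_grp are hypothetical. The earlier %/%13 version happened to split 24 rows into two blocks of 12, but it drifts once there are more rows.

```r
n <- 38
old_grp <- (1:n) %/% 13        # formula from the earlier post
new_grp <- (1:n - 1) %/% 12    # corrected formula

table(old_grp)   # block sizes 12, 13, 13
table(new_grp)   # block sizes 12, 12, 12, 2
```

With the corrected formula every block has 12 rows except the last, which holds the 2 leftover rows.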




