Newbie data organisation/structures question...
On Wed, 2006-12-20 at 16:05 +0000, Gav Wood wrote:
Howdo folks,

So my data is in this sort of format:

P T I
1 1 (1, 2, 3)
2 1 (2, 4)
1 2 (1, 3, 6, 7)
2 2 (6)

And I want to be able to quickly get:

1: The I when both P and T are given. e.g.: P = 2, T = 2; I = (6)

2: The concatenated vector of Is when P and a subset of T is given, e.g.: P = 1, T = 1:2; Is = (1, 2, 3, 1, 3, 6, 7)

3: The length of that vector.

It would also be nice to have:

4: A list of Is when either P or T is given. e.g.:
P = 2: I = (2, 4), (6)
T = 1: I = (1, 2, 3), (1, 3, 6, 7)

Currently, I have a matrix of P x T, whose elements are lists of a single item, the vector I. I call this 'm'.

(1) is easy; just m[P, T][[1]]

(2) and (3) are apparently much harder. For 3, I'm resorting to:

total <- 0
for(p in 1:length(m[,T])) total <- total + length(m[p,T][[1]]);

And something similar for 2.

There must surely be a better way of doing this; but what is it?

Cheers,
Gav

Reading in your data using:
DF <- read.fwf("clipboard", widths = c(3, 3, 12),
skip = 1)
colnames(DF) <- c("P", "T", "I")
Substitute your actual data file name for 'clipboard' above.
Note that I skip the header row, as the "T" causes problems, since it
wants to be converted to 'TRUE' (logical, not char) upon import,
screwing up the column widths. I then assign the colnames post import.
This then gives me:
DF
P T I
1 1 1 (1, 2, 3)
2 2 1 (2, 4)
3 1 2 (1, 3, 6, 7)
4 2 2 (6)
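
If pasting from the clipboard is not convenient, roughly the same starting point can be typed in directly. This is only a sketch standing in for the read.fwf() call: the values are copied from the example data, and stringsAsFactors = FALSE is an assumption added so that "I" is a character column regardless of R version.

```r
# Hand-built stand-in for the imported data (values taken from the example)
DF <- data.frame(P = c(1, 2, 1, 2),
                 T = c(1, 1, 2, 2),
                 I = c("(1, 2, 3)", "(2, 4)", "(1, 3, 6, 7)", "(6)"),
                 stringsAsFactors = FALSE)  # keep I as character, not factor
```
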
Given the manipulations that you appear to want to do, I would first
strip the parens from "I" to make subsequent operations easier:
DF$I <- gsub("\\(|\\)", "", DF$I)
So:
DF
P T I
1 1 1 1, 2, 3
2 2 1 2, 4
3 1 2 1, 3, 6, 7
4 2 2 6
Now, split the character vector DF$I into its components and convert them to a list of numeric vectors:
DF$I <- lapply(strsplit(DF$I, split = ","), as.numeric)
DF
P T I
1 1 1 1, 2, 3
2 2 1 2, 4
3 1 2 1, 3, 6, 7
4 2 2 6
# Look at the structure of 'DF'
str(DF)
'data.frame': 4 obs. of 3 variables:
 $ P: num 1 2 1 2
 $ T: num 1 1 2 2
 $ I:List of 4
  ..$ : num 1 2 3
  ..$ : num 2 4
  ..$ : num 1 3 6 7
  ..$ : num 6
Now for your manipulations above:
1: The I when both P and T are given. e.g.: P = 2, T = 2; I = (6)
subset(DF, (P == 2) & (T == 2), select = I)
I
4 6
2: The concatenated vector of Is when P and a subset of T is given, e.g.: P = 1, T = 1:2; Is = (1, 2, 3, 1, 3, 6, 7)
unlist(subset(DF, (P == 1) & (T %in% 1:2), select = I))
I1 I2 I3 I4 I5 I6 I7
1 2 3 1 3 6 7
or you can use:
as.vector(unlist(subset(DF, (P == 1) & (T %in% 1:2), select = I)))
[1] 1 2 3 1 3 6 7
which strips the name attributes from the vector.
3: The length of that vector.
length(unlist(subset(DF, (P == 1) & (T %in% 1:2), select = I)))
[1] 7
4: A list of Is when either P or T is given. e.g.:
P = 2: I = (2, 4), (6)
T = 1: I = (1, 2, 3), (1, 3, 6, 7)
subset(DF, P == 2, select = I)
I
2 2, 4
4 6
subset(DF, T == 1, select = I)
I
1 1, 2, 3
2 2, 4
Note that your example above for 'T == 1' in 4 is incorrect based upon your example data. "(1, 3, 6, 7)" is on the row where T == 2. :-)
See ?read.fwf, ?read.table, ?subset, ?split, ?gsub, ?lapply, ?unlist, ?Syntax and ?Comparison for more information.
HTH,
Marc Schwartz
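
One possible follow-on to the subset() calls: with select = I they return one-column data frames, so getting at the bare vector or a plain list takes one more step. A minimal sketch using logical indexing on the list column instead; DF is rebuilt here only so that the block runs on its own, and the names i1, i2, n2 and i4 are made up for illustration.

```r
# Rebuild the example data with I as a list column (assumed to match 'DF' above)
DF <- data.frame(P = c(1, 2, 1, 2), T = c(1, 1, 2, 2))
DF$I <- list(c(1, 2, 3), c(2, 4), c(1, 3, 6, 7), 6)

# 1: the bare numeric vector for a single (P, T) pair
i1 <- DF$I[DF$P == 2 & DF$T == 2][[1]]

# 2 and 3: the concatenated vector for P = 1, T in 1:2, and its length
i2 <- unlist(DF$I[DF$P == 1 & DF$T %in% 1:2])
n2 <- length(i2)

# 4: a plain list of vectors for a given P
i4 <- DF$I[DF$P == 2]
```

Since the list elements are unnamed here, unlist() returns a vector without the I1, I2, ... names, so no as.vector() call is needed.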