Hello:
I have been give a spreadsheet to work with formed as one big table.
What it consists of is a 10-row-by-40-column table for each of about 70
different locations. In other words, the table row names are repeated
70 times, once for each of the locations (whose names also appear in the
same column, where it's talking about the totals for that location), e.g.:
A B C
Location1 15 73 123 <- this row is the sum of the following 3
Under 10 6 42 23
10 - 25 4 15 23
Over 25 5 16 77
Location2 18 75 113 <- same here
Under 10 7 45 13
10 - 25 5 18 44
Over 25 6 12 56
I want to get this into R as a collection of data frames, one for each
of my locations. My questions:
1. There is a way to handle a collection of data frames, isn't there?
No doubt there are plenty, but what's the easiest way, so that I
can address them collectively, allowing me to ask such questions as
what's the max of the over 25's in column C?
2. What's the easiest way to read such a data array from a text file?
I can do some editing of a csv file produced from the spreadsheet,
but don't really know what to aim for.
3. Is there some shortcut that would allow me to read this directly
from a spreadsheet?
Many thanks,
-tom
Hello:
I have been give a spreadsheet to work with formed as one big table.
What it consists of is a 10-row-by-40-column table for each of
about 70
different locations. In other words, the table row names are repeated
70 times, once for each of the locations (whose names also appear
in the
same column, where it's talking about the totals for that
location), e.g.:
A B C
Location1 15 73 123 <- this row is the sum of the following 3
Under 10 6 42 23
10 - 25 4 15 23
Over 25 5 16 77
Location2 18 75 113 <- same here
Under 10 7 45 13
10 - 25 5 18 44
Over 25 6 12 56
I want to get this into R as a collection of data frames, one for each
of my locations. My questions:
1. There is a way to handle a collection of data frames, isn't
there?
No doubt there are plenty, but what's the easiest way, so that I
can address them collectively, allowing me to ask such
questions as
what's the max of the over 25's in column C?
A list is the best way for that. Then you can use things like lapply
and sapply, as I do towards the end of the script that follows.
2. What's the easiest way to read such a data array from a text
file?
I can do some editing of a csv file produced from the
spreadsheet,
but don't really know what to aim for.
Here is the code I used to read your example, which I saved as a
comma-separated file, with the only addition that I added the name
"Names" to the first column. You will probably need to adjust
filename, nlocations and rows.per.location.
filename <- "~/Desktop/rows.txt"
nlocations <- 2
rows.per.location <- 3
data <- read.csv(filename)
data$Names <- gsub("\\s","", data$Names, perl=TRUE) # Trim off
whitespace from first column
totals <- data[4*seq_len(nlocations)-3,] # Pick up the
rows with the totals
actual.data <- data[-(4*seq_len(nlocations)-3),] # Pick up the rest
location.names <- totals[,1] # The location
names are now the first column of totals
data.by.location <- split(actual.data, rep(location.names,
each=rows.per.location)) # this is the "workhorse"
data.by.location <- lapply(data.by.location, function(x) {
data.frame(x[,-1], row.names=x[,1]) # Converting
each list item to a better form
})
totals2 <- sapply(data.by.location, function(x) sapply(x,sum))
all(totals2 == t(totals[,-1])) # Should return true if the totals
add up
3. Is there some shortcut that would allow me to read this directly
from a spreadsheet?
Have a look at the xlsReadWrite package.
Many thanks,
-tom
Haris Skiadas
Department of Mathematics and Computer Science
Hanover College
2. What's the easiest way to read such a data array from a text
file?
I can do some editing of a csv file produced from the
spreadsheet,
but don't really know what to aim for.
Here is the code I used to read your example, which I saved as a
comma-separated file, with the only addition that I added the name
"Names" to the first column. You will probably need to adjust
filename, nlocations and rows.per.location.
Thank you very much. This is more than I'd hoped for.
-tom
Hello all:
I'm sorry to take up bandwidth with easy questions, but as an R
beginner, I continue to be surprised by factors. If someone can shed
some light for me, I'd be grateful:
z02807 z02812 z02809 z02804 z02806 z02801 z02813 z02802 Total z02808
103 112 1328 169 2368 29 501 51 54813 76
507 Levels: 0 10 1001 1004 1008 1016 1027 1028 103 1031 10318 1043 1045 ... Na
Apparently this factor is sorted in alphabetic order, not numeric order.
I find no parameter of sort() that controls for this.
And yet:
mode(d)
[1] "numeric"
length(d)
[1] 10
I would have thought that because d is a numeric list of ten values, I'd
get them sorted in numeric order.
Can someone help me understand why this is expected behavior, and also
what I should do in order to see the numerically sorted list of values I
really want?
Many thanks,
-tom
By default, factors are characters and sorted in alphabetical order.
It looks like somehow you numeric data was converted to factors.
Therefore '10' comes before '2'. If you want the factors in numeric
order you have to convert them back. Look in the FAQs.
sort(as.numeric(as.character(x.f))) # 'expected' numeric order
[1] 1 2 3 4 5 10 11 20 21 22 30
On Dec 21, 2007 10:34 AM, tom sgouros <tomfool at as220.org> wrote:
Hello all:
I'm sorry to take up bandwidth with easy questions, but as an R
beginner, I continue to be surprised by factors. If someone can shed
some light for me, I'd be grateful:
z02807 z02812 z02809 z02804 z02806 z02801 z02813 z02802 Total z02808
103 112 1328 169 2368 29 501 51 54813 76
507 Levels: 0 10 1001 1004 1008 1016 1027 1028 103 1031 10318 1043 1045 ... Na
Apparently this factor is sorted in alphabetic order, not numeric order.
I find no parameter of sort() that controls for this.
And yet:
mode(d)
[1] "numeric"
length(d)
[1] 10
I would have thought that because d is a numeric list of ten values, I'd
get them sorted in numeric order.
Can someone help me understand why this is expected behavior, and also
what I should do in order to see the numerically sorted list of values I
really want?
Many thanks,
-tom
--
------------------------
tomfool at as220 dot org
http://sgouros.comhttp://whatcheer.net