Create single vector after looping through multiple data frames with GREP
Hi Simon,
The function below should do it or at least get you started...
getPlotData <- function (datalist, response, times)
{
  qdata <- sapply(datalist[times],
    function(df) {
      irow <- grepl(response, df$Response)
      df[irow, 2:5]
    }
  )
  # qdata is a matrix with rows Q1:Q4 and cols for times;
  # we turn it into a two col matrix with col 1 = time index
  # and col 2 = value
  time.index <- seq(4 * ncol(qdata))
  out <- cbind(time.index, as.numeric(qdata))
  rownames(out) <- paste(time.index, rownames(qdata), sep=".")
  colnames(out) <- c("time", response)
  out
}
#Example, get data for times 10:15 where Response contains "Economy"
x <- getPlotData(r, "Economy", 10:15)
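To try the idea without scraping anything, here is a minimal sketch on a made-up list of data frames. The names toy and getSeries and all the numbers are invented for illustration. It keeps only the first matching row per data frame and fills in NA when nothing matches, which is an assumption about what is wanted rather than part of the function above.

```r
# Toy stand-in for the list `r`; every number here is invented.
toy <- list(
  data.frame(Response = c("Economy", "Health care"),
             Q1 = c(10, 5), Q2 = c(11, 6), Q3 = c(12, 7), Q4 = c(13, 8),
             stringsAsFactors = FALSE),
  data.frame(Response = c("Taxes", "Economy/jobs"),
             Q1 = c(3, 20), Q2 = c(4, 21), Q3 = c(5, 22), Q4 = c(6, 23),
             stringsAsFactors = FALSE),
  data.frame(Response = c("Taxes", "Health care"),
             Q1 = c(1, 2), Q2 = c(1, 2), Q3 = c(1, 2), Q4 = c(1, 2),
             stringsAsFactors = FALSE)
)

# Hypothetical helper: the Q1..Q4 values of the matching row from each
# data frame, concatenated in list order into one numeric vector.
getSeries <- function(datalist, response, times = seq_along(datalist)) {
  vals <- lapply(datalist[times], function(df) {
    irow <- grep(response, df$Response)
    if (length(irow) == 0) return(rep(NA_real_, 4))  # no match: leave a gap
    unlist(df[irow[1], 2:5], use.names = FALSE)      # first match only
  })
  unlist(vals)
}

econ <- getSeries(toy, "Economy")
plot(econ, type = "l", xlab = "quarter index", ylab = "Economy")
```

The third toy data frame has no "Economy" row, so the last four values are NA and the plotted line simply stops there.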
Michael
On 11 October 2010 03:35, Simon Kiss <sjkiss at gmail.com> wrote:
Hello all,
I changed the subject line of the e-mail because the question I'm posing now is different from the first one. I hope that this is proper etiquette. However, the original chain is included below.
I've incorporated bits of both Ethan's and Brian's code into the script below, but there's one aspect I can't get my head around. I'm totally new to programming with control structures. The reproducible code below creates a list containing 19 data frames, one for each year of the "Most Important Problem" survey data for Canada.
What I'd like at this stage is a loop where I can search through all the data frames for rows containing the search term and then bind the rows together in a plottable format.
At the bottom of the code below, you'll find my first attempt to make use of a search string and to put it into a plottable format. It only partially works. I can only get the numbers for one year, where I'd like to be able to get a string of numbers for several years. But, on the upside, grep appears to do the trick in terms of selecting rows.
Can anyone suggest a solution?
Yours truly,
Simon Kiss
#This is the reproducible code to set up all the data frames
require("XML")
library(XML)
#This gets the data from the web and lists them
mylist <- paste ("http://www.queensu.ca/cora/_trends/mip_",
c(1987:2001,2003:2006), ".htm", sep="")
alltables <- lapply(mylist, readHTMLTable)
#convert to dataframes
r<-lapply(alltables, function(x) {as.data.frame(x)} )
#This is just some house-cleaning; structuring all the tables so they are uniform
r[[1]][3]<-r[[1]][2]
r[[1]][2]<-c(" ")
r[[2]][4]<-r[[2]][2]
r[[2]][5]<-r[[2]][3]
r[[2]][2:3]<-c(" ")
r[[3]][4:5]<-r[[3]][3:4]
r[[3]][3]<-c(" ")
#This loop deletes some superfluous columns and rows, turns the first column into character strings and the data into numeric
for (i in 1:19) {
  n.rows<-dim(r[[i]])[1]
  r[[i]] <- r[[i]][15:n.rows-3, 1:5]
  n.rows<-dim(r[[i]])[1]
  row.names(r[[i]]) <-NULL
  names(r[[i]]) <- c("Response", "Q1", "Q2", "Q3", "Q4")
  r[[i]][, 1]<-as.character(r[[i]][,1])
  #r[[i]][,2:5]<-as.numeric(as.character(r[[i]][,2:5]))
  r[[i]][, 2:5]<-lapply(r[[i]][, 2:5], function(x) {as.numeric(as.character(x))})
  #n.rows<-dim(r[[i]])[1]
  #r[[i]]<-r[[i]][9
}
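A side note on the row subset in the loop above: in R the colon operator binds more tightly than binary minus, so an index written as 15:n.rows-3 means (15:n.rows)-3, not 15:(n.rows-3). Whether that is the intended range is not clear from the thread. The sketch below only shows the difference, with n.rows set to an arbitrary value of 20.

```r
n.rows <- 20                # arbitrary value, for illustration only

a <- 15:n.rows - 3          # parsed as (15:n.rows) - 3
b <- 15:(n.rows - 3)        # explicit parentheses give a different range

print(a)                    # 12 13 14 15 16 17
print(b)                    # 15 16 17
```

Writing the parentheses out, whichever range is meant, removes the ambiguity for anyone reading the code later.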
#This code is my first attempt at introducing a search string, getting the rows, binding and plotting;
economy<-r[[10]][grep('Economy', r[[10]][,1]),]
economy_2<-r[[11]][grep('Economy', r[[11]][,1]),]
test<-cbind(economy, economy_2)
plot(as.numeric(test), type='l')
#here's another attempt I'm trying....
economy<-data.frame
for (i in 15:19) {
economy[i,] <-r[[i]][grep('Economy', r[[i]][,1]), ]
}
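One reason the second attempt cannot work as written is that economy<-data.frame (without parentheses) stores the function data.frame itself, not an empty data frame, so economy[i,] has nothing to assign into. A common alternative is to collect the matching rows in a list and bind them once at the end. The sketch below does that on invented data; toy, the index column and all the numbers are made up for illustration and are not from the scraped tables.

```r
# Toy stand-in for the list `r`; every number here is invented.
toy <- list(
  data.frame(Response = c("Economy", "Health care"),
             Q1 = c(10, 5), Q2 = c(11, 6), Q3 = c(12, 7), Q4 = c(13, 8),
             stringsAsFactors = FALSE),
  data.frame(Response = c("Taxes", "Economy/jobs"),
             Q1 = c(3, 20), Q2 = c(4, 21), Q3 = c(5, 22), Q4 = c(6, 23),
             stringsAsFactors = FALSE),
  data.frame(Response = c("Taxes", "Health care"),
             Q1 = c(1, 2), Q2 = c(1, 2), Q3 = c(1, 2), Q4 = c(1, 2),
             stringsAsFactors = FALSE)
)

# For each data frame keep the rows whose Response matches, tagged with
# the position of that data frame in the list.
rows <- lapply(seq_along(toy), function(i) {
  hit <- toy[[i]][grep("Economy", toy[[i]]$Response), , drop = FALSE]
  if (nrow(hit) == 0) return(NULL)   # nothing matched in this data frame
  cbind(index = i, hit)
})
rows <- rows[!vapply(rows, is.null, logical(1))]  # drop the empty results

economy <- do.call(rbind, rows)      # one row per match
rownames(economy) <- NULL

# Flatten row by row (Q1..Q4 of the first match, then the next, ...).
series <- as.numeric(t(economy[, c("Q1", "Q2", "Q3", "Q4")]))
plot(series, type = "l")
```

Keeping the index column makes it possible to see afterwards which data frame each row came from, and which ones had no match at all.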
Begin forwarded message:
From: Simon Kiss <sjkiss at gmail.com>
Date: October 7, 2010 4:59:46 PM EDT
To: Simon Kiss <simonjkiss at yahoo.ca>
Subject: Fwd: [R] Converting scraped data
Begin forwarded message:
From: Ethan Brown <ethancbrown at gmail.com>
Date: October 6, 2010 4:22:41 PM GMT-04:00
To: Simon Kiss <sjkiss at gmail.com>
Cc: r-help at r-project.org
Subject: Re: [R] Converting scraped data
Hi Simon,
You'll notice the "test" data.frame has a whole mix of characters in the columns you're interested in, including a "-" for missing values, and that the columns you're interested in are in fact factors. as.numeric(factor) returns the integer code of each level, not the label of the level (see ?levels and ?factor); that's why it's giving you those irrelevant integers.
I always end up using something like this handy code snippet to deal with the situation:
unfactor <- function(factors)
# From http://psychlab2.ucr.edu/rwiki/index.php/R_Code_Snippets#unfactor
# Transform a factor back into its factor names
{
  return(levels(factors)[factors])
}
Then, to get your data to where you want it, I'd do this:
require(XML)
theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
tables <- readHTMLTable(theurl)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
class(tables)
test<-data.frame(tables, stringsAsFactors=FALSE)
result <- test[11:42, 1:5] #Extract the actual data we want
names(result) <- c("Response", "Q1", "Q2","Q3","Q4")
for(i in 2:5) { # Convert the factor columns to numeric
  result[,i] <- as.numeric(unfactor(result[,i]))
}
result
From here you should be able to plot or do whatever else you want.
Hope this helps,
Ethan Brown
On Wed, Oct 6, 2010 at 9:52 AM, Simon Kiss <sjkiss at gmail.com> wrote:
Dear Colleagues,
I used this code to scrape data from the URL contained within. This code
should be reproducible.
require("XML")
library(XML)
theurl <- "http://www.queensu.ca/cora/_trends/mip_2006.htm"
tables <- readHTMLTable(theurl)
n.rows <- unlist(lapply(tables, function(t) dim(t)[1]))
class(tables)
test<-data.frame(tables, stringsAsFactors=FALSE)
test[16,c(2:5)]
as.numeric(test[16,c(2:5)])
quartz()
plot(c(1:4), test[15, c(2:5)])
Calling the values from the row of interest using test[16, c(2:5)] brings
them up as represented on the screen, but plotting them or coercing them to
numeric changes the values in a way that doesn't make sense to me. My
intuition is that there is something going on with the way the characters
are coded or classed when they're scraped into R. I've looked around the
help files for converting from character to numeric but can't find a
solution.
I also tried this:
as.numeric(as.character(test[16,c(2:5)])) and that also changed the values
from what they originally were.
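The behaviour described here is what happens when the scraped columns are factors: as.numeric() on a factor returns the internal integer codes, not the numbers printed on screen. The sketch below shows this on a small made-up factor. The values, including the "-" standing in for a missing entry, are invented and are not taken from the scraped table.

```r
# A made-up factor that looks numeric when printed.
f <- factor(c("12", "7", "-", "30"))

codes <- as.numeric(f)   # integer codes of the levels, not 12, 7, 30

# Convert the labels instead of the codes. "-" cannot be parsed, so it
# becomes NA (suppressWarnings hides the coercion warning).
vals  <- suppressWarnings(as.numeric(as.character(f)))
vals2 <- suppressWarnings(as.numeric(levels(f))[f])   # equivalent route

print(vals)              # 12  7 NA 30

# The same conversion applied column by column to a data frame of factors.
df <- data.frame(Q1 = factor(c("12", "-")), Q2 = factor(c("7", "30")))
df[] <- lapply(df, function(x) suppressWarnings(as.numeric(as.character(x))))
```

Converting each column before pulling out a row avoids calling as.character() on a one-row data frame, which works on the whole list of columns at once, not on the individual labels.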
I'm grateful for any suggestions.
Yours, Simon Kiss
*********************************
Simon J. Kiss, PhD
Assistant Professor, Wilfrid Laurier University
73 George Street
Brantford, Ontario, Canada
N3T 2C9
Cell: +1 519 761 7606
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.