Need help with text processing / string split
Try this:
x <- read.table('/temp/tbl.txt', sep = ',', header = TRUE, as.is = TRUE)
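# remove commas from the Costs column; on the right-hand side x$Cost
# partially matches "Costs", and the assignment creates a new column named Cost
x$Cost <- gsub(',', '', x$Cost)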
# split the Cost
temp <- strsplit(x$Cost, "\\$") # "$" is special, so it is escaped
temp <- do.call(rbind, temp) # create a matrix
mode(temp) <- 'numeric' # convert to numeric
x$Cost1 <- temp[, 2]
x$Cost2 <- temp[, 3]
head(x)
                               Address      Township       Parcel                Sale.Date
2                          10 PACER LN East Norriton 330006712005  Bnkrptcy-PP to6/29/2011
3                           6 BALA AVE  Lower Merion 400003292007          STAYED5/25/2011
4             109 STONY WAY, Condo 109 East Norriton 330008575662  Bnkrptcy-PP to6/29/2011
5                   613 NORTHAMPTON RD East Norriton 330006103002    Postponed to5/25/2011
6                      67 HIGH GATE LN      Whitpain 660002716764 Pstpnd by CO to5/25/2011
7 236 Arundel Ave aka 236 Arundel Road       Horsham 360000136008        For Sale5/25/2011
                 Costs               Cost     Cost1   Cost2
2 $173,933.60$2,410.28 $173933.60$2410.28 173933.60 2410.28
3   $264,640.36$168.00  $264640.36$168.00 264640.36  168.00
4  $70,029.04$1,483.59  $70029.04$1483.59  70029.04 1483.59
5 $254,873.19$1,772.62 $254873.19$1772.62 254873.19 1772.62
6 $404,507.59$1,947.90 $404507.59$1947.90 404507.59 1947.90
7 $252,472.27$1,034.51 $252472.27$1034.51 252472.27 1034.51
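
For anyone without the attached tbl.txt, here is a minimal self-contained sketch of the same idea. The three sample strings are copied from the Costs column shown above. The helper name split_costs is made up for illustration and is not part of the code above, and the sketch assumes every entry holds exactly two dollar amounts.

```r
# Sample values copied from the Costs column shown above
costs <- c("$173,933.60$2,410.28",
           "$264,640.36$168.00",
           "$70,029.04$1,483.59")

# split_costs() is a hypothetical helper used only for this illustration
split_costs <- function(s) {
  s <- gsub(",", "", s, fixed = TRUE)       # drop the thousands separators
  parts <- strsplit(s, "$", fixed = TRUE)   # fixed = TRUE, so "$" needs no escaping
  # each element looks like c("", "173933.60", "2410.28"); drop the empty first piece
  m <- do.call(rbind, lapply(parts, function(p) as.numeric(p[-1])))
  colnames(m) <- c("Cost1", "Cost2")
  m
}

res <- split_costs(costs)
res
```

The result is a numeric matrix with one row per input string, so it could be attached to a data frame with something like cbind(x, split_costs(x$Costs)).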
On Sun, May 15, 2011 at 3:50 PM, eric <ericstrom at aol.com> wrote:
I used screen scraping to extract some information and put it into a table called tbl. Now I want to modify the table a bit so the data can be more useful. Here's the code I used:

library(XML)
rm(list=ls())
url <- "http://webapp.montcopa.org/sherreal/salelist.asp?saledate=05/25/2011"
tbl <- data.frame(readHTMLTable(url))[2:405, c(3,5,6,8,9)]
names(tbl) <- c("Address", "Township", "Parcel", "Sale Date", "Costs")

tbl is attached as txt for your convenience: http://r.789695.n4.nabble.com/file/n3524793/tbl.txt

Entries in the last column of the data frame (tbl$Costs) appear as follows: $173,933.60$2,410.28

How do I:
1. Split the string
2. Have the two values show up as actual numbers that can be used
3. Put the numbers in two separate columns of the data frame

In other words, $173,933.60$2,410.28 would show up as 173933.60 in one column and 2410.28 in a second column of tbl. I tried using strsplit but I could not get it working properly.
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?