gsub with regular expression
On Fri, Jun 25, 2010 at 10:48 AM, Sebastian Kruk
<residuo.solow at gmail.com> wrote:
If I have a text with 7 words per line and I would like to put first and second word joined in a vector and the rest of words one per column in a matrix how can I do it?

First 2 lines of my text file:

"2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido"
"2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca"

Results:

Vector:
2008/12/31 12:23:31
2010/02/01 02:35:31

Matrix:
"numero"  343.233.233 "Rodeo"    "Vaca"  "Ruido"
"palabra" 111.111.222 "abejorro" "Rodeo" "Vaca"
Here are two solutions. Each is three statements long (read in the
data, display the vector, display the matrix). To read from a file
instead, replace textConnection(Lines) with a file name such as
"myfile.dat" in each.
Both solutions use this sample input:

Lines <- "2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido
2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca"

1. Here is a sub solution:

L <- readLines(textConnection(Lines))
sub("(\\S+ \\S+) .*", "\\1", L) # the vector
do.call(rbind, strsplit(sub("\\S+ \\S+ ", "", L), " ")) # the matrix

2. Here is a solution using zoo:

library(zoo)
# index partially matches read.zoo's index.column argument
z <- read.zoo(textConnection(Lines), index = 1:2,
  FUN = function(x) paste(x[,1], x[,2]))
time(z) # the vector
coredata(z) # the matrix
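
For comparison with the two solutions above, here is a minimal sketch in base R that avoids regular expressions by splitting each line on spaces. The helper name split_words and its n_key parameter are my own invention for illustration, and the sketch assumes every line has the same number of words separated by single spaces.

```r
# split_words() is a hypothetical helper, not part of base R or zoo.
split_words <- function(lines, n_key = 2) {
  # Split every line on single spaces; assumes all lines have the
  # same number of space-separated words.
  parts <- strsplit(lines, " ", fixed = TRUE)
  # Join the first n_key words of each line into one string.
  key <- vapply(parts,
                function(p) paste(p[seq_len(n_key)], collapse = " "),
                character(1))
  # Bind the remaining words row-wise into a character matrix.
  rest <- do.call(rbind, lapply(parts, function(p) p[-seq_len(n_key)]))
  list(vector = key, matrix = rest)
}

Lines <- "2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido
2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca"

con <- textConnection(Lines)
res <- split_words(readLines(con))
close(con)

res$vector  # the joined first two words of each line
res$matrix  # character matrix with one remaining word per column
```

If some lines could have a different number of words, rbind would recycle values silently, so it would be worth checking lengths(parts) first.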
Another possibility would be to convert to chron or POSIXct at the
same time as reading it in:
# chron
library(chron)
z <- read.zoo(textConnection(Lines), index = 1:2,
FUN = function(x) as.chron(paste(x[,1], x[,2]), format = "%Y/%m/%d %H:%M:%S"))
# POSIXct
z <- read.zoo(textConnection(Lines), index = 1:2,
  FUN = function(x) as.POSIXct(paste(x[,1], x[,2]), format = "%Y/%m/%d %H:%M:%S"))
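
The same date-time format string can be tried out on its own, without zoo, which may help when checking that it matches the data. This is only a sketch: the tz = "UTC" setting is an assumption of mine, since the original data does not state a time zone.

```r
Lines <- "2008/12/31 12:23:31 numero 343.233.233 Rodeo Vaca Ruido
2010/02/01 02:35:31 palabra 111.111.222 abejorro Rodeo Vaca"

con <- textConnection(Lines)
L <- readLines(con)
close(con)

# Keep only the first two words (date and time) of each line.
stamp <- sub("^(\\S+ \\S+) .*", "\\1", L)

# tz = "UTC" is an assumption; adjust it to the zone the data was recorded in.
tt <- as.POSIXct(stamp, format = "%Y/%m/%d %H:%M:%S", tz = "UTC")

any(is.na(tt))                   # TRUE would mean some line did not match the format
format(tt, "%Y-%m-%d %H:%M:%S")  # the parsed times, printed in ISO style
```

An NA in tt is the usual sign that the format string and the text disagree, for example on the separator between the date parts.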