problem with white space
Here is one way of doing it. I would suggest that you read in the
data with readLines and then combine it into one single string so that
you can use substring on it. Since you did not provide
commented, minimal, self-contained, reproducible code, I will take a
guess at what your data looks like:
# create some test data; the real data might be read in with readLines
sdata <- sapply(1:10, function(x){  # 10 lines of strings with 50 characters
    paste(sample(LETTERS, 50, TRUE), collapse='')
})
# put into one large string so you can do substring on it
sdata <- paste(sdata, collapse='')
# now create 10 samples of size 20 and write them to files (file1, file2, ... file10)
# (sample() draws without replacement by default; add replace=TRUE for a bootstrap)
for (i in 1:10){
    x <- sample(nchar(sdata), 20)
    writeLines(paste(substring(sdata, x, x), collapse=''),
        con=paste("file", i, sep=''))
}
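If you want the output as 50-character lines and the draws made with replacement, as in your original code, something along these lines might work. This is only a sketch under my guess at your data: the helper name resample_to_file, the replicate size of 200, the three replicates and the temporary output directory are all made up for illustration, and the test data stands in for readLines("data.txt").

```r
# Hypothetical helper (name and arguments are illustrative only): draw n
# characters with replacement and write them out as lines of 'width' characters.
resample_to_file <- function(chars, n, outfile, width = 50) {
    # resample single characters and glue them into one string
    draw <- paste(sample(chars, n, replace = TRUE), collapse = "")
    # cut the string into pieces of 'width' characters, one per output line
    starts <- seq(1, n, by = width)
    writeLines(substring(draw, starts, pmin(starts + width - 1, n)),
        con = outfile)
}

# stand-in for the real input; with real data use readLines("data.txt")
lines_in <- sapply(1:10, function(i) {
    paste(sample(LETTERS, 50, TRUE), collapse = "")
})

# split the whole data set into single characters; no spaces are needed
chars <- strsplit(paste(lines_in, collapse = ""), "")[[1]]

# one loop instead of a separate write command for each replicate
out_dir <- tempdir()  # assumption: write to a scratch directory
for (i in 1:3) {
    resample_to_file(chars, 200,
        file.path(out_dir, paste("Rep", i, ".txt", sep = "")))
}
```

For your case you would change the 200 to 100000 and the 1:3 to 1:500, and wrap the whole thing in another loop over your input files.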
On Sun, Mar 30, 2008 at 3:41 PM, Suraaga Kulkarni
<suraaga.kulkarni at gmail.com> wrote:
Hi,
I need to resample characters from a dataset that consists of an extremely
long string that is written over hundreds of thousands of lines, each of
length 50 characters. I am currently doing this by first inserting a space
after each character in the dataset and then using the following commands:
y <- as.matrix(read.table("data.txt", stringsAsFactors=FALSE))
bstrap <- sample(length(y), 100000, TRUE)
write(y[bstrap], file="Rep1.txt", ncolumns=50, append=FALSE)
bstrap <- sample(length(y), 100000, TRUE)
write(y[bstrap], file="Rep2.txt", ncolumns=50, append=FALSE)
bstrap <- sample(length(y), 100000, TRUE)
.
.
.
and so on for 500 reps.
I think there should be a better way of doing this. My specific questions:
1. Is there a way to avoid inserting spaces between the characters before
calling the "sample" command (because I don't want spaces between the
resampled characters in the output either; see number 2 below)?
2. If I have no choice but to insert the spaces in my data before
resampling, is there a way to output the resampled data without spaces,
simply as 50-character long strings one below the other? I tried adding
strip.white=TRUE to the write command, but it gave me an error because
it did not recognize the argument.
3. Finally, since I have to get 500 such resampled reps from each dataset
(and there are over 20 such huge datasets) is there a way around having to
write a separate write command for each rep?
Any suggestions will be greatly appreciated.
Thanks,
S.
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?