creating series of vectors

7 messages · Petr Savicky, ilai, MacQueen, Don +1 more

#
Dear All,

I am pretty new to R and thus my question may sound silly.

Is there a way to automatically generate a series of separate vectors 
(so not arranged in a matrix), without typing and changing the values 
every time, and store them as separate *.xlsx files, where the "*" is 
replaced by the name of the vector itself?

What I would like to create is a total of 12 vectors, corresponding to 
the 12 months (January to December), say for the year 2006; thus the 
name of a resulting single vector should be something like 
"January2006", and the final file that will be stored in my WD should 
have the same name ("January2006.xlsx").

The number of elements of each vector must correspond to the length 
in days of each month (considering a non-leap year, 365 days) 
multiplied by 2 (e.g. "January2006" will have 31*2=62 elements, 
"February2006" will have 28*2=56 elements, and so on).

Finally, the elements of the vectors should be named as: 
"010106_aaa","010106_bbb","020106_aaa","020106_bbb", ... , 
"310106_aaa","310106_bbb".

To sum up, at the end of the process I would like to obtain 12 vectors 
as follows:

January2006("010106_aaa","010106_bbb","020106_aaa","020106_bbb", ... , 
"310106_aaa","310106_bbb")
.
.
.
.
.
December2006("011206_aaa","011206_bbb","021206_aaa","021206_bbb", ... , 
"311206_aaa","311206_bbb")

Any help would be particularly welcome and appreciated.
Cheers,

NP

#
On Thu, Feb 16, 2012 at 05:32:15PM +0100, Nino Pierantonio wrote:
Hi.

Try the following function, which creates a list of vectors.

  seqDays <- function(year)
  {
      n <- 365 + ((year %% 4 == 0 && year %% 100 != 0) || year %% 400 == 0)  # full Gregorian leap-year rule
      x <- as.Date(paste(year, "-01-01", sep="")) + 0:(n-1)
      months <- unique(months(x))
      x <- do.call(rbind, strsplit(as.character(x), "-"))
      x1 <- sprintf("%02d", year %% 100)
      y <- paste(x[, 3], x[, 2], x1, sep="")
      y <- c(rbind(paste(y, "_aaa", sep=""), paste(y, "_bbb", sep="")))
      x2 <- rep(x[, 2], each=2)
      out <- split(y, x2)
      names(out) <- paste(months, year, sep="")
      out
  }
 
  out <- seqDays(2006)
  out

  $January2006
   [1] "010106_aaa" "010106_bbb" "020106_aaa" "020106_bbb" "030106_aaa"
   [6] "030106_bbb" "040106_aaa" "040106_bbb" "050106_aaa" "050106_bbb"
  [11] "060106_aaa" "060106_bbb" "070106_aaa" "070106_bbb" "080106_aaa"
  [16] "080106_bbb" "090106_aaa" "090106_bbb" "100106_aaa" "100106_bbb"
  [21] "110106_aaa" "110106_bbb" "120106_aaa" "120106_bbb" "130106_aaa"
  [26] "130106_bbb" "140106_aaa" "140106_bbb" "150106_aaa" "150106_bbb"
  [31] "160106_aaa" "160106_bbb" "170106_aaa" "170106_bbb" "180106_aaa"
  [36] "180106_bbb" "190106_aaa" "190106_bbb" "200106_aaa" "200106_bbb"
  [41] "210106_aaa" "210106_bbb" "220106_aaa" "220106_bbb" "230106_aaa"
  [46] "230106_bbb" "240106_aaa" "240106_bbb" "250106_aaa" "250106_bbb"
  [51] "260106_aaa" "260106_bbb" "270106_aaa" "270106_bbb" "280106_aaa"
  [56] "280106_bbb" "290106_aaa" "290106_bbb" "300106_aaa" "300106_bbb"
  [61] "310106_aaa" "310106_bbb"
  
  $February2006
   [1] "010206_aaa" "010206_bbb" "020206_aaa" "020206_bbb" "030206_aaa"
   [6] "030206_bbb" "040206_aaa" "040206_bbb" "050206_aaa" "050206_bbb"
  [11] "060206_aaa" "060206_bbb" "070206_aaa" "070206_bbb" "080206_aaa"
  [16] "080206_bbb" "090206_aaa" "090206_bbb" "100206_aaa" "100206_bbb"
  [21] "110206_aaa" "110206_bbb" "120206_aaa" "120206_bbb" "130206_aaa"
  [26] "130206_bbb" "140206_aaa" "140206_bbb" "150206_aaa" "150206_bbb"
  [31] "160206_aaa" "160206_bbb" "170206_aaa" "170206_bbb" "180206_aaa"
  [36] "180206_bbb" "190206_aaa" "190206_bbb" "200206_aaa" "200206_bbb"
  [41] "210206_aaa" "210206_bbb" "220206_aaa" "220206_bbb" "230206_aaa"
  [46] "230206_bbb" "240206_aaa" "240206_bbb" "250206_aaa" "250206_bbb"
  [51] "260206_aaa" "260206_bbb" "270206_aaa" "270206_bbb" "280206_aaa"
  [56] "280206_bbb"
  
  $March2006
   [1] "010306_aaa" "010306_bbb" "020306_aaa" "020306_bbb" "030306_aaa"
  ...

Individual vectors may be accessed as out[[i]], their names
as names(out).

Storing to text files may be done as follows.

  for (i in 1:12) {
      writeLines(out[[i]], con=paste(names(out)[i], ".txt", sep=""))
  }

Hope this helps.

Petr Savicky.
#
# All days in years 2006 to 2009 by month in 48 (12x4) files.

days <- seq(as.Date("2006/1/1"), as.Date("2009/12/31"), by="day")  # one long vector
out <- paste(rep(format(days,'%d%m%y'),each=2), c('aaa','bbb'), sep='_')  # reformat to style
month <- factor(rep(format(days,'%B%y'),each=2))  # group by month.year
for(i in levels(month))
  cat(out[month==i], '\n', file=paste(i,'txt',sep='.'))  # write external files

Cheers
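
For a single year, the same idea can be written so that the result is a named list with the "January2006" style names requested in the original post. This is only a sketch: it uses R's built-in month.name constant so the names do not depend on the session locale, and the output directory (tempdir()) is an assumption made for illustration.

```r
# All days of 2006, two labels ("_aaa" and "_bbb") per day.
days <- seq(as.Date("2006-01-01"), as.Date("2006-12-31"), by = "day")
labels <- paste(rep(format(days, "%d%m%y"), each = 2), c("aaa", "bbb"), sep = "_")

# Group by two-digit month number so the list is in calendar order.
month_no <- rep(format(days, "%m"), each = 2)
out <- split(labels, month_no)
names(out) <- paste(month.name, "2006", sep = "")

# One text file per month, named after the vector.
# tempdir() is a placeholder; replace it with the real target directory.
out_dir <- tempdir()
for (nm in names(out)) {
    writeLines(out[[nm]], file.path(out_dir, paste(nm, ".txt", sep = "")))
}
```

lengths(out) should give twice the number of days in each month (62 for January, 56 for February). If xlsx output is needed, the write.xlsx() call from the xlsx package used later in this thread could take the place of writeLines().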
On Thu, Feb 16, 2012 at 9:32 AM, Nino Pierantonio <nino.p.80 at gmail.com> wrote:
4 days later
#
Thanks Ilai this helped.
Cheers,

Nino

On 16/02/2012 23:15, ilai wrote:
#
Dear all,

I am using R to work on a large amount of telemetry data divided by day. 
Each file (an xlsx file) contains 2 columns, the first one for sst readings 
and the second one for chl readings, and 72360 rows, each 
corresponding to the centre of a cell in my study area. The columns have 
no headings. Lots of cells have fake readings (-999.0000000). What I 
want to do is merge the files together, by month and season, replace 
the fake values with "NA" and then calculate the average row values for 
both sst and chl. I have stored the files in the directory C:/Temp. This 
directory contains 12 subfolders, January to December, and each subfolder 
contains a certain number of files, corresponding to the number of days 
in each month (e.g. January 31 files, February 28 files, and so on).

I already have commands that work properly, but I would really like to 
know if it is possible to reduce their number and, maybe, to run some of 
them automatically. What I do is work "month-by-month" as follows (I am 
aware this is not the most elegant way to do it; I'm new to R and for 
the moment "elegance & style" is not my main goal):

  setwd("C:/Temp/January09")   # set my working directory
  library(xlsx)                # needed to handle the original *.xlsx files
  list.jan09 <- list.files("C:/Temp/January09", full.names=TRUE)
  read.all.jan09 <- lapply(list.jan09, read.xlsx, 1, header=FALSE)
  # create a data frame containing all my data
  daily.all.jan09 <- do.call("cbind", read.all.jan09)
  # sst readings are the first column of each daily file;
  # the result has 31 columns and 72360 rows
  daily.sst.jan09 <- daily.all.jan09[, seq(from=1, to=61, by=2)]
  # chl readings are the second column of each daily file;
  # the result has 31 columns and 72360 rows
  daily.chl.jan09 <- daily.all.jan09[, seq(from=2, to=62, by=2)]
  # replace -999.0000000 values with NA
  daily.sst.jan09 <- replace(daily.sst.jan09, daily.sst.jan09 == -999.0000000, NA)
  # vector containing the mean sst value of each row
  jan09_avgsst <- rowMeans(daily.sst.jan09)
  # store the sst vector
  write.xlsx(jan09_avgsst, "C:/Users/AAA/Desktop/Data/january09_avgsst.xlsx")
  # replace -999.0000000 values with NA
  daily.chl.jan09 <- replace(daily.chl.jan09, daily.chl.jan09 == -999.0000000, NA)
  # vector containing the mean chl value of each row
  jan09_avgchl <- rowMeans(daily.chl.jan09)
  # store the chl vector
  write.xlsx(jan09_avgchl, "C:/Users/AAA/Desktop/Data/january09_avgchl.xlsx")

I repeat these same commands for all the months and for the seasons 
(January-March; April-June; July-September; October-December), so the 
whole thing is a bit redundant.

How can I speed up the process, reduce the number of commands and maybe 
run them automatically? Many thanks for your help.

Cheers,
Nino
1 day later
#
Are you absolutely certain that the data must be stored in Excel?

In the long run I believe you will find it easier if the data is stored in
an external database, or some other data repository that does not require
you to read so many separate files.

Probably the best you can hope for as it is now is to put these commands
inside a loop, or nested loops, with the input and output file names
constructed from the loop indexes [see help('paste') for constructing file
names].

-Don
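
A minimal sketch of the loop-plus-paste() approach described above, in base R only. The helper name cell_means, the toy data, and the combined output layout are assumptions made for illustration; in real use, the list `daily` would come from lapply(list.files(...), read.xlsx, 1, header=FALSE) as in the original post. Unlike the original commands, it passes na.rm=TRUE so that a cell's mean is taken over the days that have a valid reading.

```r
# Hypothetical helper: per-cell means of sst (column 1) and chl (column 2)
# across a list of daily data frames, treating `missing` as NA.
cell_means <- function(daily, missing = -999) {
    sst <- sapply(daily, function(d) d[[1]])   # matrix: one column per day
    chl <- sapply(daily, function(d) d[[2]])
    sst[sst == missing] <- NA
    chl[chl == missing] <- NA
    data.frame(avgsst = rowMeans(sst, na.rm = TRUE),
               avgchl = rowMeans(chl, na.rm = TRUE))
}

# Two made-up "days" with three cells each, standing in for the files read from disk.
daily <- list(data.frame(c(10, -999, 12), c(0.1, 0.2, -999)),
              data.frame(c(20, 14, -999), c(0.3, -999, -999)))
res <- cell_means(daily)

# Input folders and output file names built with paste(), one per month.
in_dirs   <- paste("C:/Temp/", month.name, "09", sep = "")
out_files <- paste("C:/Users/AAA/Desktop/Data/", tolower(month.name),
                   "09_avgsst.xlsx", sep = "")

# Seasons as groups of three consecutive months.
seasons <- split(month.name, rep(1:4, each = 3))
```

A loop such as for (i in seq_along(in_dirs)) could then read the files in in_dirs[i], call cell_means(), and write the result to out_files[i]. A cell with no valid reading on any day comes out as NaN rather than NA.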
#
Thanks Don for your suggestion. I received the original data in 
Excel xlsx format, so I must work with it unless I want to convert 
thousands of files... I am also saving my R output files in 
Excel format to keep them compatible with the original ones. Everything 
will then be stored in a proper database for further analysis after some 
basic data management.

Nino

On 23/02/2012 00:49, MacQueen, Don wrote: