help building dataset

2 messages · walcotteric, R. Michael Weylandt

#
I'm having trouble building a dataset. I'm working with Census data from
Brazil, and the particular set I'm trying to get into right now is a
microdata sample which has 4 data files that are saved as .txt files full of
numbers. The folder also has a lot of Excel sheets and other text files
describing the data and (I'm assuming) helping to organize everything.
Unfortunately there isn't much help in the description about how to
construct the dataset and avoid messing things up (since it's Census data, I
need to make sure I avoid associating data with the wrong city/state, etc.).

I basically just need to be able to put the data in a readable format, because
there's literally 1 variable in the set that I can't find anywhere else and
need to get at for some analysis. However, when I've tried to get the data
straight into R (copying from Notepad), it overloads R, and R stops responding.

Any suggestions? Or, if there isn't enough information about the set to be
helpful, what else do you need to know?
Or if you'd like to take a look at the data let me know and I can attach it.

Thanks!





--
View this message in context: http://r.789695.n4.nabble.com/help-building-dataset-tp4637491.html
Sent from the R help mailing list archive at Nabble.com.
#
Weren't you told to take a look at read.table() (both the function
help and the manual describing it)?
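
As a rough illustration of reading a file from disk instead of pasting it in, here is a minimal sketch. The temporary file and its contents are made up for the example, and it assumes whitespace-delimited columns with no header row, which may not match the census files. The nrows argument limits how much is read, so the layout can be checked before loading a large file in full.

```r
# Hypothetical stand-in for one of the .txt data files.
path <- tempfile(fileext = ".txt")
writeLines(c("1 10 100", "2 20 200", "3 30 300"), path)

# Read only the first two rows to inspect the layout cheaply.
preview <- read.table(path, header = FALSE, nrows = 2)
str(preview)
```

If the files turn out to be fixed-width rather than delimited, read.fwf() is the counterpart to look at.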

If the rows correspond in each data file, something like

do.call(cbind, lapply(dir(), read.table))

will possibly align the results of read.table()-ing each file in your
directory.
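
Since the folder reportedly also holds Excel sheets and descriptive text files, a bare dir() would hand those to read.table() as well. The sketch below is one possible way to narrow the list. The folder, file names, and contents are hypothetical, and the pattern would need adjusting if the description files are also .txt.

```r
# Hypothetical folder standing in for the census directory.
data_dir <- file.path(tempdir(), "census_example")
dir.create(data_dir, showWarnings = FALSE)
writeLines(c("1 2", "3 4"), file.path(data_dir, "part1.txt"))
writeLines(c("5 6", "7 8"), file.path(data_dir, "part2.txt"))
writeLines("not data", file.path(data_dir, "notes.xls"))

# Keep only .txt files; full.names = TRUE returns paths that work
# regardless of the current working directory.
files <- dir(data_dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file and bind the results side by side.
combined <- do.call(cbind, lapply(files, read.table))
dim(combined)
```

Note that cbind() only lines things up by row position, so this is safe only if every file has the same rows in the same order.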

To parse that further:

dir() returns the names of all files in the current working directory
(as a character vector).

lapply(x, FUN) takes a set of values (x), applies FUN to each of them,
and returns the results as a list. Here it would call read.table() on
each file name.

do.call(cbind, x) will call the cbind() function on the results of
lapply(). It's sort of like doing cbind(x[[1]], x[[2]], x[[3]], ...) but
doesn't require as much typing, and you don't need to know how many
files are going in.
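
A small self-contained sketch of the same idea, using three made-up data frames in place of the results of read.table():

```r
# Three small data frames standing in for the results of read.table().
x <- list(
  data.frame(a = 1:2),
  data.frame(b = 3:4),
  data.frame(c = 5:6)
)

# lapply() applies a function to each element and returns a list.
sizes <- lapply(x, nrow)

# Binding by hand versus letting do.call() pass every element of x
# to cbind() as a separate argument.
by_hand <- cbind(x[[1]], x[[2]], x[[3]])
via_do_call <- do.call(cbind, x)
all.equal(by_hand, via_do_call)
```

The do.call() form keeps working unchanged if the list grows to four or forty elements.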

Michael
On Mon, Jul 23, 2012 at 1:32 PM, walcotteric <walcott3 at msu.edu> wrote: