data manipulation
4 messages · Yoko Nakajima, John Fox, Marc Schwartz
Dear Yoko,
If you're sure that the data are complete, then

data <- matrix(scan("file-name"), ncol=29)

should do the trick. Then, to name the columns of the data matrix, use

colnames(data) <- c("one", "two", etc.)

[Of course, you'd substitute meaningful names.]
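As a small illustration of the naming step, here is a minimal sketch on a toy matrix. The object name m and the column names are placeholders chosen for this example, not names from the original data. It shows that once colnames() has been set, a single column can be pulled out by name.

```r
# Toy 2 x 3 matrix standing in for the real 29-column data (hypothetical values).
m <- matrix(1:6, ncol = 3)

# Attach placeholder names; substitute meaningful ones for real data.
colnames(m) <- c("one", "two", "three")

# Extract one column by name; the result is a plain vector.
second <- m[, "two"]
print(second)
```

The same indexing style, m[, "name"], should work for any of the 29 columns once they are named.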
I hope this helps,
John
--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Yoko Nakajima
Sent: Wednesday, April 13, 2005 7:56 PM
To: r-help at stat.math.ethz.ch
Subject: [R] data manipulation

Hello, my question is about the data handling. I have a data set that is lined as:

4 1 17 1 1
-5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 -0.5081
-0.2227 0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673
-0.1033 -0.0796 -0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611

4 1 17 2 1
-5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 -0.5081
-0.2227 0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673
-0.1033 -0.0796 -0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611

This means that 29 variables are together as a set. You saw two sets of them in example. I have about 1000 sets (of 29 variables) in my data. When I "scan" this data set, the result comes with 7 columns and it is not possible, so far, to read the table by column wise, and thus it is not possible to analyze the data.

I would like to know whether there is a way to solve this problem, say, by arranging columns or increasing the number of columns of data matrix by R. Also, I would like to know how you could name each column of the data so that you could use the individual column separately.

Sincerely.

[[alternative HTML version deleted]]
On Wed, 2005-04-13 at 20:56 -0400, Yoko Nakajima wrote:
Hello, my question is about the data handling. I have a data set that is lined as:

4 1 17 1 1
-5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 -0.5081
-0.2227 0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673
-0.1033 -0.0796 -0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611

4 1 17 2 1
-5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678 -0.5081
-0.2227 0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673
-0.1033 -0.0796 -0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611

This means that 29 variables are together as a set. You saw two sets of them in example. I have about 1000 sets (of 29 variables) in my data. When I "scan" this data set, the result comes with 7 columns and it is not possible, so far, to read the table by column wise, and thus it is not possible to analyze the data.

I would like to know whether there is a way to solve this problem, say, by arranging columns or increasing the number of columns of data matrix by R. Also, I would like to know how you could name each column of the data so that you could use the individual column separately.

You probably changed some default setting in scan(). By default, it treats white space as the field delimiter. Using your data above, which I saved in a file called 'test.dat':

> mat <- matrix(scan("test.dat"), ncol = 29)
Read 58 items
> dim(mat)
[1]  2 29
> colnames(mat) <- paste("Col", 1:29, sep = "")
> mat
     Col1 Col2    Col3    Col4    Col5   Col6    Col7    Col8    Col9
[1,]    4   17  1.0000 -0.1668 -0.5062 0.3640 -0.5081  0.8142 -0.0445
[2,]    1    1 -5.1536 -2.3412  0.9621 0.3678 -0.2227 -0.0389 -0.0578
       Col10   Col11   Col12   Col13   Col14  Col15 Col16 Col17   Col18
[1,] -0.1175  0.8673 -0.0796 -0.1716 -0.7014 0.5611     1     2 -5.1536
[2,] -0.1232 -0.1033 -0.0341 -0.1801  0.6578 4.0000    17     1 -0.1668
       Col19  Col20   Col21   Col22   Col23   Col24   Col25   Col26
[1,] -2.3412 0.9621  0.3678 -0.2227 -0.0389 -0.0578 -0.1232 -0.1033
[2,] -0.5062 0.3640 -0.5081  0.8142 -0.0445 -0.1175  0.8673 -0.0796
       Col27   Col28  Col29
[1,] -0.0341 -0.1801 0.6578
[2,] -0.1716 -0.7014 0.5611

In this case, 'mat' is a matrix with 2 rows and 29 columns.
You can restructure this differently as per your requirements.
HTH,
Marc Schwartz
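
The point about white space can be checked with a short sketch. It assumes a reasonably recent R in which scan() accepts a text= argument, and the two strings are made-up fragments rather than the real file. With the default settings, line breaks count as white space just like blanks, so how the values are wrapped across lines should not change what scan() returns.

```r
# The same six values, once on a single line and once wrapped over three lines
# (hypothetical fragments used only for illustration).
one_line <- "4 1 17 1 1 -5.1536"
wrapped  <- "4 1 17\n1 1\n   -5.1536"

# Default scan(): numeric input, any run of white space separates fields.
a <- scan(text = one_line, quiet = TRUE)
b <- scan(text = wrapped, quiet = TRUE)

# Both calls yield the same numeric vector.
print(identical(a, b))
```

For a real file, scan("test.dat") would be used in place of the text= argument.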
9 days later
Hello,
may I ask a further question?
I have realized that data <- matrix(scan("file-name"), ncol=29) reads the
data differently than I expected: with this code, (4, 1) is the first column,
(17, 1) is the second column, (1, -5.1536) is the third, and so on (please
see the data below). Therefore, the data set would not be in order if I used
this code. It needs to be read as: (4, 4) as the first column, (1, 1) as the
second column, (17, 17) as the third, and so on (i.e., the values from 4 to
0.5611 make the first row, the next values from 4 to 0.5611 make the second
row, and so on). So,
    V1   V2   V3        ...   V29
    4    1    17        ...   0.5611
    4    1    17        ...   0.5611
was needed.
(Now I have:
    V1   V2   V3        ...   V29
    4    17   1         ...   0.6578
    1    1    -5.1536   ...   0.5611)
[The data set I have contains around 1000 such sets of 29 variables. I paste
only two of them here.]
4 1 17 1 1
-5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678
-0.5081 -0.2227
0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673
-0.1033 -0.0796
-0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611
4 1 17 2 1
-5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678
-0.5081 -0.2227
0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673
-0.1033 -0.0796
-0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611
It is true that I need 29 columns, but with "ncol=29" the values were placed
into them in the wrong order. Is there any way I can handle this problem in R?
I would very much appreciate it if you could let me know. My guess is that I
should probably rearrange the data set in Excel or similar. I found the
problem when I looked at the data with "data.entry(data)". I cannot analyze
the data set as it is.
Thank you very much, in advance.
Sincerely,
Yoko.
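
One way to get the layout described above, offered as a sketch rather than as a reply from the thread: matrix() fills its result column by column unless byrow = TRUE is given, so adding that argument should place each run of 29 consecutive values in its own row. The example below uses the two pasted sets through scan(text = ...), and the names vals, dat, df, and V1 to V29 are placeholders chosen here. For the real file, scan("file-name") would replace the text= argument.

```r
# The two pasted sets of 29 values, wrapped across lines as in the message.
txt <- "4 1 17 1 1
-5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678
-0.5081 -0.2227
0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673
-0.1033 -0.0796
-0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611
4 1 17 2 1
-5.1536 -0.1668 -2.3412 -0.5062 0.9621 0.3640 0.3678
-0.5081 -0.2227
0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232 0.8673
-0.1033 -0.0796
-0.0341 -0.1716 -0.1801 -0.7014 0.6578 0.5611"

# Read all values as one long numeric vector; line breaks are ignored.
vals <- scan(text = txt, quiet = TRUE)

# Guard against incomplete data: the count must be a multiple of 29.
stopifnot(length(vals) %% 29 == 0)

# byrow = TRUE fills row by row, so each set of 29 values becomes one row.
dat <- matrix(vals, ncol = 29, byrow = TRUE)
colnames(dat) <- paste0("V", 1:29)

# Optional: a data frame allows access such as df$V4.
df <- as.data.frame(dat)

print(dat[, c("V1", "V2", "V3", "V4", "V29")])
```

With the data in this shape, no rearranging in a spreadsheet should be needed, and individual variables can be used as dat[, "V4"] or df$V4.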