data manipulation

Dear Yoko,

If you're sure that the data are complete, then data <-
matrix(scan("file-name"), ncol=29) should do the trick. Then to name the
columns of the data matrix, colnames(data) <- c("one", "two", etc.). [Of
course, you'd substitute meaningful names.]
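The completeness check and the column naming can be sketched as follows. This is only an illustration under stated assumptions: the vector `x` stands in for the result of scan(), three columns stand in for the 29 in the real data, and the column names are made up. `byrow = TRUE` is added on the assumption that each record is stored as consecutive values in the file.

```r
# Hypothetical stand-in for scan("file-name"): two records of 3 values each
x <- c(4, 1, 17,
       4, 1, 18)
n_vars <- 3   # would be 29 for the data discussed in this thread

# "Complete" here means the number of values is a multiple of n_vars
stopifnot(length(x) %% n_vars == 0)

# Assumption: each record occupies consecutive values, so fill row by row
data <- matrix(x, ncol = n_vars, byrow = TRUE)

# Made-up names; substitute meaningful ones
colnames(data) <- c("group", "site", "id")

# A named column can then be used on its own
data[, "id"]
```

Once the columns are named, `data[, "id"]` selects a single variable, and `as.data.frame(data)` would allow the `data$id` form as well.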

I hope this helps,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
--------------------------------
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch 
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Yoko Nakajima
Sent: Wednesday, April 13, 2005 7:56 PM
To: r-help at stat.math.ethz.ch
Subject: [R] data manipulation

Hello,
my question is about data handling.

I have a data set that is laid out as follows:

4 1 17 1 1
 -5.1536 -0.1668 -2.3412 -0.5062  0.9621  0.3640  0.3678 
-0.5081 -0.2227
  0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232  0.8673 
-0.1033 -0.0796
 -0.0341 -0.1716 -0.1801 -0.7014  0.6578  0.5611
4 1 17 2 1
 -5.1536 -0.1668 -2.3412 -0.5062  0.9621  0.3640  0.3678 
-0.5081 -0.2227
  0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232  0.8673 
-0.1033 -0.0796
 -0.0341 -0.1716 -0.1801 -0.7014  0.6578  0.5611

This means that each set consists of 29 variables; the example 
shows two such sets. I have about 1000 sets (of 29 variables) 
in my data. When I "scan" this data set, the result comes out 
with 7 columns, so I have not been able to read the table 
column-wise and therefore cannot analyze the data. I would like 
to know whether there is a way to solve this problem in R, say, 
by rearranging the columns or increasing the number of columns 
of the data matrix.

Also, I would like to know how to name each column of the data 
so that each column can be used separately.

Sincerely.
You probably changed some default setting in scan(). By default, it treats
white space as the field delimiter.
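A small sketch of that default behaviour, using the `text` argument of scan() on an in-memory string instead of a file (the string `txt` is a shortened, made-up excerpt in the same layout as the posted data):

```r
# Three physical lines with 5, 2 and 1 values
txt <- "4 1 17 1 1\n -5.1536 -0.1668\n-0.5081"

# With the default sep = "", any run of spaces, tabs or newlines
# separates values, so line breaks carry no meaning for scan()
x <- scan(text = txt, quiet = TRUE)

length(x)   # 8 values in one flat numeric vector
x
```

Because the result is a flat vector, the number of values per physical line in the file does not matter; the shape is imposed afterwards, for example with matrix().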

Using your data above, which I saved in a file called 'test.dat':
> mat <- matrix(scan("test.dat"), ncol = 29)
Read 58 items
> dim(mat)
[1]  2 29
> colnames(mat) <- paste("Col", 1:29, sep = "")
> mat
     Col1 Col2    Col3    Col4    Col5   Col6    Col7    Col8    Col9
[1,]    4   17  1.0000 -0.1668 -0.5062 0.3640 -0.5081  0.8142 -0.0445
[2,]    1    1 -5.1536 -2.3412  0.9621 0.3678 -0.2227 -0.0389 -0.0578
       Col10   Col11   Col12   Col13   Col14  Col15 Col16 Col17   Col18
[1,] -0.1175  0.8673 -0.0796 -0.1716 -0.7014 0.5611     1     2 -5.1536
[2,] -0.1232 -0.1033 -0.0341 -0.1801  0.6578 4.0000    17     1 -0.1668
       Col19  Col20   Col21   Col22   Col23   Col24   Col25   Col26
[1,] -2.3412 0.9621  0.3678 -0.2227 -0.0389 -0.0578 -0.1232 -0.1033
[2,] -0.5062 0.3640 -0.5081  0.8142 -0.0445 -0.1175  0.8673 -0.0796
       Col27   Col28  Col29
[1,] -0.0341 -0.1801 0.6578
[2,] -0.1716 -0.7014 0.5611

In this case, 'mat' is a matrix with 2 rows and 29 columns.

You can restructure this differently as per your requirements.
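One possible kind of restructuring concerns the order in which matrix() places the values. The sketch below uses a toy vector `x` (an assumption for illustration, not the thread's data) to show that matrix() fills column by column unless `byrow = TRUE` is given:

```r
x <- 1:6

# Default: values run down column 1, then column 2, and so on
by_col <- matrix(x, ncol = 3)
by_col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# byrow = TRUE: values run along row 1, then row 2
by_row <- matrix(x, ncol = 3, byrow = TRUE)
by_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

# Equivalent alternative: fill a 3-row matrix by column, then transpose
identical(t(matrix(x, nrow = 3)), by_row)
```

Which layout is appropriate depends on how the records are ordered in the file.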

HTH,

Marc Schwartz
Hello,

may I ask a further question?

I have realized that "data <-
matrix(scan("file-name"), ncol=29)" reads the data differently than I
expected: with this code, (4,1) becomes the first column, (17,1) the second,
(1,-5.1536) the third, and so on (please see the data below).
Therefore, my data set would not be in the right order if I used this code.

It needs to be read with (4,4) as the first column, (1,1) as the second,
(17,17) as the third, and so on (i.e., the values from 4 to 0.5611 make up
the first row, the next values from 4 to 0.5611 make up the second row, and
so on). So,

V1 V2 V3 ...     V29
4    1    17   ...  0.5611
4    1    17   ...  0.5611

was needed.

(Now I have:
V1 V2 V3  ....         V29
4    17   1           ...  0.6578
1    1   -5.1536  ...   0.5611)

[My data set has around 1000 such sets of 29 variables each. I have pasted
only two of them here.]
4 1 17 1 1
-5.1536 -0.1668 -2.3412 -0.5062  0.9621  0.3640  0.3678
-0.5081 -0.2227
0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232  0.8673
-0.1033 -0.0796
-0.0341 -0.1716 -0.1801 -0.7014  0.6578  0.5611

4 1 17 2 1
-5.1536 -0.1668 -2.3412 -0.5062  0.9621  0.3640  0.3678
-0.5081 -0.2227
0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232  0.8673
-0.1033 -0.0796
-0.0341 -0.1716 -0.1801 -0.7014  0.6578  0.5611

I do need 29 columns, but the data were read in a different order with
"ncol=29". Is there any way to handle this problem in R?

I would very much appreciate it if you could let me know. My guess is that I
should probably rearrange the data set in Excel or similar. I found this
out by using "data.entry(data)". I cannot analyze the data set as it is.

Thank you very much, in advance.
Sincerely,
Yoko.
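The layout asked for above (one row per set of 29 values) can be sketched in R without rearranging the file by hand. This is a hedged illustration, not a reply from the thread: it assumes every set contains exactly 29 values in file order, it uses the two posted sets as an in-memory string `txt` instead of a file, and the names `dat` and V1 to V29 are arbitrary.

```r
# The two sets posted above; with a real file, use scan("file-name") instead
txt <- "4 1 17 1 1
-5.1536 -0.1668 -2.3412 -0.5062  0.9621  0.3640  0.3678
-0.5081 -0.2227
0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232  0.8673
-0.1033 -0.0796
-0.0341 -0.1716 -0.1801 -0.7014  0.6578  0.5611

4 1 17 2 1
-5.1536 -0.1668 -2.3412 -0.5062  0.9621  0.3640  0.3678
-0.5081 -0.2227
0.8142 -0.0389 -0.0445 -0.0578 -0.1175 -0.1232  0.8673
-0.1033 -0.0796
-0.0341 -0.1716 -0.1801 -0.7014  0.6578  0.5611"

x <- scan(text = txt, quiet = TRUE)

# Guard against incomplete sets before reshaping
stopifnot(length(x) %% 29 == 0)

# byrow = TRUE puts each run of 29 consecutive values into one row
dat <- as.data.frame(matrix(x, ncol = 29, byrow = TRUE))
names(dat) <- paste0("V", 1:29)

# Each variable is now a column that can be used separately
dat[, c("V1", "V2", "V3", "V4", "V29")]
```

With about 1000 sets, the same calls would give a data frame with about 1000 rows and 29 columns, and a single variable can be extracted as, for example, `dat$V3`.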