Loading matrices and other things
2 messages · Mike Schuler, Gabor Grothendieck
On 5/31/05, Mike Schuler <schulerm at bc.edu> wrote:
Hi all,
I'm new to R, so needless to say I have a couple questions (which I hope
I haven't missed through the documentation).
I have several files in lower triangular matrix form. I want to perform
some form of hierarchical clustering on each matrix and capture the
output of the clustering.
The first problem I run into is actually loading the matrix file into R.
I've attempted using the read.table function but to no avail. What is
the best way to read in a matrix?
Note: the matrices are in a form like so, with a space between each value
and a newline after each row. The diagonal of 0's has been stripped out.
(The matrices are the output of RNAdistance, if that's helpful.)
Let's say it's stored in a file called 'rtest':
21
34 55
55 34 21
27 10 61 44
59 42 25 8 40
61 44 27 10 34 6
73 64 57 48 66 44 50
78 69 62 53 71 49 55 5
77 68 103 94 70 94 96 88 89
77 68 103 94 70 90 96 84 85 10
31 24 53 46 30 50 52 72 73 74 74
Second, I've searched through the web and it seems hclust
<http://www.maths.lth.se/help/R/.R/library/mva/html/hclust.html> is the
appropriate function. From what I can tell from here
<http://stat.ethz.ch/R-manual/R-devel/library/stats/html/dist.html> the
above matrix should be a valid format (even without the 0s), but
confirmation would be nice. And with hclust, does this produce a tree
with the output, or would that be the plclust function? I haven't been
able to experiment with this because I haven't managed to solve the
previous problem.
Here is something to try:
# get matrix dimension n (the longest row has n - 1 entries) and read in
n <- max(count.fields("myfile.dat")) + 1
x <- scan("myfile.dat")
# create matrix from x
x.mat <- matrix(0,n,n)
x.mat[upper.tri(x.mat)] <- x
x.mat <- x.mat + t(x.mat)
# convert to distance matrix
x.dist <- as.dist(x.mat)
# run hclust
x.hclust <- hclust(x.dist)
# plot
plot(x.hclust, cex = 0.6)
rect.hclust(x.hclust,k=5,border="red")
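The upper.tri assignment above works because R fills such an assignment column by column, and the column-wise order of the upper triangle is the same as the row-wise order of the lower triangle that scan() returns. Here is a minimal sketch to check this on a made-up 4-point example. The temporary file and its values are illustrative assumptions, not the RNAdistance data, and k = 2 is an arbitrary choice:

```r
# Hypothetical 4-point example: lower triangle without the zero diagonal,
# written to a temporary file so the sketch is self-contained.
f <- tempfile(fileext = ".dat")
writeLines(c("1", "2 3", "4 5 6"), f)

n <- max(count.fields(f)) + 1   # longest row has n - 1 entries
x <- scan(f, quiet = TRUE)      # row-wise values: 1 2 3 4 5 6

m <- matrix(0, n, n)
m[upper.tri(m)] <- x            # filled column by column
m <- m + t(m)                   # mirror to get a symmetric matrix
print(m)                        # m[3, 1] should equal the first value of row 2 in the file

d <- as.dist(m)

# Capture the clustering result instead of only plotting it
hc <- hclust(d)
groups <- cutree(hc, k = 2)     # cluster membership for each item
print(groups)
```

The hclust object itself (hc$merge, hc$height, hc$order) also holds the tree, so it can be saved or inspected without plotting.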
And last, I want to be able to run R on many different files of the same
matrix type. Is it possible to write a (Python) script to run through the
appropriate tasks and save the visual output as a postscript file?
You don't need another language. It can all be done from R. Suppose
we want to read in each .dat file in the current directory, plot it and
save the plot:
for (f in dir(pattern = "[.]dat$")) {
  x <- read.table(f); plot(x); savePlot(f, "ps")
}
savePlot, used above, is specific to Windows. See ?dev.print
if you are not on Windows.
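One portable alternative is to open a postscript() device before plotting and close it with dev.off(). Below is a rough sketch, assuming each .dat file holds a lower triangle in the format shown earlier. The function names read_lower_tri and cluster_to_ps and the .ps naming scheme are made up for illustration:

```r
# Hypothetical helper: read a lower-triangular file (no diagonal) into a dist
read_lower_tri <- function(path) {
  n <- max(count.fields(path)) + 1
  m <- matrix(0, n, n)
  m[upper.tri(m)] <- scan(path, quiet = TRUE)
  as.dist(m + t(m))
}

# Hypothetical driver: cluster every .dat file in a directory and
# write one PostScript dendrogram per input file
cluster_to_ps <- function(indir = ".") {
  files <- list.files(indir, pattern = "[.]dat$", full.names = TRUE)
  for (f in files) {
    hc <- hclust(read_lower_tri(f))
    postscript(sub("[.]dat$", ".ps", f))   # open a file device
    plot(hc, cex = 0.6, main = basename(f))
    dev.off()                              # close the device so the file is written
  }
  invisible(sub("[.]dat$", ".ps", files))  # return the output paths
}
```

Calling cluster_to_ps() with no arguments processes the current directory; the same script can be run non-interactively, for example with R CMD BATCH.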