do glm with two data sets

3 messages · Hu, Ying (NIH/NCI), Sundar Dorai-Raj, Gavin Simpson

Hu, Ying (NIH/NCI)

Thu, Aug 18, 2005 7:38 AM

Thanks for your help.

# read the two data sets
e <- as.matrix(read.table("file1.txt", header=TRUE,row.names=1))
g <- as.matrix(read.table("file2.txt", header=TRUE,row.names=1))

# solution 
d1<-data.frame(g[1,], e[1,])
fit<-glm(e[1,] ~ g[1,], data=d1)
summary(fit)

I am not sure that is the best solution.

Thanks again,

Ying
 

-----Original Message-----
From: Gavin Simpson [mailto:gavin.simpson at ucl.ac.uk] 
Sent: Wednesday, August 17, 2005 7:01 PM
To: Sundar Dorai-Raj
Cc: Hu, Ying (NIH/NCI); r-help at stat.math.ethz.ch
Subject: Re: [R] do glm with two data sets

On Wed, 2005-08-17 at 17:22 -0500, Sundar Dorai-Raj wrote:

Hi Ying,

That error message is likely caused by having a data.frame on the right
hand side (rhs) of the formula. You can't have a data.frame on the rhs
of a formula and g1 is still a data frame even if you only choose the
first row, e.g.:

dat <- as.data.frame(matrix(100, 10, 10))
class(dat[1, ])
[1] "data.frame"

You could try:

glm(e1 ~ ., data=g1[1, ])

and see if that works, but as Sundar notes, your post is a little
difficult to follow, so this may not do what you were trying to achieve.
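
A minimal sketch of the point above, using small made-up data frames in place of the two files (the values come from the tables in the original post, and the names e1 and g1 follow that post). One possible way to get plain numeric vectors out of a one-row data frame is unlist(); this is only an illustration, not necessarily what Ying needs:

```r
# Hypothetical stand-ins for the result of read.table() on the two files
e <- data.frame(id1 = c(0, 0, 1), id2 = c(1, 1, 1), id3 = c(0, 1, -1),
                row.names = c("N1", "N2", "N3"))
g <- data.frame(id1 = c(1.22, 2.33, 1.56), id2 = c(1.34, 2.56, 1.99),
                id3 = c(2.44, 2.56, 1.46),
                row.names = c("G1", "G2", "G3"))

# A single row of a data frame is still a data frame
class(g[1, ])

# unlist() flattens the row into a named numeric vector,
# which is a valid variable type for a model formula
e1 <- unlist(e[1, ])
g1 <- unlist(g[1, ])
is.numeric(g1)

fit <- glm(e1 ~ g1)
coef(fit)
```

Converting with as.matrix() right after read.table() has the same effect, since a row of a matrix is already a plain vector.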

HTH

Gav

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!

http://www.R-project.org/posting-guide.html

%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson                     [T] +44 (0)20 7679 5522
ENSIS Research Fellow             [F] +44 (0)20 7679 7565
ENSIS Ltd. & ECRC                 [E] gavin.simpsonATNOSPAMucl.ac.uk
UCL Department of Geography       [W] http://www.ucl.ac.uk/~ucfagls/cv/
26 Bedford Way                    [W] http://www.ucl.ac.uk/~ucfagls/
London.  WC1H 0AP.
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

Sundar Dorai-Raj

Thu, Aug 18, 2005 7:47 AM

Hu, Ying (NIH/NCI) wrote:

Hi, Ying,

What's wrong with this solution? Do you still get an error? What is your 
primary goal?

A couple of points:

1. It's better to use names in your data.frame:

d1 <- data.frame(g = g[1,], e = e[1,])

Then in glm:

fit <- glm(e ~ g, data = d1)

2. Also, you may just be giving us a toy example, but if you don't 
specify a family argument in glm, the default is family = gaussian, so 
you are simply getting a least-squares fit. In that case you should use 
lm instead (see ?lm).
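
To illustrate point 2, here is a small sketch comparing the two fits. The three values per variable are taken from the first rows of the tables in the original post; the object names fit_glm and fit_lm are just for this example:

```r
# First rows of the two tables from the original post
d1 <- data.frame(g = c(1.22, 1.34, 2.44),
                 e = c(0, 1, 0))

# glm() with no family argument uses the gaussian family
fit_glm <- glm(e ~ g, data = d1)
family(fit_glm)$family

# lm() fits the same least-squares model
fit_lm <- lm(e ~ g, data = d1)

# The estimated coefficients agree up to numerical tolerance
all.equal(coef(fit_glm), coef(fit_lm))
```

The summaries differ in layout (deviance versus R-squared and F statistic), but the estimates describe the same model.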

HTH,

--sundar

-----Original Message-----
From: Gavin Simpson [mailto:gavin.simpson at ucl.ac.uk] 
Sent: Wednesday, August 17, 2005 7:01 PM
To: Sundar Dorai-Raj
Cc: Hu, Ying (NIH/NCI); r-help at stat.math.ethz.ch
Subject: Re: [R] do glm with two data sets

On Wed, 2005-08-17 at 17:22 -0500, Sundar Dorai-Raj wrote:

Hu, Ying (NIH/NCI) wrote:

I have two data sets:
File1.txt: 
Name id1   id2   id3   ...
N1    0     1     0     ...
N2    0     1     1     ...
N3    1     1     -1    ...
...

File2.txt:
Group id1       id2       id3       ...
G1       1.22     1.34     2.44     ...
G2       2.33     2.56     2.56     ...
G3       1.56     1.99     1.46     ...
...
I would like to do:
x1<-c(0,1,0,...)
y1<-c(1.22,1.34, 2.44, ...)
z1<-data.frame(x1,y1)
summary(glm(y1~x1,data=z1))

But when I try the same thing by reading the data sets from the two files:
e <- read.table("file1.txt", header=TRUE,row.names=1)
g <- read.table("file2.txt", header=TRUE,row.names=1)
e1<-exp[1,]
g1<-geno[1,]
d1<-data.frame(g, e)
summary(glm(e1 ~ g1, data=d1))

the error message is 
Error in model.frame(formula, rownames, variables, varnames, extras,
extranames,  : 
       invalid variable type
Execution halted

Thanks in advance,

Ying


You have several inconsistencies in your example, so it will be 
difficult to figure out what you are trying to accomplish.

> e <- read.table("file1.txt", header=TRUE,row.names=1)
> g <- read.table("file2.txt", header=TRUE,row.names=1)
> e1<-exp[1,]

What's "exp"? Also it's dangerous to use an R function as a variable 
name. Most of the time R can tell the difference, but in some cases it 
cannot.
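
A short sketch of what happens with a name like "exp", assuming no data object of that name exists in the workspace. The replacement name expr_mat is hypothetical and only there for illustration:

```r
# exp is a built-in function, so subsetting it like a matrix is an error.
# tryCatch() captures the message instead of halting the script.
msg <- tryCatch(exp[1, ], error = function(err) conditionMessage(err))
print(msg)

# Masking the name with a data object is legal, and R still finds the
# function when the name is used in a call...
exp <- matrix(1:6, nrow = 2)
exp(1)

# ...but a distinct name avoids the ambiguity altogether
rm(exp)
expr_mat <- matrix(1:6, nrow = 2)
expr_mat[1, ]
```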

> g1<-geno[1,]

What's "geno"?

> d1<-data.frame(g, e)

d1 is now e and g cbind'ed together?

> summary(glm(e1 ~ g1, data=d1))

Are "e1" and "g1" elements of "d1"? From what you've told us, I don't 
know where the error is occurring. Also, if you are having errors, you 
can more easily isolate the problem by doing:

fit <- glm(e1 ~ g1, data = d1)
summary(fit)

This will at least tell you the problem is in your call to "glm" and not 
"summary.glm".

--sundar

P.S. Please (re-)read the POSTING GUIDE. Most of the time you will 
figure out problems such as these on your own during the process of 
creating a reproducible example.
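
For what it's worth, a reproducible version of this question could look something like the sketch below. It uses the text argument of read.table() so that no external files are needed; the data are the three rows and three id columns shown in the original post (the elided "..." columns and rows are simply left out), and the object names file1 and file2 are only for this example:

```r
# Inline copies of the two tables, so the example runs without any files
file1 <- "Name id1 id2 id3
N1 0 1 0
N2 0 1 1
N3 1 1 -1"

file2 <- "Group id1 id2 id3
G1 1.22 1.34 2.44
G2 2.33 2.56 2.56
G3 1.56 1.99 1.46"

# Same read.table() arguments as in the thread; as.matrix() makes each
# row a plain vector rather than a one-row data frame
e <- as.matrix(read.table(text = file1, header = TRUE, row.names = 1))
g <- as.matrix(read.table(text = file2, header = TRUE, row.names = 1))

# Fit separately from summary() so any error points at the right call
fit <- glm(e[1, ] ~ g[1, ])
summary(fit)
```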

Gavin Simpson

Thu, Aug 18, 2005 8:00 AM

On Thu, 2005-08-18 at 10:38 -0400, Hu, Ying (NIH/NCI) wrote:

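This is redundant, as:

fit <- glm(e[1, ] ~ g[1, ], data = d1)

and:

fit <- glm(e[1, ] ~ g[1, ])

are equivalent - the formula picks up e and g from the workspace, not
from d1, so you don't need data = d1 in this case, e.g.: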

e <- matrix(c(0, 1, 0, 0, 1, 1, 1, 1, -1), ncol = 3, byrow = TRUE)
e
g <- matrix(c(1.22, 1.34, 2.44, 2.33, 2.56, 2.56, 1.56, 1.99, 1.46),
            ncol = 3, byrow = TRUE)
g
fit <- glm(e[1, ] ~ g[1, ])
fit

works fine.

This seems a strange way of doing this. Why not:

pred <- g[1, ]
resp <- e[1, ]
fit <- glm(resp ~ pred)
fit

and do your subsetting outside the glm call - makes things clearer, no?
Unless you plan to fit many glm()s, one per row of your two matrices. If
that is the case, then there are better ways of approaching this.
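
The message does not say which approach is meant, so the following is only one possible sketch of the "one glm() per row" case. It loops over the row indices with lapply() and builds a small named data frame for each fit; the names fits and coefs are made up for this example:

```r
e <- matrix(c(0, 1, 0, 0, 1, 1, 1, 1, -1), ncol = 3, byrow = TRUE)
g <- matrix(c(1.22, 1.34, 2.44, 2.33, 2.56, 2.56, 1.56, 1.99, 1.46),
            ncol = 3, byrow = TRUE)

# One model per row: row i of e is the response, row i of g the predictor
fits <- lapply(seq_len(nrow(e)), function(i) {
  d <- data.frame(resp = e[i, ], pred = g[i, ])
  glm(resp ~ pred, data = d)
})

# Collect the coefficients into one matrix, one row per fitted model
coefs <- t(sapply(fits, coef))
coefs
```

Keeping the fitted objects in a list means summary() or predict() can be applied to any of them later, e.g. summary(fits[[2]]).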

HTH

G
