help with statistics in R - how to measure the effect of users in groups
3 messages · gj, Bert Gunter, PIKAL Petr
Hi
OK. So my original advice and warnings are correct. However, now there is an additional wrinkle because your response is a count, which is not a continuous measurement. For this, you'll need glm(..., family = "poisson") instead of lm(...), where the ... is the stuff I gave you before. A backup approach, if there aren't too many small counts (below about 10, say), is to take the square root of the counts and analyze that via lm(). In either approach, your interpretation becomes more difficult -- e.g. have you any experience with GLMs (generalized linear models)?

Moreover, if there are large numbers of users -- e.g. more than dozens (and you may have hundreds or thousands) -- of course the interaction will be significant, but so what? For this you'll need to re-frame the question.

So given all this and what appears to be your relative ignorance of statistics, I strongly recommend that you get local statistical help. Or just forget about formal statistical analysis altogether and do some sensible plotting.
what was actually my advice too
library(ggplot2)
p <- ggplot(test.m, aes(x=variable, y=value, colour=users))
p + geom_point()
Regards Petr
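
The two approaches Bert describes above (a Poisson GLM, or a square-root transform followed by lm()) might be sketched as follows. This is only an illustration under stated assumptions: the data frame test.m is rebuilt here from the three-user toy table posted in this thread, and the model uses the group term alone, since the "..." in Bert's glm(...) call refers to a formula given in an earlier message. The toy counts are far too few (and too small for the square-root approach) to support any conclusion; the point is the syntax only.

```r
# Toy data from the thread in long format (NA = user not present in that group).
# Column names follow the melted data frame used elsewhere in the thread.
test.m <- data.frame(
  users    = factor(rep(c("u1", "u2", "u3"), times = 3)),
  variable = factor(rep(c("Group1", "Group2", "Group3"), each = 3)),
  value    = c(10, 6, 5, 5, NA, 2, NA, 4, 3)
)

# Poisson GLM for a count response; rows with NA are dropped by default.
fit_pois <- glm(value ~ variable, family = "poisson", data = test.m)
summary(fit_pois)

# Backup approach: square-root transform of the counts, then ordinary lm().
fit_sqrt <- lm(sqrt(value) ~ variable, data = test.m)
summary(fit_sqrt)
```

Coefficients of the Poisson fit are on the log scale, so exp(coef(fit_pois)) gives multiplicative effects relative to the first group; this is part of what makes the interpretation harder than with lm().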
Finally, that's it for me on this. I will offer you no more advice.

-- Bert

On Mon, Oct 10, 2011 at 9:40 AM, gj <gawesh at gmail.com> wrote:

Hi Bert,

The real situation is like what you suggested: user x group interactions. The users can be in more than one group. In fact, the data that I am trying to analyse consist of users, with online forums as groups, and the attribute being measured is the number of posts made by each user in a particular forum.

My hypothesis is that the number of posts a user makes to a forum depends on the forum. For example, if the user is in a forum that is active, he contributes more than when he is in a forum that is less active. I guess there will be some users who contribute the same irrespective of the forum.

I hope this makes sense.

Regards
Gawesh

On Mon, Oct 10, 2011 at 4:50 PM, Bert Gunter <gunter.berton at gene.com> wrote:
Yes, of course. But then one gets into additional problems with carryover effects, etc. Also, one then has a repeated measures problem (User is the experimental unit) and my previous advice is nonsense. Like you, I have no idea what his real situation is.

-- Bert

On Mon, Oct 10, 2011 at 8:39 AM, Anupam <anupamtg at gmail.com> wrote:
It is possible to give multiple treatments, one at a time, to the same pool of patients. You are correct that interactions may be important in this problem. I am only trying to help him frame the problem using an analogy.
Anupam.

From: Bert Gunter [mailto:gunter.berton at gene.com]
Sent: Monday, October 10, 2011 8:21 PM
To: Anupam
Cc: gj
Subject: Re: [R] help with statistics in R - how to measure the effect of users in groups

If that is the case, and each user can appear in only one group, there is no group x user interaction, the poster's question was nonsense, and one analyzes the group effect only, as originally shown.

-- Bert

On Mon, Oct 10, 2011 at 7:43 AM, Anupam <anupamtg at gmail.com> wrote:
Groups are different treatments given to Users for your Outcome (measurement) of interest. Take this idea forward and you will have an answer.

Anupam.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Bert Gunter
Sent: Monday, October 10, 2011 7:36 PM
To: gj
Cc: r-help at r-project.org
Subject: Re: [R] help with statistics in R - how to measure the effect of users in groups

Assuming your data are in a data frame, yourdat, as:

User Group Value
u1   1     10
u2   2     5
u3   3     NA
...(etc)

where Group is **explicitly coerced to be a factor,** then you want the User x Group interaction, obtained from

lm(Value ~ Group*User, data = yourdat)

However, you'll get some kind of warning message if

a) Not all Group x User combinations are present in the data

b) Moreover, no statistics can be calculated if there are no replicates of User x Group combinations.

If you do not know why either of these is the case, get local help or study any linear models (regression) text or online tutorial, as these last issues have nothing to do with R.

-- Bert

On Mon, Oct 10, 2011 at 3:48 AM, gj <gawesh at gmail.com> wrote:
Thanks Petr. I will try it on the real data. But that will only show whether the groups are different or not. Is there any way I can test if the users are different when they are in different groups?

Regards
Gawesh

On Mon, Oct 10, 2011 at 11:17 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:
Hi Petr,

It's not an equation. It's my mistake; the * are meant to be field separators for the example data. I should have just used blank spaces, as follows:

users Group1 Group2 Group3
u1    10     5      N/A
u2    6      N/A    4
u3    5      2      3

Regards
Gawesh
OK. You should transform your data to long format to use lm:

library(reshape2)  # melt() is not in base R; the reshape2 (or older reshape) package is assumed
test <- read.table("clipboard", header=TRUE, na.strings="N/A")
test.m <- melt(test)
Using users as id variables
fit <- lm(value~variable, data=test.m)
summary(fit)

Call:
lm(formula = value ~ variable, data = test.m)

Residuals:
   1    2    3    4    6    8    9
 3.0 -1.0 -2.0  1.5 -1.5  0.5 -0.5

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)       7.000      1.258   5.563  0.00511 **
variableGroup2   -3.500      1.990  -1.759  0.15336
variableGroup3   -3.500      1.990  -1.759  0.15336
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.179 on 4 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared: 0.525,  Adjusted R-squared: 0.2875
F-statistic: 2.211 on 2 and 4 DF,  p-value: 0.2256

No difference among groups, but I am not sure if this is the correct way to evaluate.

library(ggplot2)
p <- ggplot(test.m, aes(x=variable, y=value, colour=users))
p + geom_point()

There is some sign that user3 has the lowest value in each group. However, there is not enough data to include users in the fit.

Regards
Petr
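
Petr's remark that there is not enough data to include users, and Bert's warnings about missing Group x User combinations and missing replicates, can be checked directly on the toy table. The sketch below is illustrative only: it rebuilds test.m by hand instead of reading the clipboard, and the names fit_int, n_obs, resid_df and n_dropped are introduced here for the example.

```r
# Toy data from the thread in long format (NA = user not present in that group).
test.m <- data.frame(
  users    = factor(rep(c("u1", "u2", "u3"), times = 3)),
  variable = factor(rep(c("Group1", "Group2", "Group3"), each = 3)),
  value    = c(10, 6, 5, 5, NA, 2, NA, 4, 3)
)

# Group x user interaction model: 9 coefficients requested, 7 usable observations.
fit_int <- lm(value ~ variable * users, data = test.m)

n_obs     <- nobs(fit_int)              # observations actually used in the fit
resid_df  <- df.residual(fit_int)       # degrees of freedom left to estimate error
n_dropped <- sum(is.na(coef(fit_int)))  # coefficients that could not be estimated

c(n_obs = n_obs, resid_df = resid_df, n_dropped = n_dropped)
```

With one observation per user and group at most, the model reproduces every data point exactly, so nothing is left over to estimate error and no test of the interaction is possible. Replicated measurements per user and group would be needed for that.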
On Mon, Oct 10, 2011 at 9:32 AM, Petr PIKAL <petr.pikal at precheza.cz> wrote:

Hi

I do not understand much about your equations. I think you should look at Practical Regression and Anova Using R by J. Faraway.

Having a data frame DF with columns users, groups, results, you could do

fit <- lm(results~groups, data = DF)

Regards
Petr
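
For readers who want to try this suggestion, one way to build such a data frame from the example table in this thread is sketched below. It is an illustration under assumptions: the table is read from an inline string instead of a file, base R's stack() is used for the reshaping, and the intermediate names txt, wide and long are made up for the example.

```r
# Wide table as posted in the thread; "N/A" marks a user absent from a group.
txt <- "users Group1 Group2 Group3
u1 10 5 N/A
u2 6 N/A 4
u3 5 2 3"
wide <- read.table(text = txt, header = TRUE, na.strings = "N/A")

# Stack the three group columns into long format: one row per user/group pair.
long <- stack(wide[, c("Group1", "Group2", "Group3")])
DF <- data.frame(
  users   = factor(rep(wide$users, times = 3)),
  groups  = long$ind,
  results = long$values
)

# One-way model for the group effect; rows with NA are dropped by lm().
fit <- lm(results ~ groups, data = DF)
coef(fit)
```

The intercept is the mean of Group1 and the other two coefficients are differences from it, which is how the summary output quoted elsewhere in this thread should be read.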

Hi,

I'm a newbie to R. My knowledge of statistics is mostly self-taught. My problem is how to measure the effect of users in groups. I can calculate a particular attribute for a user in a group. But my hypothesis is that the users' attributes are not independent of each other and that a user's attribute depends on the group, i.e. that a user's behaviour changes based on the group.

Let me give an example:

users*Group 1*Group 2*Group 3
u1*10*5*n/a
u2*6*n/a*4
u3*5*2*3

For example, I want to be able to prove that u1's behaviour is different in group 1 than in other groups, and the particular thing about Group 1 is that users in Group 1 tend to have a higher value of the attribute under measurement.

Hence, can I use R to test my hypothesis? I'm willing to learn; so if this is very simple, just point me in the direction of any online resources about it. At the moment, I don't even know how to define this class of problems. That will be a start.

Regards
Gawesh

[[alternative HTML version deleted]]
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
"Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions."
-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm