
boxplots of 1 datum AND comparing rank and boolean

4 messages · Dan Kelley, Martin Maechler, Rashid Nassar

#
Q: When R does 'plot()' in a context that yields boxplots, is there a
way to force it to draw something even if there are only one or two data
points in the category?  I'd like it to draw the data, perhaps using the
outlier symbols.  My code (*** marks the line in question) is the
following, for R-1.0.0:

    d <- read.table("nserc-results-pgsb", header=FALSE,
                    col.names=c("name","dept","rank","accept"))
    # These data look like:
    #   First.Student   Some.Department     1  1
    #   Second.Student  Another.Department  2  1
    #   Third.Student   Another.Department  3  0
    attach(d)
    rank.inv <- 1/rank
    ll <- lm(accept ~ rank.inv + dept, data=d)
    print(summary(ll))
    print(anova(ll))
    plot(dept,resid(ll))    # makes boxplots ***

Actually, if anybody has a bright idea how I should analyse such data,
I'd love to hear it.  As you can see in the above, I transformed to
1/rank since our committee recorded high 'rank' values for students we
favoured.  It's not clear to me how to compare rankings to boolean
(accept/deny) results, so the 'lm()' above might be silly.

Thanks in advance for any advice.  This group is so generous, it
amazes me.

PS: just because I think it's fun to read what sort of work folks are
doing, the above is work I'm doing in trying to analyze the patterns
in the granting of scholarships by NSERC, the science granting agency
in Canada.  I chair a committee at my university that ranks
postgraduate students and sends the files to NSERC.  While NSERC
nearly obeys our rankings, it seems to me that it favours some
departments.  I'd like to test that (hence "accept ~ rank.inv + dept"
in the above).
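
One commonly used alternative to 'lm()' for a 0/1 response is logistic
regression with 'glm(..., family = binomial)'.  The following is only a
sketch under stated assumptions: the data are simulated stand-ins (the real
file is not available), the column names follow the post, and the model with
and without 'dept' is compared by a likelihood-ratio test.  Whether this suits
the real NSERC data is not something the thread settles.

```r
# Simulated stand-in data; the sizes, labels and coefficients are assumptions.
set.seed(1)
n <- 200
d <- data.frame(rank = sample(1:n),
                dept = gl(4, n / 4, labels = paste0("Dept", 1:4)))

# Assumed truth for the simulation: acceptance depends on rank only
# (a higher rank value means a more favoured student, as in the post).
d$accept <- rbinom(n, size = 1, prob = plogis(0.03 * (d$rank - n / 2)))

# Logistic regression without and with a department effect.
fit0 <- glm(accept ~ rank,        family = binomial, data = d)
fit1 <- glm(accept ~ rank + dept, family = binomial, data = d)
print(summary(fit1))

# Likelihood-ratio test: does adding 'dept' improve the fit?
lrt <- anova(fit0, fit1, test = "Chisq")
print(lrt)
```

A small p-value in the second row of the table would suggest that
departments differ in acceptance after allowing for rank.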

Dan E. Kelley                   internet:   mailto:Dan.Kelley at Dal.CA
Oceanography Department         phone:                 (902)494-1694
Dalhousie University            fax:                   (902)494-2885
Halifax, NS, CANADA, B3H 4J1    http://www.phys.ocean.dal.ca/~kelley

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
    Dan> Q: When R does 'plot()' in a context that yields boxplots, is there a
    Dan> way to force it to draw something even if there are only one or two data
    Dan> points in the category?  I'd like it to draw the data, perhaps using the
    Dan> outlier symbols.  My code (*** marks the line in question) is the
    Dan> following, for R-1.0.0:

    Dan> d <- read.table("nserc-results-pgsb", header=FALSE, 
    Dan>                 col.names=c("name","dept","rank","accept"))
    Dan> # These data look like:
    Dan> #   First.Student   Some.Department     1  1
    Dan> #   Second.Student  Another.Department  2  1
    Dan> #   Third.Student   Another.Department  3  0
but contain more than just three observations, right ?

    Dan> attach(d)
    Dan> rank.inv <- 1/rank
    Dan> ll <- lm(accept ~ rank.inv + dept, data=d)
    Dan> print(summary(ll))
    Dan> print(anova(ll))
    Dan> plot(dept,resid(ll))	# makes boxplots ***

    Dan> Actually, if anybody has a bright idea how I should analyse such data,
    Dan> I'd love to hear it.  As you can see in the above, I transformed to
    Dan> 1/rank since our committee recorded high 'rank' values for students we
    Dan> favoured.  It's not clear to me how to compare rankings to boolean
    Dan> (accept/deny) results, so the 'lm()' above might be silly.

I may have misunderstood you completely.
The problem is that I cannot repeat your example, since you didn't use
"public" data.
(Usually, you'd construct data, something like
	 d <- data.frame(accept = rbinom(100, size=1, prob = .4),
	                 rank = sample(1:100),
	                 dept = gl(5, 20))
)
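
With constructed data of that kind, the steps from the original post can be
rerun by anyone.  A minimal sketch, assuming an arbitrary seed and keeping
'rank.inv' as a column of the data frame instead of using 'attach()':

```r
# Reproducible stand-in for the confidential data (the seed is arbitrary).
set.seed(42)
d <- data.frame(accept = rbinom(100, size = 1, prob = 0.4),
                rank   = sample(1:100),
                dept   = gl(5, 20))

# Same transformation and model as in the original post.
d$rank.inv <- 1 / d$rank
ll <- lm(accept ~ rank.inv + dept, data = d)
print(summary(ll))
print(anova(ll))

# A factor on the x axis makes plot() draw one boxplot per department.
plot(d$dept, resid(ll), xlab = "dept", ylab = "residual")
```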
Are you discussing the boxplots that are produced with only 1 or 2
observations per group?

Here are boxplots for n=1, 2, 3, and 4 obs. per group.
What's wrong with these ?

   do.call("boxplot", lapply(1:4,seq))
   title("Boxplot()s of very few points")

*Or* are you suggesting that for n=1, n=2 (and maybe n=3) per group
        plot(factor, continuous)
shouldn't use boxplot()s but rather dot plots ?
This is a suggestion that I've heard and had myself before,
very well worth discussing.

- How should the boxplot / dotplot decision be made: just depending on n?
  Wouldn't one want the box + the single observations, e.g. when in
  one group n = 3, but in all other groups n ~= 20 (which would make
  boxplots there in any case)?
- (When) should jittering be used ?
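
One way to get "the box + the single observations" is to draw the boxplots
as usual and then overlay the raw points for the small groups only.  A
minimal sketch, assuming simulated data, a hypothetical cutoff of n <= 3,
and a small horizontal jitter; none of these choices comes from the thread:

```r
# Simulated groups of very unequal size (assumed for illustration).
set.seed(1)
sizes <- c(A = 1, B = 2, C = 3, D = 20)
d <- data.frame(dept = factor(rep(names(sizes), times = sizes)),
                res  = rnorm(sum(sizes)))

# Groups at or below the (hypothetical) cutoff get their points drawn.
n_per_group <- table(d$dept)
small       <- names(n_per_group)[n_per_group <= 3]
is_small    <- d$dept %in% small

# boxplot() places the box for factor level k at x = k, so the integer
# codes of the factor give the x positions for the overlaid points.
boxplot(res ~ dept, data = d, ylab = "residual")
points(jitter(as.integer(d$dept)[is_small], amount = 0.1),
       d$res[is_small], pch = 1)
```

Depending only on n is the simplest rule; the cutoff and the jitter amount
would need to be arguments if this were made an option of the plot.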

Regards,
Martin Maechler <maechler at stat.math.ethz.ch>	http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO D10	Leonhardstr. 27
ETH (Federal Inst. Technology)	8092 Zurich	SWITZERLAND
phone: x-41-1-632-3408		fax: ...-1228			<><
#
On Wed, 15 Mar 2000, Martin Maechler wrote:
First, thanks very much for your informative suggestion about creating
example data.  I'm not good enough at R yet to have thought of that,
and your email taught me some new things, which I really appreciate.
(I didn't enclose the actual data because they relate to confidential
scholarship evaluations.)

Yes, dot plots for groups with very few observations are what was on my
mind.  I'm not sure how the decision should be made to do boxplots
versus dotplots, but maybe that could be an optional flag?
 
Again, thanks for the comments.  Dan.

Dan E. Kelley                   internet:   mailto:Dan.Kelley at Dal.CA
Oceanography Department         phone:                 (902)494-1694
Dalhousie University            fax:                   (902)494-2885
Halifax, NS, CANADA, B3H 4J1    http://www.phys.ocean.dal.ca/~kelley


#
When I needed a plot like what I think you want, I have used one of:

   plot(as.numeric(X), y)    # where X is a factor
   plot.default(X, y)
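
A short sketch of the first form, with made-up values for X and y (both are
assumptions for illustration).  Converting the factor to its integer codes
gives a plain scatterplot, so the level names have to be put back on the
axis by hand:

```r
# Made-up example data: a factor with 1, 2 and 3 observations per level.
X <- factor(c("A", "B", "B", "C", "C", "C"))
y <- c(0.3, -0.1, 0.4, 0.2, -0.5, 0.1)

# as.numeric() on a factor returns the integer level codes 1, 2, 3, ...
x_pos <- as.numeric(X)

# Draw the points without the default numeric axis, then label the
# tick marks with the factor levels.
plot(x_pos, y, xaxt = "n", xlab = "group",
     xlim = c(0.5, nlevels(X) + 0.5))
axis(1, at = seq_len(nlevels(X)), labels = levels(X))
```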



best regards,

Rashid

(Rashid Nassar)
On Wed, 15 Mar 2000, Dan E. Kelley wrote:

            