
comparing 3 datasets

3 messages · Paul Johnson, Deepayan Sarkar, Brian Ripley

#
I have 3 datasets with the same variables. I want to find out what
differences there are between the three, to know whether an experimental
condition has an effect. As a first step I decided to make histograms, so I
created this handy "histomatic" function that draws the 3 histograms of a
variable stacked on a single image:

I thought I was being clever, but in the end, no!

# read in the 3 tables
NoFlagMod0   <- read.table("NoFlagMod0.txt", header = TRUE)
RandMastMod0 <- read.table("RandMastMod0.txt", header = TRUE)
NoMastMod0   <- read.table("NoMastMod0.txt", header = TRUE)

# here's my magical function: 3 histograms of one variable, stacked
histomatic <- function(s1, s2, s3, var) {
  if (is.numeric(s2[[var]])) {
    par(mfrow = c(3, 1))
    hist(s1[[var]], breaks = 40, xlab = var)
    hist(s2[[var]], breaks = 40, xlab = var)
    hist(s3[[var]], breaks = 40, xlab = var)
  }
}

# cycle through all the variables; all 3 sets share the same names
nameList <- names(RandMastMod0)
par(ask = TRUE)  # wait for a keypress before drawing each new page
for (var in nameList) histomatic(NoFlagMod0, RandMastMod0, NoMastMod0, var)

I knew I wanted a pretty fine-grained display, so I set breaks to 40.
Other than that, I'm not sure what else I want.

Here's the problem:
The histograms do not have the same ranges. Since the datasets are
slightly different, the displayed ranges differ too, which makes the
plots difficult to compare visually. Is there a solution?

Beyond that, if you have other ideas about comparing 3 datasets, I'd be
glad to hear them. I'm especially curious whether there is a significance
test of the hypothesis that 3 samples are drawn from a common
distribution (apart from testing the means with an F test, that is).
#
--- pauljohn at ukans.edu wrote:
[...]
An xlim argument should set the same limits for all the histograms.
Change your function to:


histomatic <- function(s1, s2, s3, var) {
  if (is.numeric(s2[[var]])) {
    par(mfrow = c(3, 1))
    # common x-axis limits spanning all three samples
    xlim <- range(s1[[var]], s2[[var]], s3[[var]])
    hist(s1[[var]], breaks = 40, xlab = var, xlim = xlim)
    hist(s2[[var]], breaks = 40, xlab = var, xlim = xlim)
    hist(s3[[var]], breaks = 40, xlab = var, xlim = xlim)
  }
}


This should work.

(You might also consider the histogram() function in the lattice
package, which is in the Devel section.)
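A minimal sketch of the lattice route, assuming each data set has a numeric column (called `x` here; the data, the column name and the set labels are hypothetical stand-ins): stack the samples into one data frame with a grouping factor and draw one panel per set. lattice gives the panels a common x-axis by default, and passing a single explicit `breaks` vector computed over the pooled range makes the bins identical in every panel as well.

```r
library(lattice)

# Hypothetical stand-ins for the three data sets, each with a numeric column x
set.seed(1)
s1 <- data.frame(x = rnorm(200, mean = 0))
s2 <- data.frame(x = rnorm(200, mean = 0.5))
s3 <- data.frame(x = rnorm(200, mean = 1))

# Stack the samples into one long data frame with a grouping factor
combined <- rbind(data.frame(x = s1$x, set = "s1"),
                  data.frame(x = s2$x, set = "s2"),
                  data.frame(x = s3$x, set = "s3"))
combined$set <- factor(combined$set)

# One breaks vector over the pooled range, so every panel uses the same bins
br <- pretty(range(combined$x), 40)

# One panel per set, stacked in a single column (layout = c(columns, rows))
p <- histogram(~ x | set, data = combined, breaks = br, layout = c(1, 3))
print(p)
```

The "long" format (one value column plus a factor saying which set each value came from) is what lets the conditioning formula `~ x | set` produce the three panels in one call.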


-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
#
On Tue, 19 Jun 2001, Deepayan Sarkar wrote:
[...]
Unfortunately not so.  The critical lines in hist.default are

        rx <- range(x)
        breaks <- pretty(rx, n = nnb, min.n = 1)

so the breaks depend on the range of the data and not on xlim.
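This is easy to check with two made-up samples that have different ranges. `hist(..., plot = FALSE)` returns the computed "histogram" object without drawing it, so the breaks can be compared directly; the last call draws `b` with a wider `xlim`, which changes only the plotted axis.

```r
set.seed(42)
a <- runif(100, min = 0, max = 10)   # hypothetical sample spanning roughly 0..10
b <- runif(100, min = 2, max = 7)    # hypothetical sample spanning roughly 2..7

# plot = FALSE returns the "histogram" object without drawing it
ha <- hist(a, breaks = 40, plot = FALSE)
hb <- hist(b, breaks = 40, plot = FALSE)

# Same data as hb, now drawn with a wider xlim; hist() still returns its breaks
hb_xlim <- hist(b, breaks = 40, xlim = c(0, 10))

identical(ha$breaks, hb$breaks)       # FALSE: each sample gets its own grid
identical(hb$breaks, hb_xlim$breaks)  # TRUE: xlim leaves the breaks unchanged
length(ha$counts)                     # number of cells; 40 is only a suggestion
```

Because the breaks come from pretty() applied to each sample's own range, `breaks = 40` is a target for the number of cells rather than an exact count.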

Jon Baron had the better idea: use a grid for breaks.  So something like

rz <- range(s1[[var]], s2[[var]], s3[[var]])
breaks <- pretty(rz, 40)

...
hist(s1[[var]], breaks, xlab=var)
[...]
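For concreteness, here is a sketch of the whole function built around one shared grid, assuming the same three data frames as in the original post. The `nbreaks` argument, the panel titles and `na.rm = TRUE` are additions for illustration only. Since all three hist() calls receive the same breaks vector, the bins and the x-axes line up.

```r
# Sketch: histomatic() with one set of breaks shared by all three samples
histomatic <- function(s1, s2, s3, var, nbreaks = 40) {
  if (!is.numeric(s2[[var]])) return(invisible(NULL))
  # pooled range over all three samples, then a pretty() grid across it
  rz <- range(s1[[var]], s2[[var]], s3[[var]], na.rm = TRUE)
  breaks <- pretty(rz, nbreaks)
  op <- par(mfrow = c(3, 1))
  on.exit(par(op))                  # restore the previous layout afterwards
  samples <- list(s1 = s1, s2 = s2, s3 = s3)
  h <- lapply(names(samples), function(nm)
    hist(samples[[nm]][[var]], breaks = breaks, xlab = var,
         main = paste(var, "in", nm)))
  invisible(h)                      # the three "histogram" objects
}

# Usage with the data frames from the original post:
# par(ask = TRUE)
# for (var in names(RandMastMod0))
#   histomatic(NoFlagMod0, RandMastMod0, NoMastMod0, var)
```

Returning the histogram objects invisibly keeps the plotting loop quiet while still allowing the counts to be inspected afterwards.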

Another idea: use truehist() in package MASS, which wants the grid spacing,
not the number of breaks.
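A minimal sketch of the truehist() approach with three made-up samples: giving every call the same bin width `h` and origin `x0` places all three histograms on one grid, and a common `xlim` (widened to whole grid cells) lines up the axes. The values `h = 0.25` and `x0 = 0` are arbitrary choices for illustration. truehist() plots densities by default (`prob = TRUE`), so samples of different sizes stay comparable.

```r
library(MASS)

# Hypothetical stand-ins for one variable from each of the three data sets
set.seed(3)
samples <- list(s1 = rnorm(150, mean = 0),
                s2 = rnorm(150, mean = 0.5),
                s3 = rnorm(150, mean = 1))

h  <- 0.25   # bin width (grid spacing), chosen for illustration
x0 <- 0      # grid origin: bins fall on x0 + k * h

# Widen the pooled range outward to whole grid cells so no bar is clipped
rz  <- range(unlist(samples))
lim <- c(floor((rz[1] - x0) / h), ceiling((rz[2] - x0) / h)) * h + x0

op <- par(mfrow = c(3, 1))
for (nm in names(samples)) {
  truehist(samples[[nm]], h = h, x0 = x0, xlim = lim, xlab = nm)
}
par(op)
```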