how to show percentage of individuals for two groups on histogram?

7 messages · Ana Marija, Jim Lemon, Eric Berger

#
Hello,

I have a data frame like this:
FID   IID FLASER PLASER DIABDUR HBA1C ESRD   pheno
1 fam1000-03 G1000      1      1      38  10.2    1 control
2 fam1001-03 G1001      1      1      15   7.3    1 control
3 fam1003-03 G1003      1      2      17   7.0    1    case
4 fam1005-03 G1005      1      1      36   7.7    1 control
5 fam1009-03 G1009      1      1      23   7.6    1 control
6 fam1052-03 G1052      1      1      32   7.3    1 control
[1] 1698    8

I am plotting a histogram with:
ggplot(a, aes(x=HBA1C, fill=pheno)) +
  geom_histogram(binwidth=.5, position="dodge")

There are 848 individuals with "case" in the pheno column and 892 with
"control" in the pheno column.

Instead of the count, I would like the y-axis to show the percentage of
individuals within each group ("case" or "control").

Please advise,
Ana
#
The result would basically look something like the attached plot, or
an overlay of those two plots.
On Thu, May 21, 2020 at 5:23 PM Ana Marija <sokovic.anamarija at gmail.com> wrote:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2020-05-21 at 5.49.37 PM.png
Type: image/png
Size: 52888 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20200521/740baf08/attachment.png>
#
Hi Ana,
My apologies for the pedestrian graphics, but it may help.

# a bit of fake data
aafd<-data.frame(FID=paste0("fam",1000:2739),
 IID=paste0("G",1000:2739),FLASER=rep(1,1740),
 PLASER=c(rep(1,892),rep(2,848)),
 DIABDUR=sample(10:50,1740,TRUE),
 HBAIC=rnorm(1740,mean=7.45,sd=2),ESRD=rep(1,1740),
 pheno=c(rep("control",892),rep("case",848)))
par(mfrow=c(2,1))
casepct<-table(cut(aafd$HBAIC[aafd$pheno=="case"],breaks=0:15))
controlpct<-table(cut(aafd$HBAIC[aafd$pheno=="control"],breaks=0:15))
par(mar=c(0,4,1,2))
barpos=barplot(100*casehist,names.arg=names(casepct),col="orange",
 space=0,ylab="Percentage",xaxt="n",ylim=c(0,25))
text(mean(barpos),23,
 "Cases: n=848, nulls=26, median=7.3, mean=7.45, sd=1.96")
box()
par(mar=c(3,4,0,2))
barplot(100*controlhist,names.arg=names(controlpct),
 space=0,ylab="Percentage",col="orange",ylim=c(0,25))
text(mean(barpos),23,
 "Controls: n=892, nulls=7, median=7.3, mean=7.45, sd=1.12")
box()

Jim
On Fri, May 22, 2020 at 9:08 AM Ana Marija <sokovic.anamarija at gmail.com> wrote:
#
Hi Ana,
Just noticed a typo from a hasty cut-paste. Two lines should read:

casehist<-table(cut(aafd$HBAIC[aafd$pheno=="case"],breaks=0:15))
controlhist<-table(cut(aafd$HBAIC[aafd$pheno=="control"],breaks=0:15))

Jim
On Fri, May 22, 2020 at 2:08 PM Jim Lemon <drjimlemon at gmail.com> wrote:
#
Hi Ana,
This is a very common question about ggplot.
A quick search turns up lots of hits that answer your question. Here
are a couple:
https://community.rstudio.com/t/trouble-scaling-y-axis-to-percentages-from-counts/42999
https://stackoverflow.com/questions/3695497/show-instead-of-counts-in-charts-of-categorical-variables
ggplot(a, aes(x = HBA1C, fill=pheno)) +
  geom_histogram(aes(y = stat(density)), binwidth = 0.5) +
  scale_y_continuous(labels = scales::percent_format())

HTH,
Eric
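
A note on reading stat(density) as a percentage: a histogram density is the proportion in a bin divided by the bin width, so with binwidth = 0.5 each density value is twice the proportion and the densities sum to 2 rather than 1. The base R sketch below checks that arithmetic on made-up data; the vector x and the breaks are assumptions for illustration, not the data from this thread.

```r
# Made-up values standing in for HBA1C (assumption, not the real data).
set.seed(1)
x <- rnorm(1000, mean = 7.45, sd = 1.5)

# Bins of width 0.5 covering the whole range of x.
brk <- seq(floor(min(x)), ceiling(max(x)), by = 0.5)
h <- hist(x, breaks = brk, plot = FALSE)

# Proportion of observations in each bin.
prop <- h$counts / sum(h$counts)

# density = proportion / bin width
all.equal(h$density, prop / 0.5)

# Multiplying density by the bin width recovers proportions that sum to 1.
sum(h$density * 0.5)
```

So formatting the density axis with a percent label overstates each bar by a factor of 1/binwidth unless the density is first multiplied by the bin width.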
On Fri, May 22, 2020 at 7:18 AM Jim Lemon <drjimlemon at gmail.com> wrote:
#
Hi Jim,

Thank you so much for getting back to me. I tried your code and got
the attached plot.
I think the issue is in calculating the percentage per group (cases or controls):

par(mfrow=c(2,1))
casehist<-table(cut(a$HBA1C[a$pheno=="case"],breaks=0:15))
controlhist<-table(cut(a$HBA1C[a$pheno=="control"],breaks=0:15))

par(mar=c(0,4,1,2))
barpos=barplot(100*casehist,names.arg=names(casehist),col="orange",
               space=0,ylab="Percentage",xaxt="n",ylim=c(0,25))
text(mean(barpos),23,
     "Cases: n=848, nulls=26, median=7.3, mean=7.45, sd=1.96")
box()
par(mar=c(3,4,0,2))
barplot(100*controlhist,names.arg=names(controlhist),
        space=0,ylab="Percentage",col="orange",ylim=c(0,25))
text(mean(barpos),23,
     "Controls: n=892, nulls=7, median=7.3, mean=7.45, sd=1.12")
box()

I can send you the whole dataset if you would like to try it.
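
The bars in the code above overshoot because table() returns counts, so 100*casehist is one hundred times the count in each bin, not a percentage. A minimal sketch of one way to get per-group percentages, keeping the same 0:15 breaks, is to divide each group's counts by that group's total with prop.table(). The data frame a_demo and the helper pct_by_bin() are hypothetical names used only for illustration.

```r
# Fake data standing in for the real data frame `a` (assumption).
set.seed(1)
a_demo <- data.frame(
  HBA1C = c(rnorm(892, mean = 7.45, sd = 1.1),
            rnorm(848, mean = 7.45, sd = 2)),
  pheno = rep(c("control", "case"), times = c(892, 848))
)

# Percentage of one group falling into each bin.
# NAs and values outside the breaks become NA in cut() and are
# dropped by table(), so the percentages are of the binned values.
pct_by_bin <- function(x, breaks = 0:15) {
  100 * prop.table(table(cut(x, breaks = breaks)))
}

casepct    <- pct_by_bin(a_demo$HBA1C[a_demo$pheno == "case"])
controlpct <- pct_by_bin(a_demo$HBA1C[a_demo$pheno == "control"])

# Two stacked panels, one per group, both on a percentage scale.
par(mfrow = c(2, 1), mar = c(3, 4, 1, 2))
barplot(casepct, space = 0, col = "orange", ylab = "Percentage (cases)")
barplot(controlpct, space = 0, col = "orange", ylab = "Percentage (controls)")
```

Each panel now sums to 100 within its own group, which is what makes cases and controls comparable even though the group sizes differ.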
On Thu, May 21, 2020 at 11:14 PM Jim Lemon <drjimlemon at gmail.com> wrote:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2020-05-22 at 9.42.01 AM.png
Type: image/png
Size: 88187 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20200522/bb7b7549/attachment.png>
#
Hi Eric,

Thank you for getting back to me. I tried those solutions, but they
don't give the percentage per group. If I run
ggplot(data=subset(a, !is.na(pheno)), aes(x=HBA1C, fill=pheno)) +
  geom_histogram(aes(y = stat(density)), binwidth = 0.5) +
  scale_y_continuous(labels = scales::percent_format())

I get the attached plot, while my results should be closer to the
range shown in the plot here:
https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/variable.cgi?study_id=phs000018.v2.p1&phv=19980&phd=154&pha=2864&pht=62&phvf=&phdf=&phaf=&phtf=&dssp=1&consent=&temp=1
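
One possible way to get per-group percentages in ggplot2 is sketched below. stat_bin computes density separately for each fill group, and it also returns the bin width, so density * width is the proportion of that group in each bin. The sketch assumes ggplot2 3.3.0 or later for after_stat() (wrapping the same expression in stat() should behave the same way in older versions), and the data frame a_demo is made up for illustration. position = "dodge" is used so the two groups sit side by side instead of being stacked.

```r
library(ggplot2)

# Fake data standing in for the real data frame `a` (assumption).
set.seed(1)
a_demo <- data.frame(
  HBA1C = c(rnorm(892, mean = 7.45, sd = 1.1),
            rnorm(848, mean = 7.45, sd = 2)),
  pheno = rep(c("control", "case"), times = c(892, 848))
)

# density is computed within each fill group; multiplying by the bin
# width turns it into the proportion of that group in the bin.
p <- ggplot(a_demo, aes(x = HBA1C, fill = pheno)) +
  geom_histogram(aes(y = after_stat(density * width)),
                 binwidth = 0.5, position = "dodge") +
  scale_y_continuous(labels = scales::percent_format()) +
  ylab("Percentage of group")

print(p)
```

With the real data, the subset(a, !is.na(pheno)) call from the message above would replace a_demo.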
On Fri, May 22, 2020 at 12:18 AM Eric Berger <ericjberger at gmail.com> wrote:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2020-05-22 at 9.42.21 AM.png
Type: image/png
Size: 57079 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20200522/ad5e1814/attachment.png>