
query about counting rows of a dataframe

3 messages · Stefano Sofia, David Winsemius, Jean V Adams

#
Dear R users,
I have got the following data frame, called my_df:

   gender day_birth month_birth year_birth labour
1       F        22          10       2001      1
2       M        29          10       2001      2
3       M         1          11       2001      1
4       F         3          11       2001      1
5       M         3          11       2001      2
6       F         4          11       2001      1
7       F         4          11       2001      2
8       F         5          12       2001      2
9       M        22          14       2001      2
10      F        29          13       2001      2
...

I need to count data in different ways:

1. count the births for each day (with 0 where there were none), regardless of the value of the "labour" column

2. count the births for each day (with 0 where there were none), split by the value of "labour" (which can take two values, 1 or 2)

3. count the births for each calendar day across all the years (e.g. the 22nd of October of all the years present in the data frame), regardless of the value of "labour"

4. count the births for each calendar day across all the years (e.g. the 22nd of October of all the years present in the data frame), split by the value of "labour"

I tried with the command

table(my_df$year_birth, my_df$month_birth, my_df$day_birth)

which partially answers question number 1 (I am not able to get a 0 for the days with no births).

Is there a smart way to do that without invoking too many loops?

Thank you for your help,
Stefano Sofia


#
On Nov 3, 2011, at 12:28 PM, Stefano Sofia wrote:

            
xtabs sometimes gives better results. If you want all 31 days to appear,
make day_birth a factor with levels=1:31.

 > xtabs(  ~ day_birth + month_birth + year_birth, data=dat)
, , year_birth = 2001

          month_birth
day_birth 10 11 12 13 14
        1   0  1  0  0  0
        3   0  2  0  0  0
        4   0  2  0  0  0
        5   0  0  1  0  0
        22  1  0  0  0  1
        29  1  0  0  1  0
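
The factor suggestion above could be sketched as follows. This is only an illustration: dat is a hand-typed reconstruction of the ten rows shown in the question, and day_f is a hypothetical helper column that does not appear in the original data.

```r
# Assumption: dat is rebuilt by hand from the ten rows in the question
dat <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "F", "F", "M", "F"),
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  year_birth  = rep(2001, 10),
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

# day_f is a hypothetical helper column: fixing the levels to 1:31 makes
# xtabs keep the days with no births as rows of zeros
dat$day_f <- factor(dat$day_birth, levels = 1:31)
tab <- xtabs(~ day_f + month_birth + year_birth, data = dat)

dim(tab)                  # 31 days x 5 months x 1 year
tab["2", "11", "2001"]    # 0, no births on day 2 of month 11
```

This works because xtabs keeps unused factor levels by default (drop.unused.levels = FALSE). The same idea applies to month_birth if all twelve months should appear.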
I cannot figure out what is being asked here. What should be done with the
two values of labour? Just count them? This would give a partitioned count:

 > xtabs( labour==1 ~ day_birth + month_birth , data=dat)
          month_birth
day_birth 10 11 12 13 14
        1   0  1  0  0  0
        3   0  1  0  0  0
        4   0  1  0  0  0
        5   0  0  0  0  0
        22  1  0  0  0  0
        29  0  0  0  0  0
 > xtabs( labour==2 ~ day_birth + month_birth , data=dat)
          month_birth
day_birth 10 11 12 13 14
        1   0  0  0  0  0
        3   0  1  0  0  0
        4   0  1  0  0  0
        5   0  0  1  0  0
        22  0  0  0  0  1
        29  1  0  0  1  0
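
An alternative to running two separate calls, if the goal is simply one count per value of labour, might be to put labour on the right-hand side of the formula so that it becomes a third dimension of the table. A minimal sketch, assuming dat is a hand-typed reconstruction of the ten rows in the question:

```r
# Assumption: dat is rebuilt by hand from the ten rows in the question
dat <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "F", "F", "M", "F"),
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  year_birth  = rep(2001, 10),
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

# labour as a classifying variable: one day x month slice per labour value
by_labour <- xtabs(~ day_birth + month_birth + labour, data = dat)

by_labour[, , "1"]   # counts for labour == 1
by_labour[, , "2"]   # counts for labour == 2
```

Each slice should hold the same counts as the corresponding xtabs(labour == k ~ day_birth + month_birth) call, and the approach extends unchanged if labour ever takes more than two values.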
If I understand correctly:

 > xtabs(  ~ day_birth + month_birth + year_birth, data=dat)
, , year_birth = 2001

          month_birth
day_birth 10 11 12 13 14
        1   0  1  0  0  0
        3   0  2  0  0  0
        4   0  2  0  0  0
        5   0  0  1  0  0
        22  1  0  0  0  1
        29  1  0  0  1  0
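
If the intent of the third count is to pool the same calendar day across years, one option is to leave year_birth out of the formula. The sketch below assumes dat is a hand-typed reconstruction of the ten rows in the question; because those rows cover only 2001, it appends one clearly hypothetical 2002 birth (not part of the original data) so that the pooling is visible.

```r
# Assumption: dat is rebuilt by hand from the ten rows in the question
dat <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "F", "F", "M", "F"),
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  year_birth  = rep(2001, 10),
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

# HYPOTHETICAL extra row, for illustration only: a copy of row 1 moved to 2002
extra <- dat[1, ]
extra$year_birth <- 2002
dat2 <- rbind(dat, extra)

# Without year_birth in the formula, counts are pooled over all years
pooled <- xtabs(~ day_birth + month_birth, data = dat2)
pooled["22", "10"]    # 2: one birth in 2001 plus the hypothetical one in 2002

# With year_birth in the formula, the same day is reported once per year
per_year <- xtabs(~ day_birth + month_birth + year_birth, data = dat2)
per_year["22", "10", ]
```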
Again, this is confusing. Do you mean to use separate tables for labour==1
and labour==2? Some context explaining what these values represent would
help. Some of us are "concrete". The results of xtabs are tables and can be
divided like matrices.
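
As an illustration of dividing one table by another, the sketch below computes the share of births with labour == 2 for each day and month. It assumes that an elementwise ratio is what is wanted, which the question does not confirm, and that dat is a hand-typed reconstruction of the ten rows in the question.

```r
# Assumption: dat is rebuilt by hand from the ten rows in the question
dat <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "F", "F", "M", "F"),
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  year_birth  = rep(2001, 10),
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

total   <- xtabs(~ day_birth + month_birth, data = dat)             # all births
labour2 <- xtabs(labour == 2 ~ day_birth + month_birth, data = dat) # labour == 2 only

# Elementwise division, exactly as for two matrices of the same shape
share <- labour2 / total
share["3", "11"]    # 0.5: one of the two births on day 3 of month 11
share["1", "10"]    # NaN: 0/0, no births at all in that cell
```

Cells with no births come out as NaN because of the 0/0 division, so they may need to be handled separately before any further summary.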
David Winsemius, MD
Heritage Laboratories
West Hartford, CT