
Some days missing using xtabs

Rui Barradas

Tue, Jul 23, 2013 5:40 AM

Hello,

Something I've just noticed, stringsAsFactors is not an argument to merge().

And, without changing the class, I get a warning:

Warning message:
In `[<-.factor`(`*tmp*`, ri, value = 1:31) :
   invalid factor level, NA generated
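
A quick way to check the point about stringsAsFactors is to look at the formal arguments of the data frame method. This is only a sketch; `merge_args` is a name made up for illustration. Since the method has a `...` argument, an unknown argument such as stringsAsFactors is accepted without an error rather than rejected.

```r
# Formal arguments of the data frame method of merge()
merge_args <- names(formals(merge.data.frame))

# stringsAsFactors is not among the named arguments ...
has_saf <- "stringsAsFactors" %in% merge_args

# ... but '...' is, so an extra argument does not raise an error
has_dots <- "..." %in% merge_args

print(merge_args)
```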


Rui Barradas

On 23-07-2013 13:36, arun wrote:

Hi,

I tried this without changing the class, but there was no warning.

  str(release_freq)
#'data.frame':    62 obs. of  4 variables:
# $ d_release: Factor w/ 31 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ m_release: Factor w/ 2 levels "5","6": 1 1 1 1 1 1 1 1 1 1 ...
# $ y_release: Factor w/ 1 level "2004": 1 1 1 1 1 1 1 1 1 1 ...
# $ Freq     : num  0 0 0 0 1 1 1 0 0 1 ...
  str(temp_h12)
#'data.frame':    31 obs. of  4 variables:
# $ y_temp: int  2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
# $ m_temp: int  5 5 5 5 5 5 5 5 5 5 ...
# $ d_temp: int  1 2 3 4 5 6 7 8 9 10 ...
# $ temp  : num  16.9 18 17.4 19.7 105.7 ...


res<-merge(release_freq, temp_h12, by.x=c("y_release","m_release","d_release"), by.y=c("y_temp","m_temp","d_temp"), stringsAsFactors=FALSE)

head(res)
#  y_release m_release d_release Freq temp
#1      2004         5         1    0 16.9
#2      2004         5        10    1 16.1
#3      2004         5        11    1 15.8
#4      2004         5        12    1 15.1
#5      2004         5        13    0 17.8
#6      2004         5        14    0 17.4

# changing the class
release_freq$d_release <- as.integer(as.character(release_freq$d_release))
release_freq$m_release <- as.integer(as.character(release_freq$m_release))
release_freq$y_release <- as.integer(as.character(release_freq$y_release))
res1<- merge(release_freq, temp_h12,
by.x=c("y_release","m_release","d_release"),
by.y=c("y_temp","m_temp","d_temp"), stringsAsFactors=FALSE)

head(res1)
#  y_release m_release d_release Freq temp
#1      2004         5         1    0 16.9
#2      2004         5        10    1 16.1
#3      2004         5        11    1 15.8
#4      2004         5        12    1 15.1
#5      2004         5        13    0 17.8
#6      2004         5        14    0 17.4

The results are not identical.
   identical(res,res1)
#[1] FALSE
str(res)
#'data.frame':    31 obs. of  5 variables:
# $ y_release: Factor w/ 1 level "2004": 1 1 1 1 1 1 1 1 1 1 ...
# $ m_release: Factor w/ 2 levels "5","6": 1 1 1 1 1 1 1 1 1 1 ...
# $ d_release: Factor w/ 31 levels "1","2","3","4",..: 1 10 11 12 13 14 15 16 17 18 ...
# $ Freq     : num  0 1 1 1 0 0 1 1 0 1 ...
# $ temp     : num  16.9 16.1 15.8 15.1 17.8 17.4 16 17.7 17.3 22.3 ...
  str(res1)
#'data.frame':    31 obs. of  5 variables:
# $ y_release: int  2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
# $ m_release: int  5 5 5 5 5 5 5 5 5 5 ...
# $ d_release: int  1 10 11 12 13 14 15 16 17 18 ...
# $ Freq     : num  0 1 1 1 0 0 1 1 0 1 ...
# $ temp     : num  16.9 16.1 15.8 15.1 17.8 17.4 16 17.7 17.3 22.3 ...
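
In both outputs above the days come back as 1, 10, 11, 12, ..., which is not numeric order. If numeric order matters, one option is to sort explicitly on the integer key columns after merging. The sketch below uses small made-up data frames (`freq_toy`, `temp_toy`) as stand-ins for release_freq and temp_h12; it is not the original data.

```r
# Toy stand-ins for release_freq and temp_h12 (values are made up)
freq_toy <- data.frame(y_release = 2004L, m_release = 5L,
                       d_release = c(1L, 2L, 10L), Freq = c(0, 1, 1))
temp_toy <- data.frame(y_temp = 2004L, m_temp = 5L,
                       d_temp = c(1L, 2L, 10L), temp = c(16.9, 18.0, 16.1))

merged <- merge(freq_toy, temp_toy,
                by.x = c("y_release", "m_release", "d_release"),
                by.y = c("y_temp", "m_temp", "d_temp"))

# Order explicitly by the integer keys instead of relying on merge()'s own sort
merged <- merged[order(merged$y_release, merged$m_release, merged$d_release), ]
rownames(merged) <- NULL

print(merged)
```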


sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8
  [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8
  [7] LC_PAPER=C                 LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] stringr_0.6.2  reshape2_1.2.2

loaded via a namespace (and not attached):
[1] plyr_1.8

A.K.

----- Original Message -----
From: Rui Barradas <ruipbarradas at sapo.pt>
To: Stefano Sofia <stefano.sofia at regione.marche.it>
Cc: "r-help at r-project.org" <r-help at r-project.org>
Sent: Tuesday, July 23, 2013 6:50 AM
Subject: Re: [R] Some days missing using xtabs

Hello,

As for your second question, before merge(), try the following.

release_freq$d_release <- as.integer(as.character(release_freq$d_release))
release_freq$m_release <- as.integer(as.character(release_freq$m_release))
release_freq$y_release <- as.integer(as.character(release_freq$y_release))


And the warning is gone.
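
The as.character() step in the conversion above matters: applying as.integer() directly to a factor returns the internal level codes, not the numbers printed as labels. A minimal sketch on a made-up factor `f`:

```r
# Toy factor whose labels are numbers stored as text
f <- factor(c("10", "5", "31"))

# Levels are sorted as strings, so they come out as "10" "31" "5"
lev <- levels(f)

# Internal codes: positions within levels(f), not the day numbers
codes <- as.integer(f)

# Labels parsed back into the numbers they represent
values <- as.integer(as.character(f))

print(list(levels = lev, codes = codes, values = values))
```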

Hope this helps,

Rui Barradas

On 23-07-2013 10:33, Stefano Sofia wrote:

Dear R-users,
given the following data frame called hospital_2004

gender d_birth m_birth y_birth address d_admittance m_admittance y_admittance yard_admittance d_release m_release y_release yard_release diaprinc diasec1 diasec2 diasec3 diasec4 diasec5
2 13 12 1929 42002 30 3 2004 3003 6 5 2004 4902 430 4299 51881 4275 78001 0
1 1 8 1935 42002 7 4 2004 2401 18 5 2004 1801 20500 V581 0388 5849 0 0
1 23 12 1956 42018 26 4 2004 2402 31 5 2004 2402 1552 5715 7895 25000 4148 5722
1 9 8 1919 42002 05 5 2004 2602 22 5 2004 4902 51881 4254 4275 0 0 0
2 11 1 1925 52014 30 4 2004 2603 13 6 2004 4902 51881 49121 2732 4275 4299 5849
2 1 3 1963 44060 1 5 2004 5101 16 5 2004 2401 3201 1519 1976 1983 4019 0
1 6 3 1937 45010 6 5 2004 3003 12 5 2004 4901 431 3314 41189 25001 4019 V594
1 3 9 1931 42034 3 5 2004 5101 5 5 2004 5101 78559 4829 5119 1619 4241 585
2 13 9 1912 41007 5 5 2004 4901 7 5 2004 4901 85225 4019 42731 49121 0 0
1 21 10 1936 15146 7 5 2004 4901 10 5 2004 4901 431 430 V594 V595 0 0
2 8 5 1933 43044 8 5 2004 5802 8 6 2004 5802 5712 45620 2851 5119 5184 0
1 25 1 1926 41057 8 5 2004 4901 15 5 2004 4901 431 78001 49121 0 0 0
1 6 1 1923 42002 10 5 2004 1401 11 5 2004 4901 4440 412 4413 0 0 0
1 19 3 1934 42022 9 5 2004 1401 21 6 2004 4901 4413 5609 99811 4019 412 0
1 6 6 1921 43052 15 5 2004 4302 4 6 2004 4302 1890 20280 436 49121 9986 V1005

when I try to evaluate the frequency of daily releases through

release_freq <- as.data.frame(xtabs( ~ d_release + m_release + y_release, data=hospital_2004))

I get the following result:

d_release m_release y_release Freq
4         5      2004    0
5         5      2004    1
6         5      2004    1
7         5      2004    1
8         5      2004    0
10         5      2004    1
11         5      2004    1
12         5      2004    1
13         5      2004    0
15         5      2004    1
16         5      2004    1
18         5      2004    1
21         5      2004    0
22         5      2004    1
31         5      2004    1
4         6      2004    1
5         6      2004    0
6         6      2004    0
7         6      2004    0
8         6      2004    1
10         6      2004    0
11         6      2004    0
12         6      2004    0
13         6      2004    1
15         6      2004    0
16         6      2004    0
18         6      2004    0
21         6      2004    1
22         6      2004    0
31         6      2004    0

Why are the 1st, 2nd, 3rd, 9th, 14th, 17th, 19th, 20th, and the 23rd to 30th of both May and June missing (and why is there a 31st of June)?
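
A likely explanation is that xtabs() only knows the values that occur in each column and crosses them: days never seen in the data get no row, while every observed day is paired with every observed month, which produces a "31 June" row with Freq 0. The sketch below illustrates this on made-up data (`toy` is not hospital_2004) and shows one possible alternative, counting real calendar dates over an explicit range. The date range is an assumption chosen for the example.

```r
# Toy data, not the original hospital_2004
toy <- data.frame(d_release = c(6, 31, 13),
                  m_release = c(5, 5, 6),
                  y_release = 2004)

# xtabs() crosses only the observed values: days {6, 13, 31} x months {5, 6}
observed <- as.data.frame(xtabs(~ d_release + m_release + y_release, data = toy))
print(observed)   # includes day 31 / month 6 with Freq 0, and no other days

# Alternative sketch: build real dates and count them over a full calendar range
dates    <- as.Date(ISOdate(toy$y_release, toy$m_release, toy$d_release))
all_days <- seq(as.Date("2004-05-01"), as.Date("2004-06-30"), by = "day")
daily    <- as.data.frame(table(date = factor(format(dates),
                                               levels = format(all_days))))
head(daily)
```

Every day in the chosen range then appears exactly once, with Freq 0 where nothing was released, and impossible dates never show up.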

And a final question: why, given another data frame called temp_h12,

y_temp m_temp d_temp temp
2004 5 1 16.90
2004 5 2 18.00
2004 5 3 17.40
2004 5 4 19.70
2004 5 5 105.70
2004 5 6 17.30
2004 5 7 17.00
2004 5 8 16.20
2004 5 9 16.10
2004 5 10 16.10
2004 5 11 15.80
2004 5 12 15.10
2004 5 13 17.80
2004 5 14 17.40
2004 5 15 16.00
2004 5 16 17.70
2004 5 17 17.30
2004 5 18 22.30
2004 5 19 23.30
2004 5 20 24.30
2004 5 21 19.90
2004 5 22 15.70
2004 5 23 15.80
2004 5 24 17.10
2004 5 25 18.30
2004 5 26 21.00
2004 5 27 18.20
2004 5 28 17.90
2004 5 29 19.40
2004 5 30 22.10
2004 5 31 17.40

merge(release_freq, temp_h12, by.x=c("y_release","m_release","d_release"), by.y=c("y_temp","m_temp","d_temp"), stringsAsFactors=FALSE)

gives the following warning

Warning message:
In `[<-.factor`(`*tmp*`, ri, value = 1:31) :
     invalid factor level, NAs generated
?
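
Before merging, it can help to compare the classes of the key columns on both sides, since the output of as.data.frame(xtabs(...)) stores its classifying variables as factors while a table read from numbers usually has integer columns. The sketch below uses made-up stand-ins (`freq_toy`, `temp_toy`) for the two data frames; whether the mismatch actually triggers the warning may depend on the R version and the data.

```r
# Toy stand-in for release_freq: xtabs() output converted to a data frame
raw_toy  <- data.frame(d_release = c(6, 13), m_release = c(5, 6), y_release = 2004)
freq_toy <- as.data.frame(xtabs(~ d_release + m_release + y_release, data = raw_toy))

# Toy stand-in for temp_h12 (temperatures are made up)
temp_toy <- data.frame(y_temp = 2004L, m_temp = 5L, d_temp = 1:31, temp = 17)

# Compare the classes of the columns used as merge keys
key_classes_freq <- sapply(freq_toy[c("y_release", "m_release", "d_release")], class)
key_classes_temp <- sapply(temp_toy[c("y_temp", "m_temp", "d_temp")], class)

print(key_classes_freq)
print(key_classes_temp)
```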



Thank you for your help
Stefano Sofia


______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

