Hi,
I tried this without changing the class, and there was no warning.
str(release_freq)
#'data.frame': 62 obs. of 4 variables:
# $ d_release: Factor w/ 31 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
# $ m_release: Factor w/ 2 levels "5","6": 1 1 1 1 1 1 1 1 1 1 ...
# $ y_release: Factor w/ 1 level "2004": 1 1 1 1 1 1 1 1 1 1 ...
# $ Freq : num 0 0 0 0 1 1 1 0 0 1 ...
str(temp_h12)
#'data.frame': 31 obs. of 4 variables:
# $ y_temp: int 2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
# $ m_temp: int 5 5 5 5 5 5 5 5 5 5 ...
# $ d_temp: int 1 2 3 4 5 6 7 8 9 10 ...
# $ temp : num 16.9 18 17.4 19.7 105.7 ...
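For context, release_freq above has 62 rows (31 days x 2 months) because d_release carries all 31 levels. One way to get such a table is to turn the day column into a factor with explicit levels before calling xtabs(). This is only a sketch under that assumption; the data frame `toy` and the names `freq_obs` and `freq_all` are made up for illustration.

```r
# Hypothetical toy data: three releases, two in May and one in June
toy <- data.frame(d_release = c(6, 18, 13),
                  m_release = c(5, 5, 6),
                  y_release = 2004)

# Without explicit levels, only the observed days (6, 13, 18) are crossed
# with the observed months
freq_obs <- as.data.frame(xtabs(~ d_release + m_release + y_release, data = toy))

# Declaring all 31 days as levels keeps the unobserved days, with Freq 0
toy$d_release <- factor(toy$d_release, levels = 1:31)
freq_all <- as.data.frame(xtabs(~ d_release + m_release + y_release, data = toy))

nrow(freq_obs)  # 3 observed days x 2 months
nrow(freq_all)  # 31 days x 2 months
```

Note that xtabs() crosses every day level with every month level, so combinations such as the 31st of June still appear (with Freq 0) and would have to be dropped separately.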
res<-merge(release_freq, temp_h12, by.x=c("y_release","m_release","d_release"), by.y=c("y_temp","m_temp","d_temp"), stringsAsFactors=FALSE)
head(res)
# y_release m_release d_release Freq temp
#1 2004 5 1 0 16.9
#2 2004 5 10 1 16.1
#3 2004 5 11 1 15.8
#4 2004 5 12 1 15.1
#5 2004 5 13 0 17.8
#6 2004 5 14 0 17.4
# changing the class
release_freq$d_release <- as.integer(as.character(release_freq$d_release))
release_freq$m_release <- as.integer(as.character(release_freq$m_release))
release_freq$y_release <- as.integer(as.character(release_freq$y_release))
res1<- merge(release_freq, temp_h12,
by.x=c("y_release","m_release","d_release"),
by.y=c("y_temp","m_temp","d_temp"), stringsAsFactors=FALSE)
head(res1)
# y_release m_release d_release Freq temp
#1 2004 5 1 0 16.9
#2 2004 5 10 1 16.1
#3 2004 5 11 1 15.8
#4 2004 5 12 1 15.1
#5 2004 5 13 0 17.8
#6 2004 5 14 0 17.4
The results are not identical: the key columns are still factors in res but integers in res1.
identical(res,res1)
#[1] FALSE
str(res)
#'data.frame': 31 obs. of 5 variables:
# $ y_release: Factor w/ 1 level "2004": 1 1 1 1 1 1 1 1 1 1 ...
# $ m_release: Factor w/ 2 levels "5","6": 1 1 1 1 1 1 1 1 1 1 ...
# $ d_release: Factor w/ 31 levels "1","2","3","4",..: 1 10 11 12 13 14 15 16 17 18 ...
# $ Freq : num 0 1 1 1 0 0 1 1 0 1 ...
# $ temp : num 16.9 16.1 15.8 15.1 17.8 17.4 16 17.7 17.3 22.3 ...
str(res1)
#'data.frame': 31 obs. of 5 variables:
# $ y_release: int 2004 2004 2004 2004 2004 2004 2004 2004 2004 2004 ...
# $ m_release: int 5 5 5 5 5 5 5 5 5 5 ...
# $ d_release: int 1 10 11 12 13 14 15 16 17 18 ...
# $ Freq : num 0 1 1 1 0 0 1 1 0 1 ...
# $ temp : num 16.9 16.1 15.8 15.1 17.8 17.4 16 17.7 17.3 22.3 ...
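In both results above the days come out as 1, 10, 11, 12, ... rather than 1, 2, 3, ..., even with integer keys, so merge() appears to sort the multi-column key as text here. If chronological order is wanted, the merged data frame can be reordered afterwards. A minimal sketch, assuming integer key columns; `rf`, `tp` and `m` are hypothetical stand-ins for release_freq, temp_h12 and res1.

```r
# Hypothetical stand-ins for release_freq and temp_h12, with integer keys
rf <- data.frame(y_release = 2004L, m_release = 5L, d_release = 1:12,
                 Freq = 0)
tp <- data.frame(y_temp = 2004L, m_temp = 5L, d_temp = 1:12,
                 temp = seq(15, 20.5, by = 0.5))

m <- merge(rf, tp,
           by.x = c("y_release", "m_release", "d_release"),
           by.y = c("y_temp", "m_temp", "d_temp"))

# Reorder numerically on year, month and day, then reset the row names
m <- m[order(m$y_release, m$m_release, m$d_release), ]
rownames(m) <- NULL
m$d_release
```

The same order() call would not help on res, because there the keys are factors and would be ordered by their level codes, not by the numbers in the labels.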
sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_CA.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_CA.UTF-8 LC_COLLATE=en_CA.UTF-8
[5] LC_MONETARY=en_CA.UTF-8 LC_MESSAGES=en_CA.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] stringr_0.6.2 reshape2_1.2.2
loaded via a namespace (and not attached):
[1] plyr_1.8
A.K.
----- Original Message -----
From: Rui Barradas <ruipbarradas at sapo.pt>
To: Stefano Sofia <stefano.sofia at regione.marche.it>
Cc: "r-help at r-project.org" <r-help at r-project.org>
Sent: Tuesday, July 23, 2013 6:50 AM
Subject: Re: [R] Some days missing using xtabs
Hello,
As for your second question, before merge(), try the following.
release_freq$d_release <- as.integer(as.character(release_freq$d_release))
release_freq$m_release <- as.integer(as.character(release_freq$m_release))
release_freq$y_release <- as.integer(as.character(release_freq$y_release))
And the warning is gone.
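The as.character() step matters because as.integer() applied directly to a factor returns the internal level codes, not the numbers shown in the labels. A small sketch with a made-up factor `day_f`:

```r
# Hypothetical factor of days; levels are "5", "18", "31"
day_f <- factor(c(5, 18, 31))

codes  <- as.integer(day_f)                 # internal codes, not the days
values <- as.integer(as.character(day_f))   # the days themselves

# Equivalent alternative described in ?factor
values2 <- as.integer(levels(day_f))[day_f]

codes
values
```

Here `codes` holds 1, 2, 3 while `values` and `values2` hold 5, 18, 31.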
Hope this helps,
Rui Barradas
On 23-07-2013 10:33, Stefano Sofia wrote:
Dear R-users,
given the following data frame called hospital_2004
gender d_birth m_birth y_birth address d_admittance m_admittance y_admittance yard_admittance d_release m_release y_release yard_release diaprinc diasec1 diasec2 diasec3 diasec4 diasec5
2 13 12 1929 42002 30 3 2004 3003 6 5 2004 4902 430 4299 51881 4275 78001 0
1 1 8 1935 42002 7 4 2004 2401 18 5 2004 1801 20500 V581 0388 5849 0 0
1 23 12 1956 42018 26 4 2004 2402 31 5 2004 2402 1552 5715 7895 25000 4148 5722
1 9 8 1919 42002 05 5 2004 2602 22 5 2004 4902 51881 4254 4275 0 0 0
2 11 1 1925 52014 30 4 2004 2603 13 6 2004 4902 51881 49121 2732 4275 4299 5849
2 1 3 1963 44060 1 5 2004 5101 16 5 2004 2401 3201 1519 1976 1983 4019 0
1 6 3 1937 45010 6 5 2004 3003 12 5 2004 4901 431 3314 41189 25001 4019 V594
1 3 9 1931 42034 3 5 2004 5101 5 5 2004 5101 78559 4829 5119 1619 4241 585
2 13 9 1912 41007 5 5 2004 4901 7 5 2004 4901 85225 4019 42731 49121 0 0
1 21 10 1936 15146 7 5 2004 4901 10 5 2004 4901 431 430 V594 V595 0 0
2 8 5 1933 43044 8 5 2004 5802 8 6 2004 5802 5712 45620 2851 5119 5184 0
1 25 1 1926 41057 8 5 2004 4901 15 5 2004 4901 431 78001 49121 0 0 0
1 6 1 1923 42002 10 5 2004 1401 11 5 2004 4901 4440 412 4413 0 0 0
1 19 3 1934 42022 9 5 2004 1401 21 6 2004 4901 4413 5609 99811 4019 412 0
1 6 6 1921 43052 15 5 2004 4302 4 6 2004 4302 1890 20280 436 49121 9986 V1005
when I try to evaluate the frequency of daily releases through
release_freq <- as.data.frame(xtabs( ~ d_release + m_release + y_release, data=hospital_2004))
I get the following result:
d_release m_release y_release Freq
4 5 2004 0
5 5 2004 1
6 5 2004 1
7 5 2004 1
8 5 2004 0
10 5 2004 1
11 5 2004 1
12 5 2004 1
13 5 2004 0
15 5 2004 1
16 5 2004 1
18 5 2004 1
21 5 2004 0
22 5 2004 1
31 5 2004 1
4 6 2004 1
5 6 2004 0
6 6 2004 0
7 6 2004 0
8 6 2004 1
10 6 2004 0
11 6 2004 0
12 6 2004 0
13 6 2004 1
15 6 2004 0
16 6 2004 0
18 6 2004 0
21 6 2004 1
22 6 2004 0
31 6 2004 0
Why are the 1st, 2nd, 3rd, 9th, 14th, 17th, 19th, 20th, and the 23rd to the 30th of both May and June missing? (And why is there a 31st of June?)
And a final question: why, given another data frame called temp_h12,
y_temp m_temp d_temp temp
2004 5 1 16.90
2004 5 2 18.00
2004 5 3 17.40
2004 5 4 19.70
2004 5 5 105.70
2004 5 6 17.30
2004 5 7 17.00
2004 5 8 16.20
2004 5 9 16.10
2004 5 10 16.10
2004 5 11 15.80
2004 5 12 15.10
2004 5 13 17.80
2004 5 14 17.40
2004 5 15 16.00
2004 5 16 17.70
2004 5 17 17.30
2004 5 18 22.30
2004 5 19 23.30
2004 5 20 24.30
2004 5 21 19.90
2004 5 22 15.70
2004 5 23 15.80
2004 5 24 17.10
2004 5 25 18.30
2004 5 26 21.00
2004 5 27 18.20
2004 5 28 17.90
2004 5 29 19.40
2004 5 30 22.10
2004 5 31 17.40
merge(release_freq, temp_h12, by.x=c("y_release","m_release","d_release"), by.y=c("y_temp","m_temp","d_temp"), stringsAsFactors=FALSE)
gives the following warning
Warning message:
In `[<-.factor`(`*tmp*`, ri, value = 1:31) :
invalid factor level, NAs generated
?
Thank you for your help
Stefano Sofia