R-Help Forum

I have a data set that contains a date field, but the dates are in two formats:

11/7/2016   dd/mm/yyyy
14-07-16    dd-mm-yy

How would I go about correcting this problem? Should I separate the dates, format them, and then recombine?

Sincerely,
Jeff Reichman
(314) 457-1966
transforming dates
9 messages · reichmanj at sbcglobal.net, Bert Gunter, Rui Barradas +3 more
Well, one way to do it is via regexes -- no splitting and recombining needed. Note: this will convert a factor into a character vector.

z <- c("11/7/2016", "14-07-16")
z <- gsub("-([[:digit:]]{2})-([[:digit:]]{2})", "/\\1/20\\2", z) ## /\ is / and \
z
[1] "11/7/2016" "14/07/2016"

I leave it to you as an exercise to either convert 7 to 07 or vice-versa if you want to do this. Note, if you have spaces sprinkled inconsistently around your separators, you'll have to work a bit harder with your regex.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
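The padding exercise left open above can be handled in more than one way. Here is a minimal base R sketch, assuming every value is day-first once the regex has run: round-trip the strings through the Date class, which accepts single-digit days and months on input and writes two digits on output. The names `d` and `padded` are illustrative and do not come from the original message.

```r
z <- c("11/7/2016", "14-07-16")

# Step from the message above: rewrite dd-mm-yy as dd/mm/20yy
z <- gsub("-([[:digit:]]{2})-([[:digit:]]{2})", "/\\1/20\\2", z)

# Parse as day/month/4-digit year; anything that does not fit becomes NA
d <- as.Date(z, format = "%d/%m/%Y")

# Write the dates back out as text with zero-padded day and month
padded <- format(d, "%d/%m/%Y")
print(padded)
# [1] "11/07/2016" "14/07/2016"
```

Keeping `d` as a Date, rather than going back to text, is usually the more useful end result, since it sorts and subtracts correctly.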
On Sat, Nov 2, 2019 at 7:25 PM <reichmanj at sbcglobal.net> wrote:
> [...]
______________________________________________ R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Hello,
I believe the simplest is to use package lubridate. Its functions try
several formats until either one is right or none fits the data.
x <- c('11/7/2016', '14-07-16')
lubridate::dmy(x)
#[1] "2016-07-11" "2016-07-14"
The order dmy must be the same for all vector elements; if not, the elements in a different order fail to parse:
y <- c('11/7/2016', '14-07-16', '2016/7/11')
lubridate::dmy(y)
#[1] "2016-07-11" "2016-07-14" NA
#Warning message:
# 1 failed to parse.
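If the field order really does differ between elements, one option in base R is to try a list of candidate formats element by element. This is only a sketch: the helper name `parse_mixed` and the list of formats are assumptions chosen to fit the three example strings, and an ambiguous string is resolved by whichever format matches first, so the order of `formats` matters.

```r
# Hypothetical helper (not part of lubridate or base R): try each format in
# turn and keep the first one that parses, separately for every element.
parse_mixed <- function(x, formats = c("%d/%m/%Y", "%d-%m-%y", "%Y/%m/%d")) {
  out <- rep(as.Date(NA), length(x))
  for (fmt in formats) {
    todo <- is.na(out)                           # elements still unparsed
    out[todo] <- as.Date(x[todo], format = fmt)  # NA where fmt does not fit
  }
  out
}

y <- c("11/7/2016", "14-07-16", "2016/7/11")
parsed <- parse_mixed(y)
print(parsed)
# [1] "2016-07-11" "2016-07-14" "2016-07-11"
```

Elements that match none of the formats stay NA, so they can still be found and inspected afterwards.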
Hope this helps,
Rui Barradas
At 02:25 on 03/11/19, reichmanj at sbcglobal.net wrote:
> [...]
Rui is right -- lubridate's functionality and robustness are better -- but just for fun, here is a simple function, poorly named reformat(), that splits up the dates, cleans them up and standardizes them a bit, and spits them back out with a sep character of your choice (your original split and recombine suggestion). Lubridate probably does something similar but more sophisticated, but maybe it's worthwhile to see how one can do it using basic functionality. This only requires a few short lines of code.
reformat <- function(z, sep = "-"){
   z <- gsub(" ", "", z) ## remove blanks
   ## break up dates into 3 component pieces and convert to matrix
   z <- matrix(unlist(strsplit(z, "-|/")), nrow = 3)
   ## add "0" in front of single digit in dd and mm
   for(i in 1:2) z[i, ] <- gsub("\\<([[:digit:]])\\>", "0\\1", z[i, ])
   ## add "20" in front of "yy"
   z[3, ] <- sub("\\<([[:digit:]]{2})\\>", "20\\1", z[3, ])
   ## combine back into single string separated by sep
   paste(z[1, ], z[2, ], z[3, ], sep = sep)
}
## Testit
z <- c(" 1 / 22 /2015"," 1 -5 -15","11/7/2016", "14-07-16")
reformat(z)
[1] "01-22-2015" "01-05-2015" "11-07-2016" "14-07-2016"
reformat(z,"/")
[1] "01/22/2015" "01/05/2015" "11/07/2016" "14/07/2016"

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
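Note that reformat() only standardizes the text; it does not check that the result is a valid date. A short base R sketch of one way to check, assuming the intended reading is day-month-year as in the original question: convert the reformatted strings with as.Date() and look for NA. The strings below are copied from the reformat(z) output above; the names `out`, `d` and `bad` are illustrative.

```r
# Output of reformat(z) as shown above
out <- c("01-22-2015", "01-05-2015", "11-07-2016", "14-07-2016")

# Read as day-month-year; impossible dates (e.g. month 22) become NA
d <- as.Date(out, format = "%d-%m-%Y")

# Strings that did not survive the conversion
bad <- out[is.na(d)]
print(bad)
# [1] "01-22-2015"
```

The first test string only makes sense as month-day-year, so it is flagged here, which is the kind of inconsistency worth catching before the dates are used.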
On Sun, Nov 3, 2019 at 12:15 AM Rui Barradas <ruipbarradas at sapo.pt> wrote:
> [...]
On 11/3/19 11:51 AM, Bert Gunter wrote:
> [...]
If one wants to investigate existing efforts at automatic date _and_ time reformatting, then do not forget Dirk's anytime package: https://cran.r-project.org/web/packages/anytime/index.html
David.
Yes, indeed. Thanks, David.

Cheers,
Bert

On Sun, Nov 3, 2019 at 12:22 PM David Winsemius <dwinsemius at comcast.net> wrote:
> [...]
On 3 Nov 2019, at 21:22, David Winsemius <dwinsemius at comcast.net> wrote:
> On 11/3/19 11:51 AM, Bert Gunter wrote:
>    =======
Hey, that's my birthday! Err, no it isn't... ;-)

Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
On 2019-11-03 17:04, Peter Dalgaard wrote:
> [...]
Is that November 11 of 2019 or March 19 of 2011 or 11 March 2019?

The English still use stones as a unit of mass, and most of the US still steadfastly refuses to seriously consider metrication or ISO 8601. I know an architect in the US, who has worked on several different projects every year for the past 40 years only one of which has been in metric units.

Binary, octal or hex is superior to decimal, except for the fact that most humans have 10 digits on hands and feet. And decimal is vastly superior to arithmetic in mixed bases, e.g., adding miles, rods, yards, feet, inches, and 64ths.

Spencer Graves
On 2019-11-03 17:04, Peter Dalgaard wrote:
> [...]
Is that November 3 of 2019 or March 19 of 2011 or 11 March 2019? [please excuse the typo in the earlier response]

The English still use stones as a unit of mass, and most of the US still steadfastly refuses to seriously consider metrication or ISO 8601. I know an architect in the US, who has worked on several different projects every year for the past 40 years only one of which has been in metric units.

Binary, octal or hex is superior to decimal, except for the fact that most humans have 10 digits on hands and feet. And decimal is vastly superior to arithmetic in mixed bases, e.g., adding miles, rods, yards, feet, inches, and 64ths.

Spencer Graves
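The ambiguity is easy to reproduce in R. As a small illustration (a sketch using only base R; the variable names are made up for this example), the same string parses to three different dates depending on the assumed field order, and format() on a Date prints the unambiguous ISO 8601 form:

```r
s <- "11/3/19"

# The same text read under three different field orders,
# each printed in ISO 8601 form (yyyy-mm-dd)
readings <- c(
  mdy = format(as.Date(s, format = "%m/%d/%y")),
  dmy = format(as.Date(s, format = "%d/%m/%y")),
  ymd = format(as.Date(s, format = "%y/%m/%d"))
)
print(readings)
#          mdy          dmy          ymd
# "2019-11-03" "2019-03-11" "2011-03-19"
```

Nothing in the string itself says which reading is right, which is why the format has to be stated, or the data stored in ISO 8601 form in the first place.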