Message-ID: <1375208143.22952.YahooMailNeo@web142603.mail.bf1.yahoo.com>
Date: 2013-07-30T18:15:43Z
From: arun
Subject: 'format' behaviour in a 'apply' call depending on 'options(digits = K)'
In-Reply-To: <51F800E2.5040805@ase-research.org>

Hi,
Try using trim = TRUE (see ?format):
options(digits=4)

df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], trim = TRUE, scientific = FALSE))
df2$id2[99990:100010]
# [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
# [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
#[17] "100006" "100007" "100008" "100009" "100010"


id2 <- format(1:110000, scientific = FALSE,trim=TRUE) 
id2[99990:100010]
# [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
# [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
#[17] "100006" "100007" "100008" "100009" "100010"
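
A minimal sketch of what trim = TRUE changes, assuming the leading space comes from format() padding its result to a computed width. The names ids, padded, trimmed and via_sprintf are only illustrative and are not from the original post; sprintf("%d") is shown as one possible alternative for whole numbers, not as the recommended fix.

```r
# Illustrative sketch; all object names here are made up for the example.
ids <- 99990:100010   # integer ids straddling the 5-to-6 digit boundary

# Without trim, format() pads every element to one common width.
padded <- format(ids, scientific = FALSE)

# With trim = TRUE the padding is suppressed.
trimmed <- format(ids, scientific = FALSE, trim = TRUE)

# sprintf("%d") also gives unpadded strings for whole numbers.
via_sprintf <- sprintf("%d", ids)

print(unique(nchar(padded)))            # one common width
print(unique(nchar(trimmed)))           # widths follow the number of digits
print(identical(trimmed, via_sprintf))  # compare the two unpadded versions
```

Inside apply() each call to format() sees a single value, so the width is computed per element rather than for the whole column; trim = TRUE sidesteps the question of which width gets chosen.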
A.K.


----- Original Message -----
From: Mathieu Basille <basille.web at ase-research.org>
To: David Winsemius <dwinsemius at comcast.net>
Cc: r-help at r-project.org
Sent: Tuesday, July 30, 2013 2:07 PM
Subject: Re: [R] 'format' behaviour in a 'apply' call depending on 'options(digits = K)'

Thanks David for your interest. I have to admit that your answer puzzles me 
even more than before. It seems that the underlying problem is way beyond 
my R skills...

The generation of id2 is indeed quite demanding, especially compared to a 
simple 'as.character' call. Anyway, since it seems to be system specific, 
here is the sessionInfo() that I forgot to attach to my first message:

R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=fr_FR.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=fr_FR.UTF-8
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=fr_FR.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

In brief: the latest stable R available under Debian Testing... Hopefully this
can help track down the problem.
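
On the cost of generating id2: apply() converts the data frame to a matrix and calls the function once per row, whereas format() on the whole column is a single vectorised call. A rough sketch on a smaller, hypothetical data frame (df_small and n are illustrative names, and timings will vary by machine):

```r
# Illustrative comparison; df_small, n, by_row and by_col are made-up names.
n <- 2000
df_small <- data.frame(x = rnorm(n), y = rnorm(n), id = seq_len(n))

# Row-wise: one format() call per row, on a numeric row vector.
t_apply <- system.time(
  by_row <- apply(df_small, 1,
                  function(r) format(r["id"], scientific = FALSE, trim = TRUE))
)

# Vectorised: one format() call for the whole column.
t_vec <- system.time(
  by_col <- format(df_small$id, scientific = FALSE, trim = TRUE)
)

print(rbind(apply = t_apply[1:3], vectorised = t_vec[1:3]))
print(all(unname(by_row) == by_col))   # both approaches should agree here
```

With trim = TRUE both versions are expected to give the same strings, so the row-wise loop only adds overhead in this case.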
Mathieu.


On 07/30/2013 01:58 PM, David Winsemius wrote:
>
> On Jul 30, 2013, at 9:01 AM, Mathieu Basille wrote:
>
>> Dear list,
>>
>> Here is a simple example in which the behaviour of 'format' does not make sense to me. I have read the documentation and searched the archives, but nothing pointed me in the right direction to understand this behaviour. Let's start with a simple data frame:
>>
>> df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
>>
>> Let's now create a new variable 'id2' which is the character representation of 'id'. Note that I use 'scientific = FALSE' to ensure that long numbers such as 100,000 are not formatted using their scientific representation (in this case 1e+05):
>>
>> df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))
>>
>> Let's have a look at part of the result:
>>
>> df1$id2[99990:100010]
>>  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"
>>  [8] "99997"  "99998"  "99999"  "100000" "100001" "100002" "100003"
>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>
> Some formatting processes are carried out by system functions. In this case I am unable to reproduce this with the same code on Mac OS 10.7.5 / R 3.0.1 Patched.
>
>> df1$id2[99990:100010]
>  [1] "99990"  "99991"  "99992"  "99993"  "99994"  "99995"  "99996"  "99997"
>  [9] "99998"  "99999"  "100000" "100001" "100002" "100003" "100004" "100005"
> [17] "100006" "100007" "100008" "100009" "100010"
>
> (I did notice that generation of the id2 variable seemed to take an inordinately long time.)
>
> -- David.
>>
>> So far, so good. Let's now play with the 'digits' option:
>>
>> options(digits = 4)
>> df2 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
>> df2$id2 <- apply(df2, 1, function(dfi) format(dfi["id"], scientific = FALSE))
>> df2$id2[99990:100010]
>>  [1] "99990"  "99991"  "99992"  "99993"  "99994"  " 99995" " 99996"
>>  [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>>
>> Notice the extra leading space from 99995 to 99999? To make sure it only happened there:
>>
>> df2$id2[which(df1$id2 != df2$id2)]
>> [1] " 99995" " 99996" " 99997" " 99998" " 99999"
>>
>> And just to make sure it only occurs in an 'apply' call, here is the same directly on a numeric vector:
>>
>> id2 <- format(1:110000, scientific = FALSE)
>> id2[99990:100010]
>> [1] " 99990" " 99991" " 99992" " 99993" " 99994" " 99995" " 99996"
>> [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
>> [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
>>
>> Here the leading spaces are for every number, which makes sense to me. Is there anything I'm misinterpreting in the behaviour of 'format'?
>> Thanks in advance for any hint,
>> Mathieu.
>>
>>
>> PS: Some background for this question. It all comes from an Rmd document that knitr consistently failed to process, while the R code ran fine in batch or interactive R. knitr uses 'options(digits = 4)', as opposed to the R default of 'options(digits = 7)', which made one of my functions throw an error with knitr, but not with batch or interactive R. I managed to solve the problem using 'trim = TRUE' in 'format', but I still do not understand what's going on...
>> If you're interested, see here for more details on the original problem: http://stackoverflow.com/questions/17866230/knitr-vs-interactive-r-behaviour/17872176
>>
>>
>> --
>>
>> ~$ whoami
>> Mathieu Basille, PhD
>>
>> ~$ locate --details
>> University of Florida \\
>> Fort Lauderdale Research and Education Center
>> (+1) 954-577-6314
>> http://ase-research.org/basille
>>
>> ~$ fortune
>> « Le tout est de tout dire, et je manque de mots
>> Et je manque de temps, et je manque d'audace. »
>> -- Paul Éluard
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>


