[Fwd: Re: [R] Randomly remove condition-selected rows from a matrix]
Following Duncan's suggestion, I forward the below to R-devel.

vQ

-------- Original Message --------
Subject: Re: [R] Randomly remove condition-selected rows from a matrix
Date: Fri, 02 Jan 2009 10:34:52 -0500
From: Duncan Murdoch <murdoch at stats.uwo.ca>
To: Wacek Kusnierczyk <Waclaw.Marcin.Kusnierczyk at idi.ntnu.no>
CC: R help <R-help at stat.math.ethz.ch>
References: <79CAFBDD-4BB8-4C9D-A0E9-54E280458510 at gmail.com> <8b356f880812300920o19d18aeo47dc31f087c3f36 at mail.gmail.com> <DA6ECC19-C786-4C02-B246-4B613726BC7F at gmail.com> <8b356f880812311042la28aef3t81ad09a3b14ce65 at mail.gmail.com> <495E2D95.9040502 at idi.ntnu.no>
On 02/01/2009 10:07 AM, Wacek Kusnierczyk wrote:
Stavros Macrakis wrote:
On Wed, Dec 31, 2008 at 12:44 PM, Guillaume Chapron <carnivorescience at gmail.com> wrote:
m[-sample(which(m[,1]<8 & m[,2]>12),2),]
Suppose I sample only one row among the ones matching my criteria, and
consider the case where just one row matches. Sure,
there is no need to sample, but the instruction would still be executed.
Then if this row index is 15, my instruction becomes sample(15,1), and this
can give me any row from 1 to 15, which is not correct. I have to add a
condition for the case where only one row matches the criteria.
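One way to avoid the special case described above is to sample positions within the index vector rather than the indices themselves. The following is only a sketch: the helper name drop_random_rows and the example matrix are hypothetical, not from the thread, and it assumes an R version that provides sample.int.

```r
# Hypothetical helper (not from the thread): remove up to n randomly chosen
# rows of m among those flagged TRUE in the logical vector 'candidate'.
drop_random_rows <- function(m, candidate, n = 1) {
  idx <- which(candidate)
  n <- min(n, length(idx))          # never ask for more rows than match
  if (n == 0) return(m)             # nothing matches: leave m unchanged
  # Sample positions 1..length(idx), then map back to row indices, so a
  # single matching index such as 15 is never expanded to 1:15.
  drop <- idx[sample.int(length(idx), n)]
  m[-drop, , drop = FALSE]
}

# Example data (made up): only row 15 satisfies m[,1] < 8 & m[,2] > 12.
m <- cbind(c(1:14, 5), c(1:14, 20))
res <- drop_random_rows(m, m[, 1] < 8 & m[, 2] > 12, n = 1)
```

With a single matching row, this removes exactly that row; the check for n == 0 also covers the case where nothing matches.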
Yes, this is a (documented!) design flaw in 'sample' -- see the man page. For some reason, the designers of R have chosen to document the flaw and leave it up to individual users to work around it rather than fix it definitively. A related case is sample(c(),0), which gives an error rather than giving an empty vector, though in general R deals with empty vectors correctly (e.g. sum(c()) => 0).
interestingly, ?sample says:
"
'sample' takes a sample of the specified size from the elements of
'x' using either with or without replacement.
x: Either a (numeric, complex, character or logical) vector of
more than one element from which to choose, or a positive
integer.
If 'x' has length 1, is numeric (in the sense of 'is.numeric') and
'x >= 1', sampling takes place from '1:x'. _Note_ that this
convenience feature may lead to undesired behaviour when 'x' is of
varying length 'sample(x)'. See the 'resample()' example below.
"
yet the following works, even though x has length 1 and is *not* numeric:
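x = "foolme"
is.numeric(x)   # FALSE
sample(x, 1)    # returns "foolme"
sample(x)       # returns "foolme"
x = NA
is.numeric(NA)  # FALSE: a bare NA is logical
sample(x, 1)    # returns NA
sample(x)       # returns NA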
is this a bug in the code, or a bug in the documentation?
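To make the question concrete, here is a sketch of the rule as the help text quoted above states it. It is a paraphrase for illustration only, not the actual source of sample; the name sample_rule is hypothetical, the is.finite check is an added safeguard, and sample.int is assumed to be available.

```r
# Hypothetical paraphrase of the documented rule (not the real source):
# only a length-1 *numeric* x >= 1 is treated as shorthand for 1:x.
sample_rule <- function(x, size) {
  if (length(x) == 1L && is.numeric(x) && is.finite(x) && x >= 1) {
    if (missing(size)) size <- x
    sample.int(x, size)                 # sample from 1:x
  } else {
    if (missing(size)) size <- length(x)
    x[sample.int(length(x), size)]      # sample from the elements of x
  }
}

sample_rule("foolme", 1)  # length 1 but not numeric: falls to the else branch
sample_rule(NA, 1)        # logical NA: also the else branch
sample_rule(15, 1)        # numeric, so a value from 1:15
```

Under this reading, the examples above are consistent with the paragraph about length-1 numeric x, and what they contradict is the description of the argument x as a vector "of more than one element".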
To my mind, it is bizarre to have an important basic function which works for some argument lengths but not others. The convenience of being able to write sample(5,2) for sample(1:5,2) hardly seems worth inflicting inconsistency on all users -- but perhaps one of the designers of R/S can enlighten us on the design rationale here.
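For comparison, a wrapper that behaves the same way for every argument length can be sketched as follows. It is modelled on the resample() example that the help page refers to, but the exact definition here is an assumption and may differ from the one shipped with any particular R version.

```r
# Sketch of a length-consistent sampler: always samples from the elements
# of x, never from 1:x, by sampling positions and indexing.
resample <- function(x, ...) x[sample.int(length(x), ...)]

resample(15, 1)          # always 15, never a value from 1:15
resample("foolme")       # "foolme"
resample(integer(0), 0)  # an empty vector rather than an error
resample(c(3, 8, 15), 2) # two distinct elements of the vector
```

The cost is losing the sample(5,2) shorthand: the caller has to write resample(1:5, 2) explicitly.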
hopefully.
This is more of an R-devel sort of question. My guess is that this is in the S blue book, but I don't have a copy here to check.

Duncan Murdoch