Message-ID: <4855743d-012d-c0d3-6d39-6f0d508aefd3@fredhutch.org>
Date: 2019-12-19T06:48:31Z
From: Hervé Pagès
Subject: A weird behaviour of strsplit?
In-Reply-To: <ed093117-5306-1a75-1c24-9a8fc28295ac@gmail.com>

The fact that strsplit() doesn't say anything about 'split' being longer 
than 'x' adds to the confusion:

   > strsplit(c("xAy", "xxByB", "xCyCCz"), split=c("A", "B", "C", "D"))
   [[1]]
   [1] "x" "y"

   [[2]]
   [1] "xx" "y"

   [[3]]
   [1] "x" "y" ""  "z"

A warning (or error) would go a long way in helping the user realize 
they're doing something wrong.

No warning either when 'split' is shorter than 'x' but the length of the 
latter is not a multiple of the length of the former:

   > strsplit(c("xAy", "xxByB", "xCyCCz"), split=c("A", "B"))
   [[1]]
   [1] "x" "y"

   [[2]]
   [1] "xx" "y"

   [[3]]
   [1] "xCyCCz"

Which is also unexpected given that most binary operations do issue a 
warning in this case (e.g. 11:13 * 1:2).
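
A minimal sketch of the kind of check suggested above. It assumes a
hypothetical wrapper named strsplit_checked() (not part of base R, and the
warning messages are made up for illustration) that complains when 'split'
cannot be recycled cleanly along 'x' and then defers to strsplit():

```r
# Hypothetical wrapper (not in base R): warn when 'split' cannot be
# recycled cleanly along 'x', then hand over to strsplit() unchanged.
strsplit_checked <- function(x, split, ...) {
    nx <- length(x)
    ns <- length(split)
    if (ns > 1L && ns > nx) {
        # Extra elements of 'split' would be silently ignored.
        warning("'split' is longer than 'x'; extra elements are ignored")
    } else if (ns > 1L && nx %% ns != 0L) {
        # Same situation that makes 11:13 * 1:2 issue a warning.
        warning("length of 'x' is not a multiple of length of 'split'")
    }
    strsplit(x, split, ...)
}

# Same result as strsplit(), but with a warning about the partial recycling.
res <- strsplit_checked(c("xAy", "xxByB", "xCyCCz"), split = c("A", "B"))
```

The result is whatever strsplit() returns; only the warning is new. Whether
the first case should be a warning or an error is a judgment call.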

H.


On 12/18/19 06:48, Duncan Murdoch wrote:
> On 18/12/2019 9:42 a.m., IAGO GINÉ VÁZQUEZ wrote:
>> Hi all,
>>
>> In the help of strsplit one can read
>>
>> split: character vector (or object which can be coerced to such) 
>> containing regular expression(s) (unless fixed = TRUE) to use for 
>> splitting. If empty matches occur, in particular if split has length 0, 
>> x is split into single characters. If split has length greater than 1, 
>> it is re-cycled along x.
>>
>> Taking into account that split is said to be a vector (not a length 1 
>> vector) and the last claim above, I would expect that the output of
>>
>>
>> strsplit("3:4", split = c(",",":"), fixed = TRUE)
>>
>> was the same as the output of
>>
>> strsplit("3:4", split = c(":"), fixed = TRUE)
>>
>> that is, splitting by "," (without effect in this example) and also by 
>> ":"
>>
>> [[1]]
>> [1] "3" "4"
>>
>> But, instead, I get
>> [[1]]
>> [1] "3:4"
>>
>> Am I wrongly understanding the help? Is it an expected output?
>> I tried with R 3.6.1 for Windows (10).
> 
> Yes, you are misunderstanding the help. Your input x has length 1, so 
> only the first element of split will be used. If you wanted to use 
> both, you would need a longer x. For example,
> 
>  > strsplit(c("1:2", "3:4"), split=c(",", ":"), fixed=TRUE)
> [[1]]
> [1] "1:2"
> 
> [[2]]
> [1] "3" "4"
> 
> The first element is split using "," -- since there are none, there's no 
> splitting done. The second element is split using ":".
> 
> Duncan Murdoch
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
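
For the original goal of splitting on either "," or ":", one option is a
single regular expression with a character class. This is a sketch only and
was not proposed in the thread; it relies on fixed = FALSE, which is the
default:

```r
# One pattern that matches either separator. 'fixed' must stay FALSE
# (the default) so that "[,:]" is interpreted as a regular expression.
strsplit("3:4", split = "[,:]")

# The same single pattern is applied to every element of 'x'.
strsplit(c("1,2", "3:4"), split = "[,:]")
```

Because 'split' has length 1 here, no recycling along 'x' is involved.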

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319