I am new to R but am an experienced SAS user, and I was hoping to get some help on counting the occurrences of a character within a string at the row level.

My data frame, x, is structured as below:

Col1
abc/def
ghi/jkl/mno

I found this code on the board, but it counts all occurrences of "/" in the data frame.

chr.pos <- which(unlist(strsplit(x,NULL))=='/')
chr.count <- length(chr.pos)
chr.count
[1] 3

I'd like to append a column, say cnt, that has the count of "/" for each row.

Can anyone point me in the right direction or offer some code to do this?

Thanks in advance for the help.

Doug Esneault
Counting the occurrences of a character within a string
8 messages · Douglas Esneault, Florent D., Bert Gunter +2 more
## It's not a data frame -- it's just a vector.
> x
[1] "abc/def"     "ghi/jkl/mno"
> gsub("[^/]","",x)
[1] "/"  "//"
> nchar(gsub("[^/]","",x))
[1] 1 2
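The question asked for the count as a new column, so here is a minimal sketch of the same gsub/nchar idea applied to a data frame. It assumes the data really is a data frame with a character column named Col1, and that the new column should be called cnt as in the original post; adjust the names to suit.

```r
# Assumed layout: a data frame with one character column, Col1
x <- data.frame(Col1 = c("abc/def", "ghi/jkl/mno"), stringsAsFactors = FALSE)

# Delete every character that is not "/", then measure what is left
x$cnt <- nchar(gsub("[^/]", "", x$Col1))
x
```

Because gsub() and nchar() are both vectorised, no explicit loop over the rows is needed.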
?gsub
?nchar

-- Bert
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Bert Gunter Genentech Nonclinical Biostatistics Internal Contact Info: Phone: 467-7374 Website: http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
I used within and vapply:
x <- data.frame(Col1 = c("abc/def", "ghi/jkl/mno"), stringsAsFactors = FALSE)
count.slashes <- function(string) sum(unlist(strsplit(string, NULL)) == "/")
within(x, Col2 <- vapply(Col1, count.slashes, 1))
         Col1 Col2
1     abc/def    1
2 ghi/jkl/mno    2
Resending my code, not sure why the linebreaks got eaten:
x <- data.frame(Col1 = c("abc/def", "ghi/jkl/mno"), stringsAsFactors = FALSE)
count.slashes <- function(string)sum(unlist(strsplit(string, NULL)) == "/")
within(x, Col2 <- vapply(Col1, count.slashes, 1))
         Col1 Col2
1     abc/def    1
2 ghi/jkl/mno    2
strsplit is certainly an alternative, but your approach is unnecessarily complicated and inefficient. Do this, instead:

sapply(strsplit(x,"/"),length)-1

Cheers,
Bert
Inefficient, maybe, but what you suggest does not work if a string starts or ends with a slash.
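The edge cases are easy to check. Below is a small sketch comparing the split-and-count idea with the gsub/nchar idea on two made-up strings, "/abc" and "abc/", which are assumptions added for illustration and not part of the original data. As far as I can tell from ?strsplit, a leading separator still yields an empty first piece, so it is the trailing slash that gets undercounted.

```r
# The last two strings are hypothetical test cases, not from the original post
x <- c("abc/def", "ghi/jkl/mno", "/abc", "abc/")

# Split on "/" and count the pieces, minus one
split_count <- sapply(strsplit(x, "/", fixed = TRUE), length) - 1

# Remove everything except "/" and count what is left
gsub_count <- nchar(gsub("[^/]", "", x))

# Side-by-side comparison of the two methods
data.frame(x, split_count, gsub_count)
```

The two count columns should agree on every row except the last one, where the trailing slash is dropped by strsplit().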
On Dec 1, 2011, at 11:11 PM, Bert Gunter wrote:
> strsplit is certainly an alternative, but your approach is
> unnecessarily complicated and inefficient. Do this, instead:
>
> sapply(strsplit(x,"/"),length)-1
Definitely more compact than the regex alternatives I came up with, but
one of these still might appeal in situations where it was desirable
to have the source strings as labels:

> sapply( sapply(x$Col1, gregexpr, pattern="/"), length)
    abc/def ghi/jkl/mno
          1           2
> nchar( sapply(x$Col1, gsub, pattern="[^/]", replacement="" ) )
    abc/def ghi/jkl/mno
          1           2
David
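One caveat with taking length() of a gregexpr() result: when a string contains no match at all, gregexpr() returns -1, which still has length 1. Here is a hedged sketch of one way around that, using regmatches() and lengths() from base R (lengths() needs R 3.2.0 or later). The third string, "nomatch", is made up for illustration.

```r
# "nomatch" is a hypothetical string with no "/" in it
x <- c("abc/def", "ghi/jkl/mno", "nomatch")

# Match positions of the literal "/" in each string
m <- gregexpr("/", x, fixed = TRUE)

naive  <- sapply(m, length)           # counts the -1 "no match" marker as a hit
counts <- lengths(regmatches(x, m))   # extracts actual matches, so no match gives 0

# Keep the source strings as labels
setNames(counts, x)
```

The setNames() call keeps the labelled output that the sapply() versions above produce.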
David Winsemius, MD
West Hartford, CT
1 day later
Here's an easy way from stringr:
library(stringr)
str_count( c("abc/def", "ghi/jkl/mno"), "/")
# [1] 1 2
Hadley
Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/