I'm processing a database table. To do that, I load it into a data frame and then do the data processing (normalization of some fields). I'm used to programming in C, and some of R's facilities don't come naturally to me, so please excuse me if this is a question for "dummies".
During processing, I want to replace the value of some fields depending on their current content. For example, if a field starts with a digit instead of an alphabetic character, I replace the entire field in that row with "SLOPD". I'm sure there is another way (maybe through one of the apply functions), but I can't figure out how to do it.
The code I'm using now is:
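library(stringr)  # provides str_locate()
for (i in seq_len(nrow(dataframe2))) {
  if (!is.na(dataframe2[i, "NIF/NIE"])) {
    # str_locate() returns NA when the field contains no digit, so guard against it
    pos <- str_locate(dataframe2[i, "NIF/NIE"], "\\d")[1]
    if (!is.na(pos) && pos < 2) {
      # sprintf() alone prints nothing inside a loop; cat() writes the message
      cat(sprintf("elimina NIF autònom: %i\n", i))
      dataframe2[i, "NIF"] <- "SLOPD"
    }
  }
}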
Thank you for your attention!
Manel Amado i Martí
Cap d'Assessoria de Comerç Interior
amado at cambrasabadell.org
Tel. 93 745 12 63 · Fax 93 745 12 64
Av. Francesc Macià, 35 · 08206 Sabadell
Apt. corr. 119 · www.cambrasabadell.org
Optimizing loop
6 messages · Manel Amado Martí, Ulises M. Alvarez, Albyn Jones +1 more
On 02/05/2015 02:08 AM, Manel Amado Martí wrote:
> [...]
Hi:
You may take a look at the dplyr library: https://github.com/hadley/dplyr
If you provide a small, reproducible example, we may provide further help.
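As a rough illustration of the dplyr route, the replacement could be sketched as below. The data frame `toy` and its column `nif` are made up for the example, since no sample data was posted, and the test for a leading digit uses base R's `grepl()`.

```r
library(dplyr)

# Hypothetical toy data; the real table was not posted.
toy <- data.frame(
  nif = c("12345678Z", "X1234567L", NA, "B12345678"),
  stringsAsFactors = FALSE
)

# grepl() returns FALSE for NA input, so NA rows are left untouched.
toy <- toy %>%
  mutate(nif = if_else(grepl("^[0-9]", nif), "SLOPD", nif))

toy
```

The whole column is handled in one vectorized call, so no explicit loop over rows is needed.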
Ulises M. Alvarez http://sophie.unam.mx/
Actually, this question should be redirected to the R-Help list; it is not about teaching with R.
albyn
On Thu, Feb 5, 2015 at 9:09 AM, Ulises M. Alvarez <uma at sophie.unam.mx> wrote:
> [...]
_______________________________________________ R-sig-teaching at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-teaching
OK, thanks for your help and suggestions.
Manel Amado i Martí
From: Albyn Jones [mailto:jones at reed.edu]
Sent: Thursday, 5 February 2015 20:46
To: Ulises M. Alvarez
Cc: Manel Amado Martí; r-sig-teaching at r-project.org
Subject: Re: [R-sig-teaching] Optimizing loop
> [...]
Manel,
# I recommend the stringi package, which has deliberately copied much
# of the syntax used by Hadley's wonderful stringr package.
# However, it does more and is much faster.
# Since you are not requiring anything complex you can use base R functionality
# for everything, but I think the stringi syntax is cleaner and it is all
# C++ code under the covers.
library(stringi)
set.seed(1)
len <- 10
# Let's make a dataframe with some data.
# The column name of 'NIF/NIE' is problematic.
# Although you can force it, R is not going to like it.
# Use an underscore or period.
df_2 <- data.frame('NIF_NIE' = sample(c(NA, 'starts with alpha', '1234numerals'),
len, replace = TRUE),
col2 = 1:len, stringsAsFactors = FALSE)
df_2 # see what it looks like
df_2$NIF_NIE[!is.na(df_2$NIF_NIE) &
stri_detect_regex(df_2$NIF_NIE, '^[0-9]')] <- 'SLOPD'
df_2
# end of code
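The base R route mentioned in the comments above might look like the following sketch. It is not part of the original reply: the data values are invented, and `check.names = FALSE` is used only to show that the literal column name "NIF/NIE" can be kept if renaming is not an option.

```r
# Hypothetical toy data; check.names = FALSE keeps the literal "NIF/NIE" name.
dataframe2 <- data.frame(
  "NIF/NIE" = c("12345678Z", NA, "X1234567L", "98765432M"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# TRUE where the value starts with a digit; grepl() yields FALSE for NA.
starts_with_digit <- grepl("^[0-9]", dataframe2[["NIF/NIE"]])

# Row numbers of the affected records, i.e. what the loop tried to report.
which(starts_with_digit)

# Replace all matching fields in a single vectorized assignment.
dataframe2[starts_with_digit, "NIF/NIE"] <- "SLOPD"
dataframe2
```

Because `grepl()` already treats `NA` as a non-match, no separate `is.na()` check is needed in this variant.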
On my iMac with len set to 10^6, this takes less than a tenth of a second.
start.time <- Sys.time()
len <- 1000000
df_2 <- data.frame('NIF_NIE' = sample(c(NA, 'starts with alpha', '1234numerals'), len, replace = TRUE),
                   col2 = 1:len, stringsAsFactors = FALSE)
end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
Time difference of 0.08968401 secs
R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center
Texas Biomedical Research Institute
P.O. Box 760549
San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: msharp at TxBiomed.org
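The timing code above wraps the construction of the data frame. If only the replacement step is of interest, it could be timed on its own, for instance with `system.time()`. This is a sketch using base R's `grepl()` in place of `stri_detect_regex()`; `df_timing` and `hit` are names made up here, and no timing result is claimed since it depends on the machine.

```r
set.seed(1)
len <- 1e6

# Same kind of test data as above, built outside the timed region.
x <- sample(c(NA, "starts with alpha", "1234numerals"), len, replace = TRUE)
df_timing <- data.frame(NIF_NIE = x, col2 = seq_len(len),
                        stringsAsFactors = FALSE)

# Time only the detection and replacement of fields that start with a digit.
elapsed <- system.time({
  hit <- grepl("^[0-9]", df_timing$NIF_NIE)
  df_timing$NIF_NIE[hit] <- "SLOPD"
})

elapsed["elapsed"]  # wall-clock seconds for the replacement step
```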
On Feb 5, 2015, at 2:08 AM, Manel Amado Martí <amado at cambrasabadell.org> wrote:
> [...]