conditional filling of data.frame - improve code

9 messages · Ivan Calandra, Andrew Simmons, Jeff Newmiller +2 more

#
Dear useRs,

I would like to improve my ugly (though working) code, but I think I 
need a completely different approach and I just can't think outside the box!

I have some external information about which sample(s) belong to which 
experiment. I need to get that manually into R (either typing directly 
in a script or read a CSV file, but that makes no difference):
exp <- list(ex1 = c("sample1-1", "sample1-2"), ex2 = c("sample2-1", 
"sample2-2" , "sample2-3"))

Then I have my data, only with the sample IDs:
mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1", 
"sample1-1", "sample1-1", "sample2-1"))

Now I want to add a column to mydata with the experiment ID. The best I 
could find is this:
for (i in names(exp)) mydata[mydata[["sample"]] %in% exp[[i]], 
"experiment"] <- i

In this example, the experiment ID could be extracted from the sample 
IDs, but this is not the case with my real data so it really is a matter 
of matching. Of course I also have other columns with my real data.

I'm pretty sure the last line (with the loop) can be improved in terms 
of readability (speed is not an issue here). I have close to no 
constraints on 'exp' (here I chose a list, but anything could do), the 
only thing that cannot change is the format of 'mydata'.

Thank you in advance!
Ivan
#
I think what you're looking for is match().
For each element of its first argument, it returns the position of the
first match in its second argument. It also has a nomatch argument in case
no match is found; people usually use NA (the default) or 0 for nomatch.
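
A minimal sketch of how match() could be applied here, assuming the list
is first flattened into a two-column lookup table. The names lookup and
experiment are illustrative choices, not something the thread prescribes:

```r
exp <- list(ex1 = c("sample1-1", "sample1-2"),
            ex2 = c("sample2-1", "sample2-2", "sample2-3"))
mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1",
                                "sample1-1", "sample1-1", "sample2-1"),
                     stringsAsFactors = FALSE)

# Flatten the list: one row per sample, with its experiment name
lookup <- data.frame(sample = unlist(exp, use.names = FALSE),
                     experiment = rep(names(exp), lengths(exp)),
                     stringsAsFactors = FALSE)

# match() gives, for each row of mydata, the row of lookup to take;
# samples missing from lookup would get NA
mydata$experiment <- lookup$experiment[match(mydata$sample, lookup$sample)]
mydata$experiment
# "ex2" "ex2" "ex1" "ex1" "ex1" "ex2"
```

This keeps the rows of mydata in their original order and adds only the
one column.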
#
Use merge.

expts <- read.csv( text =
"expt,sample
ex1,sample1-1
ex1,sample1-2
ex2,sample2-1
ex2,sample2-2
ex2,sample2-3
", header=TRUE, as.is=TRUE )

mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1", "sample1-1", "sample1-1", "sample2-1"))

merge( mydata, expts, by="sample", all.x=TRUE )
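
One thing to be aware of: by default merge() sorts the result on the key
column, so the rows come back in a different order than in mydata. If the
original order matters, one possible workaround (a sketch, with row_id as
a made-up helper column) is to carry a row index through the merge:

```r
expts <- data.frame(expt = c("ex1", "ex1", "ex2", "ex2", "ex2"),
                    sample = c("sample1-1", "sample1-2", "sample2-1",
                               "sample2-2", "sample2-3"),
                    stringsAsFactors = FALSE)
mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1",
                                "sample1-1", "sample1-1", "sample2-1"),
                     stringsAsFactors = FALSE)

# Remember the original row positions in a temporary column
indexed <- cbind(mydata, row_id = seq_len(nrow(mydata)))

merged <- merge(indexed, expts, by = "sample", all.x = TRUE)

# Restore the original order, then drop the helper column
merged <- merged[order(merged$row_id), ]
merged$row_id <- NULL
rownames(merged) <- NULL
merged$expt
# "ex2" "ex2" "ex1" "ex1" "ex1" "ex2"
```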
#
You could try some of the "join" commands from dplyr.
https://dplyr.tidyverse.org/reference/mutate-joins.html
https://statisticsglobe.com/r-dplyr-join-inner-left-right-full-semi-anti
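
As a rough illustration of the dplyr route (a sketch only, assuming the
dplyr package is installed and reusing the expts layout from the merge
answer), left_join() keeps every row of mydata and adds the matching
column:

```r
library(dplyr)

expts <- data.frame(expt = c("ex1", "ex1", "ex2", "ex2", "ex2"),
                    sample = c("sample1-1", "sample1-2", "sample2-1",
                               "sample2-2", "sample2-3"),
                    stringsAsFactors = FALSE)
mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1",
                                "sample1-1", "sample1-1", "sample2-1"),
                     stringsAsFactors = FALSE)

# Keep all rows of mydata; samples without a match would get NA in expt
result <- left_join(mydata, expts, by = "sample")
result$expt
```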


Regards,
Tim
#
Thank you Jeff and Tim for your ideas. Indeed merge/join is probably the 
nicest way. Still, the code becomes much longer because I need more 
formatting of the input and output objects than with my ugly for loop :)

Cheers,
Ivan

--
Dr. Ivan Calandra
Imaging lab
RGZM - MONREPOS Archaeological Research Centre
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

#
What a strange objection. You wouldn't keep the inline definition of expts in working code... that would be in a reference data file, and the merge is one line.
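
To make that concrete, here is a sketch of the reference-file workflow.
The file is written to a temporary path only so the example is
self-contained; in real use it would be an existing CSV, and the name
ref_file is purely illustrative:

```r
# Stand-in for a reference data file kept alongside the project
ref_file <- tempfile(fileext = ".csv")
writeLines(c("expt,sample",
             "ex1,sample1-1",
             "ex1,sample1-2",
             "ex2,sample2-1",
             "ex2,sample2-2",
             "ex2,sample2-3"), ref_file)

mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1",
                                "sample1-1", "sample1-1", "sample2-1"),
                     stringsAsFactors = FALSE)

# Read the reference table, then the merge itself is one line
expts <- read.csv(ref_file, as.is = TRUE)
result <- merge(mydata, expts, by = "sample", all.x = TRUE)
```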
#
In my first trials, I made a typo, which resulted in more columns than 
needed in the output of merge, which is why I needed more formatting. 
But now, it is indeed done all in one line and it is, as I said already, 
nicer anyway!

#
Hello,

I hadn't posted an answer because my mapply is more complicated than the 
original and much more complicated than Jeff's merge, but here it is. If 
there's a problem with the output of merge, maybe the mapply can be of 
use, since only the column expressly named is created.
The result is equal to the original.
I have changed the name exp to exp1.

mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1", 
"sample1-1", "sample1-1", "sample2-1"))
exp1 <- list(ex1 = c("sample1-1", "sample1-2"), ex2 = c("sample2-1", 
"sample2-2" , "sample2-3"))

for(i in names(exp1)) {
   mydata[mydata[["sample"]] %in% exp1[[i]], "experiment"] <- i
}

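# must create the new column beforehand
mydata[["experiment2"]] <- NA_character_
# \(...) is the R >= 4.1 shorthand for function(...);
# <<- modifies the mydata found in the enclosing environment
invisible(mapply(\(value, name, s){
   i <- which(s %in% value)
   mydata[["experiment2"]][i] <<- name
}, exp1, names(exp1), MoreArgs = list(s = mydata$sample)))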

mydata
#     sample experiment experiment2
#1 sample2-2        ex2         ex2
#2 sample2-3        ex2         ex2
#3 sample1-1        ex1         ex1
#4 sample1-1        ex1         ex1
#5 sample1-1        ex1         ex1
#6 sample2-1        ex2         ex2
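
For comparison, a loop-free variant that also creates only the one named
column could use a named character vector as the lookup. This is a sketch
and assumes each sample ID appears in exactly one experiment; lookup and
experiment3 are made-up names:

```r
mydata <- data.frame(sample = c("sample2-2", "sample2-3", "sample1-1",
                                "sample1-1", "sample1-1", "sample2-1"),
                     stringsAsFactors = FALSE)
exp1 <- list(ex1 = c("sample1-1", "sample1-2"),
             ex2 = c("sample2-1", "sample2-2", "sample2-3"))

# Named vector: the names are sample IDs, the values are experiment IDs
lookup <- setNames(rep(names(exp1), lengths(exp1)),
                   unlist(exp1, use.names = FALSE))

# Index by sample ID; IDs absent from lookup would give NA
mydata$experiment3 <- unname(lookup[as.character(mydata$sample)])
mydata$experiment3
# "ex2" "ex2" "ex1" "ex1" "ex1" "ex2"
```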


Hope this helps,

Rui Barradas

#
Thank you Rui for your input.
I thought about mapply() too, but I'm not confident with it; I usually 
prefer loops (more intuitive to me).

It's good to have the choice :)

Ivan
