how do I remove entries in data frame from a vector

6 messages · Ana Marija, Rui Barradas, Rolf Turner

#
Hello,

I have a data frame with one column:
V1

1 ABAFT_g_4RWG569_BI_SNP_A10_35096
2 ABAFT_g_4RWG569_BI_SNP_B12_35130
3 ABAFT_g_4RWG569_BI_SNP_E09_35088
4 ABAFT_g_4RWG569_BI_SNP_E12_35136
5 ABAFT_g_4RWG569_BI_SNP_F11_35122
6 ABAFT_g_4RWG569_BI_SNP_F12_35138
7 ABAFT_g_4RWG569_BI_SNP_G07_35060
8 ABAFT_g_4RWG569_BI_SNP_G12_35140

I want to remove these 8 entries (held in a data frame called "remove")
from a vector, celFiles, that looks like this:
[1] "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs000018.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A01_34952.CEL"
[2] "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs000018.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A02_34968.CEL"
[3] "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs000018.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A03_34984.CEL"
[4] "GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs000018.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A04_35000.CEL"
[5] "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs000018.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A05_35016.CEL"
[6] "/GOKIND/75327/PhenoGenotypeFiles/RootStudyConsentSet_phs000018.GAIN_GoKinD.v2.p1.c1.DS-T1DCR-IRB/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A06_35032.CEL"
...

I tried doing this:

b= celFiles[!basename(celFiles) %in% as.character(remove$V1)]

but none of the 8 entries in the "remove" data frame have been removed.

Please advise,
Ana
#
Hello,

This is probably because basename() keeps the file extension. Try this instead:


filename <- sub("(^[^\\.]*)\\..+$", "\\1", basename(celFiles))
celFiles[!filename %in% as.character(remove$V1)]
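
As a minimal sketch of why the original attempt matched nothing and what the sub() call changes, the following uses shortened, made-up paths (the directory "/data/GenotypeFiles" and the two-element celFiles are assumptions for illustration, not the real data):

```r
# Toy data: hypothetical short paths standing in for the real ones
celFiles <- c("/data/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A10_35096.CEL",
              "/data/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A01_34952.CEL")
remove <- data.frame(V1 = "ABAFT_g_4RWG569_BI_SNP_A10_35096",
                     stringsAsFactors = FALSE)

# basename() drops the directory but keeps ".CEL", so nothing matches
any(basename(celFiles) %in% as.character(remove$V1))

# Keep only the part of the base name before its first dot
filename <- sub("(^[^\\.]*)\\..+$", "\\1", basename(celFiles))
kept <- celFiles[!filename %in% as.character(remove$V1)]
kept
```

Note that this pattern cuts at the first dot in the base name, which is fine for names like the ones shown here but would truncate a name that contains additional dots.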


Hope this helps,

Rui Barradas

At 22:15 on 21/10/20, Ana Marija wrote:
#
Hello,

To remove the file extension it's much easier to use the tools package, which ships with R:


filename <- tools::file_path_sans_ext(basename(celFiles))
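
A small sketch of the complete filter using this function, on made-up data (the paths and the name "sample.v2.CEL" are hypothetical and only there to show the behaviour):

```r
# Hypothetical stand-ins for the real celFiles vector and remove data frame
celFiles <- c("/data/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A10_35096.CEL",
              "/data/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A01_34952.CEL")
remove <- data.frame(V1 = "ABAFT_g_4RWG569_BI_SNP_A10_35096")

# Strip directory and extension, then drop the files listed in remove$V1
filename <- tools::file_path_sans_ext(basename(celFiles))
b <- celFiles[!filename %in% as.character(remove$V1)]
b

# Only the final extension is removed, so earlier dots in a name survive
tools::file_path_sans_ext("sample.v2.CEL")
```

The as.character() call is kept so the comparison works whether V1 was read in as a factor or as character.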


Hope this helps,

Rui Barradas

At 22:41 on 21/10/20, Rui Barradas wrote:
#
Thank you so much!
On Wed, Oct 21, 2020 at 4:47 PM Rui Barradas <ruipbarradas at sapo.pt> wrote:
#
On Wed, 21 Oct 2020 16:15:22 -0500
Ana Marija <sokovic.anamarija at gmail.com> wrote:

            
I would advise you to *look* at basename(celFiles)!!!

The entries end in ".CEL"; the names in remove$V1 do not.  So %in%
finds no matches.  Perhaps:

    b <- celFiles[!basename(celFiles) %in%
                 paste0(as.character(remove$V1),".CEL")]

Note that, for the data that you have presented, none of the entries of
celFiles "match up" with "remove" so it is *still* the case that (for
the data shown) none of the entries will be removed.  So your example
was bad.
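
One way to check this before filtering is to count the matches and list the names that have no corresponding file. This is only a sketch on invented data (the paths and the three-file celFiles are assumptions; "target" and "hit" are names introduced here):

```r
# Hypothetical data: one name in remove has a matching file, one does not
celFiles <- c("/data/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A10_35096.CEL",
              "/data/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A01_34952.CEL",
              "/data/GenotypeFiles/ABAFT_g_4RWG569_BI_SNP_A02_34968.CEL")
remove <- data.frame(V1 = c("ABAFT_g_4RWG569_BI_SNP_A10_35096",
                            "ABAFT_g_4RWG569_BI_SNP_B12_35130"))

# Add the extension to the names instead of stripping it from the files
target <- paste0(as.character(remove$V1), ".CEL")
hit <- basename(celFiles) %in% target

sum(hit)                                        # files that will be dropped
unmatched <- setdiff(target, basename(celFiles)) # names with no file at all
unmatched

b <- celFiles[!hit]
length(b)
```

If sum(hit) is smaller than nrow(remove), the entries in unmatched show which names never appear in celFiles.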

cheers,

Rolf Turner
#
Makes sense, thank you!
On Wed, 21 Oct 2020 at 17:46, Rolf Turner <r.turner at auckland.ac.nz> wrote: