Message-ID: <3506E431-1A62-4AFA-BB3D-25FE68B7D3BF@me.com>
Date: 2011-10-28T14:59:50Z
From: Marc Schwartz
Subject: quick matching question
In-Reply-To: <CAKx9SQnor7g+hR0Ak8bS+K9YY_1eiv3eCnDjKvmOmsbDe00i+Q@mail.gmail.com>

On Oct 28, 2011, at 9:49 AM, Ben Ganzfried wrote:

> Hey,
> 
> I'm trying to match patient identifiers from two separate input files, and
> then add information from one of the input files to the corresponding output
> file.  I'd greatly appreciate any help!
> 
> More specifically,
> Input_File_1 has a column header "bcr_patient_barcode"
> Input_File_2 has a column header "Barcode" and a column header "Batch"
> 
> I want my script to match the appropriate patient identifiers since
> "bcr_patient_barcode" and "Barcode" are not in the same order.  Then I want
> to add the information from "Batch" to the corresponding patient.
> 
> My (incorrect) code is below:
> 
> #batch
> tmp <- Input_File_2$Barcode
> tmp1 <- Input_File_1$bcr_patient_barcode
> 
> for i in tmp
> for item in tmp1
> if (tmp == tmp1) {
>  curated$batch <- Input_File_2$Batch
> }
> 
> Thanks!


See ?merge and then use something like:

  newDF <- merge(Input_File_2, Input_File_1, by.x = "Barcode", by.y = "bcr_patient_barcode")
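
As a minimal sketch of what that call does, here it is on made-up data. Only the column names come from the original post; the barcodes and batch values are invented for illustration:

```r
# Toy data; the barcode and batch values are hypothetical
Input_File_1 <- data.frame(bcr_patient_barcode = c("P3", "P1", "P2"),
                           stringsAsFactors = FALSE)
Input_File_2 <- data.frame(Barcode = c("P1", "P2", "P3"),
                           Batch   = c(10, 20, 30),
                           stringsAsFactors = FALSE)

# Rows are paired by barcode value, not by position, so the
# differing row order of the two inputs does not matter
newDF <- merge(Input_File_2, Input_File_1,
               by.x = "Barcode", by.y = "bcr_patient_barcode")
print(newDF)
```

The key column in the result takes its name from the first data frame ("Barcode"), and by default the rows come back sorted on that key.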

Also, pay attention to the 'all', 'all.x' and 'all.y' arguments, which control whether only matching records are retained (the default) or whether non-matching records from one or both datasets are kept as well. merge() performs an "SQL-like" join operation.
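
A small sketch of how those arguments change the result, again on invented data (the values, and the unmatched patient "P4", are assumptions for illustration). The last step shows match() as an alternative that is not part of the merge() approach, in case the goal is simply to add a Batch column to Input_File_1 without reordering it:

```r
# Toy data; values are hypothetical. "P4" has no record in Input_File_2.
Input_File_1 <- data.frame(bcr_patient_barcode = c("P3", "P4", "P1"),
                           stringsAsFactors = FALSE)
Input_File_2 <- data.frame(Barcode = c("P1", "P2", "P3"),
                           Batch   = c(10, 20, 30),
                           stringsAsFactors = FALSE)

# Default (all = FALSE): keep only barcodes found in both inputs
inner <- merge(Input_File_2, Input_File_1,
               by.x = "Barcode", by.y = "bcr_patient_barcode")

# all.y = TRUE: keep every row of the second data frame (Input_File_1);
# Batch is NA where no matching Barcode exists
keepAll <- merge(Input_File_2, Input_File_1,
                 by.x = "Barcode", by.y = "bcr_patient_barcode",
                 all.y = TRUE)

# Alternative: look up each patient's position in Input_File_2 with match()
# and pull Batch from there, leaving Input_File_1's row order untouched
idx <- match(Input_File_1$bcr_patient_barcode, Input_File_2$Barcode)
Input_File_1$Batch <- Input_File_2$Batch[idx]
```

Here 'inner' has two rows (P1 and P3), 'keepAll' has three (P4 with a missing Batch), and Input_File_1 keeps its original order with Batch filled in where a match was found.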

HTH,

Marc Schwartz