Yep - you're right - missing parents are indicated as zero in the M/PID
field.
The code above ran, but with a few warnings:
1: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
number of items to replace is not a multiple of replacement length
2: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
number of items to replace is not a multiple of replacement length
3: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
number of items to replace is not a multiple of replacement length
4: In l$MID[l$Relationship == "sibling"] <- l$Sample.ID[mother] :
number of items to replace is not a multiple of replacement length
Looking at the output, I get unexpected numbers where the father/mother
ID should be in the M/PID field. For example:
2702 349 mother 0 0
2702 3456 sibling 0 842
2702 9980 sibling 0 842
3064 3 father 0 0
3064 4 mother 0 0
3064 5 sibling 879 880
3064 86 sibling 879 880
3064 87 sibling 879 880
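A possible explanation, offered as a guess rather than a confirmed diagnosis: if Sample.ID was read in as a factor (for example because some IDs contain letters), assigning it into the numeric PID/MID columns stores the factor's integer codes instead of the IDs, which would explain values like 842 or 879. The "not a multiple of replacement length" warnings usually mean that a family has more than one row labelled father (or mother). The sketch below assumes both causes; add_parents and first_or_zero are hypothetical helper names, and the toy data is made up for illustration.

```r
# Hypothetical helper (not from the thread): fill PID/MID for one family.
add_parents <- function(l) {
  ids <- as.character(l$Sample.ID)      # avoid storing factor integer codes
  sib <- l$Relationship == "sibling"

  # Return the first ID with the given role, or "0" if the role is absent.
  first_or_zero <- function(role) {
    hit <- ids[l$Relationship == role]
    if (length(hit) == 0) "0" else hit[1]
  }

  l$PID <- l$MID <- "0"
  l$PID[sib] <- first_or_zero("father")
  l$MID[sib] <- first_or_zero("mother")
  l
}

# Made-up data: family 1 has no father, family 2 has two rows labelled father.
x <- data.frame(
  Family.ID    = c(1, 1, 1, 2, 2, 2, 2, 2, 2),
  Sample.ID    = factor(c("A3", "A1", "A2", "B1", "B5", "B2", "B3", "B4", "B6")),
  Relationship = c("mother", "sibling", "sibling",
                   "father", "father", "mother",
                   "sibling", "sibling", "sibling"),
  stringsAsFactors = FALSE
)

res <- do.call(rbind, lapply(split(x, x$Family.ID), add_parents))
res
```

When a family has more than one father or mother row, this sketch silently uses the first one, which may or may not be what you want; it is probably worth checking those families by hand.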
On Sat, Aug 16, 2014 at 9:31 PM, Jorge I Velez <jorgeivanvelez at gmail.com>
wrote:
Dear Kate,
Try this:
res <- do.call(rbind, lapply(xs, function(l){
  l$PID <- l$MID <- 0
  father <- with(l, Relationship == 'father')
  mother <- with(l, Relationship == 'mother')
  if(sum(father) == 0)
    l$PID[l$Relationship == 'sibling'] <- 0
  else l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
  if(sum(mother) == 0)
    l$MID[l$Relationship == 'sibling'] <- 0
  else l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
  l
}))
It is assumed that when either parent is not available the M/PID is 0.
Best,
Jorge.-
On Sun, Aug 17, 2014 at 10:58 AM, Kate Ignatius <kate.ignatius at gmail.com>
wrote:
Actually - I didn't check this before, but these are not all nuclear
families (as I assumed they were). That is, some don't have a father
or don't have a mother. Usually if this is the case PID or MID will
become 0, respectively, for the child. How can the code be edited to
account for this?
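Before changing the code, it may help to see which families are not nuclear. A minimal sketch, assuming the column names from the example data; the toy rows below reuse a few IDs from this thread, and the object names (roles, counts, incomplete) are only illustrative:

```r
# Toy data: family 14 is complete, family 2702 has no father row.
x <- data.frame(
  Family.ID    = c(14, 14, 14, 14, 2702, 2702, 2702),
  Sample.ID    = c(62, 94, 63, 59, 349, 3456, 9980),
  Relationship = c("sibling", "father", "sibling", "mother",
                   "mother", "sibling", "sibling")
)

# Fix the levels so a role that never occurs still gets a column of zeros.
roles  <- factor(x$Relationship, levels = c("father", "mother", "sibling"))
counts <- table(x$Family.ID, roles)
counts

# Families that do not have exactly one father and exactly one mother.
incomplete <- rownames(counts)[counts[, "father"] != 1 | counts[, "mother"] != 1]
incomplete
```

Families listed in incomplete are the ones where PID or MID would need to be set to 0, or where duplicate parent rows need a closer look.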
On Sat, Aug 16, 2014 at 8:02 PM, Kate Ignatius <kate.ignatius at gmail.com>
wrote:
Thanks!
I think I know what is being done here but not sure how to fix the
following error:
Error in l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
replacement has length zero
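This error most likely comes from a family that has no father row: l$Sample.ID[father] is then an empty vector, and R refuses to assign an empty vector to a non-empty selection. A small reproduction, using a made-up three-row family (the object names l, father and msg are just for illustration):

```r
# Toy family with a mother and two siblings, but no father.
l <- data.frame(
  Sample.ID    = c(349, 3456, 9980),
  Relationship = c("mother", "sibling", "sibling")
)
l$PID <- 0

father <- l$Relationship == "father"
length(l$Sample.ID[father])   # 0: there is no father ID to assign

# Capture the error message instead of stopping the script.
msg <- tryCatch({
  l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father]
  "no error"
}, error = function(e) conditionMessage(e))
msg
```

Checking sum(father) == 0 (or length of the selected IDs) before assigning avoids the error.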
On Sat, Aug 16, 2014 at 6:48 PM, Jorge I Velez
<jorgeivanvelez at gmail.com> wrote:
Dear Kate,
Assuming you have nuclear families, one option would be:
x <- read.table(textConnection("Family.ID Sample.ID Relationship
14 62 sibling
14 94 father
14 63 sibling
14 59 mother
17 6004 father
17 6003 mother
17 6005 sibling
17 368 sibling
130 202 mother
130 203 father
130 204 sibling
130 205 sibling
130 206 sibling
222 9 mother
222 45 sibling
222 34 sibling
222 10 sibling
222 11 sibling
222 18 father"), header = TRUE)
closeAllConnections()
xs <- with(x, split(x, Family.ID))
res <- do.call(rbind, lapply(xs, function(l){
  l$PID <- l$MID <- 0
  father <- with(l, Relationship == 'father')
  mother <- with(l, Relationship == 'mother')
  l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
  l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
  l
}))
res
HTH,
Jorge.-
Best regards,
Jorge.-
On Sun, Aug 17, 2014 at 5:42 AM, Kate Ignatius
<kate.ignatius at gmail.com>
wrote:
Hi,
I have a data.table question (as well as an if/else statement query).
I have a large list of families (the file has 935 individuals, sorted
by family, with families of varying sizes). At the moment the file has
the columns:
SampleID FamilyID Relationship
To avoid having to make a pedigree file by hand, i.e. adding a
PaternalID and a MaternalID one by one, I want to try to write a script
that will quickly do this for me (I eventually want to run this
through a program such as plink). Is there a way to use data.table
(maybe in conjunction with ifelse) to do this effectively?
An example of the file is something like:
Family.ID Sample.ID Relationship
14 62 sibling
14 94 father
14 63 sibling
14 59 mother
17 6004 father
17 6003 mother
17 6005 sibling
17 368 sibling
130 202 mother
130 203 father
130 204 sibling
130 205 sibling
130 206 sibling
222 9 mother
222 45 sibling
222 34 sibling
222 10 sibling
222 11 sibling
222 18 father
But the goal is to have a file like this:
Family.ID Sample.ID Relationship PID MID
14 62 sibling 94 59
14 94 father 0 0
14 63 sibling 94 59
14 59 mother 0 0
17 6004 father 0 0
17 6003 mother 0 0
17 6005 sibling 6004 6003
17 368 sibling 6004 6003
130 202 mother 0 0
130 203 father 0 0
130 204 sibling 203 202
130 205 sibling 203 202
130 206 sibling 203 202
222 9 mother 0 0
222 45 sibling 18 9
222 34 sibling 18 9
222 10 sibling 18 9
222 11 sibling 18 9
222 18 father 0 0
I've tried searching for this but with no luck. I'd greatly appreciate
any help, even if it's just a link to a great example/solution!
Thanks!