Hi, I have a data.table question (as well as an if/else statement query). I have a large list of families (the file has 935 individuals, sorted by family, and the families vary in size). At the moment the file has the columns:

SampleID FamilyID Relationship

To avoid making a pedigree file by hand, i.e. adding a PaternalID and a MaternalID one by one, I want to try to write a script that will do this quickly for me (I eventually want to run this through a program such as plink). Is there a way to use data.table (maybe in conjunction with ifelse) to do this effectively?

An example of the file is something like:

Family.ID Sample.ID Relationship
14        62        sibling
14        94        father
14        63        sibling
14        59        mother
17        6004      father
17        6003      mother
17        6005      sibling
17        368       sibling
130       202       mother
130       203       father
130       204       sibling
130       205       sibling
130       206       sibling
222       9         mother
222       45        sibling
222       34        sibling
222       10        sibling
222       11        sibling
222       18        father

But the goal is to have a file like this:

Family.ID Sample.ID Relationship PID  MID
14        62        sibling      94   59
14        94        father       0    0
14        63        sibling      94   59
14        59        mother       0    0
17        6004      father       0    0
17        6003      mother       0    0
17        6005      sibling      6004 6003
17        368       sibling      6004 6003
130       202       mother       0    0
130       203       father       0    0
130       204       sibling      203  202
130       205       sibling      203  202
130       206       sibling      203  202
222       9         mother       0    0
222       45        sibling      18   9
222       34        sibling      18   9
222       10        sibling      18   9
222       11        sibling      18   9
222       18        father       0    0

I've searched for this, but with no luck. I'd greatly appreciate any help, even if it's just a link to a great example/solution! Thanks!
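The question asks specifically about data.table, so here is a minimal by-group sketch of that route. It is not code from the thread. It assumes the data.table package is installed and that each family has at most one father and one mother. `parent_id()` is a hypothetical helper. `ifelse()` fills PID/MID only on sibling rows, and a missing parent becomes 0.

```r
library(data.table)

# First two families from the example in the question
ped <- as.data.table(read.table(header = TRUE, text = "
Family.ID Sample.ID Relationship
14 62 sibling
14 94 father
14 63 sibling
14 59 mother
17 6004 father
17 6003 mother
17 6005 sibling
17 368 sibling
"))

# Hypothetical helper: ID of the single parent with the given role in one
# family, or 0L if that parent is absent (or, in this sketch, duplicated)
parent_id <- function(ids, rel, role) {
  hit <- ids[rel == role]
  if (length(hit) == 1L) as.integer(hit) else 0L
}

# by = Family.ID evaluates j once per family; := adds the columns in place
ped[, `:=`(
  PID = as.integer(ifelse(Relationship == "sibling",
                          parent_id(Sample.ID, Relationship, "father"), 0L)),
  MID = as.integer(ifelse(Relationship == "sibling",
                          parent_id(Sample.ID, Relationship, "mother"), 0L))
), by = Family.ID]

print(ped)
```

Because `:=` updates the table by reference, the rows stay in their original order.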
data.table/ifelse conditional new variable question
10 messages · Kate Ignatius, Jorge I Velez, John McKown +1 more
Dear Kate,
Assuming you have nuclear families, one option would be:
x <- read.table(textConnection("Family.ID Sample.ID Relationship
14 62 sibling
14 94 father
14 63 sibling
14 59 mother
17 6004 father
17 6003 mother
17 6005 sibling
17 368 sibling
130 202 mother
130 203 father
130 204 sibling
130 205 sibling
130 206 sibling
222 9 mother
222 45 sibling
222 34 sibling
222 10 sibling
222 11 sibling
222 18 father"), header = TRUE)
closeAllConnections()
xs <- with(x, split(x, Family.ID))
res <- do.call(rbind, lapply(xs, function(l){
  l$PID <- l$MID <- 0
  father <- with(l, Relationship == 'father')
  mother <- with(l, Relationship == 'mother')
  l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
  l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
  l
}))
res
HTH,
Jorge.-
Best regards,
Jorge.-
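For comparison, here is a vectorised base-R sketch (not Jorge's code). It looks up each family's father and mother with match() instead of using split() and rbind(), so the original row order and row names are kept. A family with no father or mother gets NA from the lookup, which is then set to 0. It assumes at most one father row and one mother row per family. If there are duplicates, match() silently takes the first one.

```r
x <- read.table(header = TRUE, text = "
Family.ID Sample.ID Relationship
130 202 mother
130 203 father
130 204 sibling
2702 349 mother
2702 3456 sibling
")

# One row per parent; match() finds the parent row for each row's family
fathers <- x[x$Relationship == "father", ]
mothers <- x[x$Relationship == "mother", ]
is_child <- x$Relationship == "sibling"

x$PID <- ifelse(is_child, fathers$Sample.ID[match(x$Family.ID, fathers$Family.ID)], 0L)
x$MID <- ifelse(is_child, mothers$Sample.ID[match(x$Family.ID, mothers$Family.ID)], 0L)

# A family without a father/mother yields NA from match(); use 0 instead
x$PID[is.na(x$PID)] <- 0L
x$MID[is.na(x$MID)] <- 0L
x
```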
On Sun, Aug 17, 2014 at 5:42 AM, Kate Ignatius <kate.ignatius at gmail.com>
wrote:
[...]
______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Thanks! I think I know what is being done here, but I'm not sure how to fix the following error:

Error in l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
  replacement has length zero
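This error comes from families that have no father (or no mother). In that case `l$Sample.ID[father]` is a zero-length vector, and R refuses to assign it to a non-empty selection. Here is a minimal reproduction using a hypothetical mother-only family:

```r
# Hypothetical family: a mother and two siblings, no father row
rel <- c("mother", "sibling", "sibling")
ids <- c(349L, 3456L, 9980L)
PID <- c(0L, 0L, 0L)

father <- rel == "father"   # all FALSE
ids[father]                 # integer(0): there is no father ID to copy

# Assigning a zero-length vector to two sibling slots is an error
msg <- tryCatch({
  PID[rel == "sibling"] <- ids[father]
  "no error"
}, error = function(e) conditionMessage(e))
msg
```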
On Sat, Aug 16, 2014 at 6:48 PM, Jorge I Velez <jorgeivanvelez at gmail.com> wrote:
[...]
Actually, I didn't check this before, but these are not all nuclear families (as I had assumed). That is, some don't have a father or don't have a mother. Usually, in that case, the child's PID or MID, respectively, becomes 0. How can the code be edited to account for this?
On Sat, Aug 16, 2014 at 8:02 PM, Kate Ignatius <kate.ignatius at gmail.com> wrote:
[...]
Dear Kate,
Try this:
res <- do.call(rbind, lapply(xs, function(l){
  l$PID <- l$MID <- 0
  father <- with(l, Relationship == 'father')
  mother <- with(l, Relationship == 'mother')
  if(sum(father) == 0)
    l$PID[l$Relationship == 'sibling'] <- 0
  else l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
  if(sum(mother) == 0)
    l$MID[l$Relationship == 'sibling'] <- 0
  else l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
  l
}))
It is assumed that when either parent is not available the M/PID is 0.
Best,
Jorge.-
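Here is a stricter sketch of the same per-family approach, using hypothetical helpers `get_parent()` and `add_parents()`. A missing parent still gives 0. A family that lists more than one father or mother, however, stops with an error naming the family, instead of recycling IDs with only a warning.

```r
# Hypothetical helper: the single parent ID for a role, 0 if absent,
# and an error if the family lists that role more than once
get_parent <- function(l, role) {
  hit <- l$Sample.ID[l$Relationship == role]
  if (length(hit) > 1L)
    stop("family ", l$Family.ID[1], " has ", length(hit),
         " rows with role '", role, "'")
  if (length(hit) == 0L) 0L else hit
}

add_parents <- function(l) {
  l$PID <- l$MID <- 0L
  sib <- l$Relationship == "sibling"
  l$PID[sib] <- get_parent(l, "father")
  l$MID[sib] <- get_parent(l, "mother")
  l
}

x <- read.table(header = TRUE, text = "
Family.ID Sample.ID Relationship
2702 349 mother
2702 3456 sibling
2702 9980 sibling
3064 3 father
3064 4 mother
3064 5 sibling
")
res <- do.call(rbind, lapply(split(x, x$Family.ID), add_parents))
res

# Add a second (hypothetical) father to family 3064: now it stops loudly
bad <- rbind(x, data.frame(Family.ID = 3064L, Sample.ID = 99L, Relationship = "father"))
err <- tryCatch(do.call(rbind, lapply(split(bad, bad$Family.ID), add_parents)),
                error = function(e) conditionMessage(e))
err
```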
On Sun, Aug 17, 2014 at 10:58 AM, Kate Ignatius <kate.ignatius at gmail.com>
wrote:
[...]
Yep, you're right: missing parents are indicated as zero in the M/PID field. The above code worked, but with a few warnings:

1: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
  number of items to replace is not a multiple of replacement length
2: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
  number of items to replace is not a multiple of replacement length
3: In l$PID[l$Relationship == "sibling"] <- l$Sample.ID[father] :
  number of items to replace is not a multiple of replacement length
4: In l$MID[l$Relationship == "sibling"] <- l$Sample.ID[mother] :
  number of items to replace is not a multiple of replacement length

Looking at the output, I get numbers in the M/PID fields that are not the father/mother IDs. For example:

2702 349  mother  0   0
2702 3456 sibling 0   842
2702 9980 sibling 0   842
3064 3    father  0   0
3064 4    mother  0   0
3064 5    sibling 879 880
3064 86   sibling 879 880
3064 87   sibling 879 880
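These warnings are what R gives when a family has more than one father (or mother) row. `l$Sample.ID[father]` then holds several IDs, which are recycled across the siblings. The toy reproduction below uses made-up IDs. It also shows that when the number of siblings is a multiple of the number of fathers there is no warning at all, and each sibling silently gets a different "father".

```r
# Hypothetical family: two rows labelled "father", one mother, three siblings
rel <- c("father", "father", "mother", "sibling", "sibling", "sibling")
ids <- c(11L, 12L, 13L, 21L, 22L, 23L)
PID <- integer(length(ids))

ids[rel == "father"]   # two candidate fathers

# Capture the warning text instead of printing it
w <- NULL
withCallingHandlers(
  PID[rel == "sibling"] <- ids[rel == "father"],
  warning = function(cond) {
    w <<- conditionMessage(cond)
    invokeRestart("muffleWarning")
  }
)
w
PID   # 0 0 0 11 12 11: the two father IDs are recycled over three siblings

# Two siblings and two fathers: no warning, but the result is still wrong
rel2 <- c("father", "father", "mother", "sibling", "sibling")
ids2 <- c(11L, 12L, 13L, 21L, 22L)
PID2 <- integer(length(ids2))
PID2[rel2 == "sibling"] <- ids2[rel2 == "father"]
PID2  # 0 0 0 11 12
```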
On Sat, Aug 16, 2014 at 9:31 PM, Jorge I Velez <jorgeivanvelez at gmail.com> wrote:
[...]
Perhaps I am missing something, but I do not get the same result:
x <- read.table(textConnection("Family.ID Sample.ID Relationship
2702 349 mother
2702 3456 sibling
2702 9980 sibling
3064 3 father
3064 4 mother
3064 5 sibling
3064 86 sibling
3064 87 sibling"), header = TRUE)
closeAllConnections()
xs <- with(x, split(x, Family.ID))
res <- do.call(rbind, lapply(xs, function(l){
  l$PID <- l$MID <- 0
  father <- with(l, Relationship == 'father')
  mother <- with(l, Relationship == 'mother')
  if(sum(father) == 0)
    l$PID[l$Relationship == 'sibling'] <- 0
  else l$PID[l$Relationship == 'sibling'] <- l$Sample.ID[father]
  if(sum(mother) == 0)
    l$MID[l$Relationship == 'sibling'] <- 0
  else l$MID[l$Relationship == 'sibling'] <- l$Sample.ID[mother]
  l
}))
#Family.ID Sample.ID Relationship MID PID
#2702.1 2702 349 mother 0 0
#2702.2 2702 3456 sibling 349 0
#2702.3 2702 9980 sibling 349 0
#3064.4 3064 3 father 0 0
#3064.5 3064 4 mother 0 0
#3064.6 3064 5 sibling 4 3
#3064.7 3064 86 sibling 4 3
#3064.8 3064 87 sibling 4 3
HTH,
Jorge.-
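In the output above, MID comes before PID. This is because `l$PID <- l$MID <- 0` evaluates its right-hand side first, so the MID column is created first. The sketch below puts the columns in the order asked for and then writes a PLINK-style .fam file (family ID, individual ID, father ID, mother ID, sex, phenotype; no header). The sex and phenotype codes are assumptions, since the file in the question has neither. Sex is set to 1 for fathers, 2 for mothers, and 0 (unknown) otherwise. Phenotype is -9 (missing).

```r
# res as printed above, rebuilt here so the sketch runs on its own
res <- data.frame(Family.ID = c(2702L, 2702L, 3064L, 3064L, 3064L),
                  Sample.ID = c(349L, 3456L, 3L, 4L, 5L),
                  Relationship = c("mother", "sibling", "father", "mother", "sibling"),
                  MID = c(0L, 349L, 0L, 0L, 4L),
                  PID = c(0L, 0L, 0L, 0L, 3L))

# Column order requested in the original post
res <- res[, c("Family.ID", "Sample.ID", "Relationship", "PID", "MID")]

# .fam layout: FID IID PID MID sex phenotype (assumed codes, see above)
sex <- ifelse(res$Relationship == "father", 1L,
              ifelse(res$Relationship == "mother", 2L, 0L))
fam <- data.frame(res$Family.ID, res$Sample.ID, res$PID, res$MID, sex, -9L)

out <- tempfile(fileext = ".fam")
write.table(fam, out, quote = FALSE, row.names = FALSE, col.names = FALSE)
readLines(out)
```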
On Sun, Aug 17, 2014 at 11:47 AM, Kate Ignatius <kate.ignatius at gmail.com>
wrote:
[...]
Actually, your code is not wrong. Because this is a large file, I went through it to see if there was anything wrong with it, and it looks like some families have two fathers or three mothers. Taking these duplicates out fixed the problem. Sorry about the confusion, and thanks so much for your help!
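To find such families before building the pedigree, you can cross-tabulate family by relationship. This is a sketch on made-up rows, where 6010 is a hypothetical second father:

```r
x <- read.table(header = TRUE, text = "
Family.ID Sample.ID Relationship
14 62 sibling
14 94 father
14 59 mother
17 6004 father
17 6010 father
17 6003 mother
17 6005 sibling
")

# Number of father/mother/sibling rows per family
counts <- table(x$Family.ID, x$Relationship)
counts

# Families listing more than one father or more than one mother
bad <- rownames(counts)[counts[, "father"] > 1 | counts[, "mother"] > 1]
bad
x[x$Family.ID %in% bad, ]

# Exact repeats of the same person within a family (none in this toy data)
x[duplicated(x[c("Family.ID", "Sample.ID")]), ]
```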
On Sat, Aug 16, 2014 at 9:53 PM, Jorge I Velez <jorgeivanvelez at gmail.com> wrote:
[...]
On Sat, Aug 16, 2014 at 9:02 PM, Kate Ignatius <kate.ignatius at gmail.com> wrote:
[...]
Kate, I hope you don't mind, but I have a curiosity question of my own. Were the families with multiple fathers or mothers a mistake, just duplicates (same Family.ID and Sample.ID), or more like an "intermixed" family due to divorce and remarriage? Or even, as in some countries, a case of polygamy? Sorry, I just get curious about the strangest things sometimes.
--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown
On 17-Aug-2014 03:50:33 John McKown wrote:
[...]
When Kate first posted her query, similar thoughts to John's occurred
to me. The potential for convoluted ancestry and kinship is enormous!
For perhaps (or perhaps not) ultimate convolution, try reconstructing
a canine pedigree from a breeding register of thoroughbreds, where
again the primary data for each individual are
* ID of individual
* ID of litter the individual was born in ("family")
* ID of male parent
* ID of female parent
(as, for instance, registered with the UK Kennel Club).
Similar convolutions can be found with race-horses.
But even humans can compete. Here is a little challenge for anyone
who has an R program that will work out a pedigree from data such as
described above. I have used Kate's notation. Individuals are numbered
from 1 up (with a gap): Sample.ID; Families from 101 up: Family.ID.
Relationships are "sibling", "father", "mother".
ID for father/mother may be "NA" (data not given).
Family.ID Sample.ID Relationship
101 01 sibling
101 02 father
101 03 mother
102 02 sibling
102 04 father
102 05 mother
103 03 sibling
103 06 father
103 07 mother
104 04 sibling
104 08 father
104 09 mother
104 05 sibling
104 08 father
104 09 mother
104 06 sibling
104 08 father
104 09 mother
104 15 sibling
104 08 father
104 09 mother
105 07 sibling
105 04 father
105 15 mother
106 08 sibling
106 16 father
106 17 mother
106 18 sibling
106 16 father
106 17 mother
106 19 sibling
106 16 father
106 17 mother
107 09 sibling
107 18 father
107 19 mother
108 16 sibling
108 NA father
108 NA mother
109 17 sibling
109 NA father
109 NA mother
That's the data. Now a little quiz question: Can you guess the
identity of the person with sample.ID = 01 ?
Best wishes to all,
Ted.
-------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
Date: 17-Aug-2014 Time: 19:41:38
This message was sent by XFMail
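Ted's listing does not use Kate's one-row-per-person layout: each "sibling" row is immediately followed by that child's father and mother rows. The sketch below assumes that pattern holds throughout, which the code checks. It reshapes the listing into ID/PID/MID and then collects every recorded ancestor of 01 with a breadth-first walk, listing shared ancestors once. `ancestors()` is a hypothetical helper, and the sketch does not attempt the quiz.

```r
ted <- read.table(header = TRUE, text = "
Family.ID Sample.ID Relationship
101 01 sibling
101 02 father
101 03 mother
102 02 sibling
102 04 father
102 05 mother
103 03 sibling
103 06 father
103 07 mother
104 04 sibling
104 08 father
104 09 mother
104 05 sibling
104 08 father
104 09 mother
104 06 sibling
104 08 father
104 09 mother
104 15 sibling
104 08 father
104 09 mother
105 07 sibling
105 04 father
105 15 mother
106 08 sibling
106 16 father
106 17 mother
106 18 sibling
106 16 father
106 17 mother
106 19 sibling
106 16 father
106 17 mother
107 09 sibling
107 18 father
107 19 mother
108 16 sibling
108 NA father
108 NA mother
109 17 sibling
109 NA father
109 NA mother
")

# Each child row is followed by its father row, then its mother row
kid <- which(ted$Relationship == "sibling")
stopifnot(all(ted$Relationship[kid + 1] == "father"),
          all(ted$Relationship[kid + 2] == "mother"))
ped <- data.frame(ID  = ted$Sample.ID[kid],
                  PID = ted$Sample.ID[kid + 1],
                  MID = ted$Sample.ID[kid + 2])

# Hypothetical helper: walk up the pedigree breadth-first, skipping unknown
# (NA) parents and anyone already found, so shared ancestors appear once
ancestors <- function(id, ped) {
  found <- integer(0)
  queue <- id
  while (length(queue) > 0) {
    row <- ped[ped$ID == queue[1], ]
    queue <- queue[-1]
    parents <- unique(c(row$PID, row$MID))
    parents <- parents[!is.na(parents) & !(parents %in% found)]
    found <- c(found, parents)
    queue <- c(queue, parents)
  }
  sort(found)
}

anc <- ancestors(1L, ped)
anc
length(anc)
```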