
Covid Mutations: Cumulative?

Dear R-Users,

Has anyone followed the SARS-CoV-2 lineages more closely?

I ran a quick check of the SARS-CoV-2 mutations in the list downloaded from 
NCBI (see the GitHub page below). The list seems to contain cumulative 
mutations only for B.1 => B.1.1, but not beyond the B.1.1 branch:
# B.1 => B.1.1 seems cumulative
diff.lineage("B.1.1", "B.1", data=z)
# but B.1.1 => B.1.1.529 is NOT cumulative anymore;
diff.lineage("B.1.1.529", "B.1.1", data=z)
diff.lineage("B.1.1.529", "BA.2", data=z)
diff.lineage("B.1.1.529", "BA.5", data=z)

# Column id: B(oth) = present in both lineages:
        V   Mutation    P    AA Pos AAi AAm Polymorphism id
899 B.1.1 nsp3:F106F nsp3 F106F 106   F   F         TRUE  B
900 B.1.1 RdRp:P323L RdRp P323L 323   P   L        FALSE  B
901 B.1.1    S:D614G    S D614G 614   D   G        FALSE  B
902 B.1.1    N:R203K    N R203K 203   R   K        FALSE  1
903 B.1.1    N:R203R    N R203R 203   R   R         TRUE  1
904 B.1.1    N:G204R    N G204R 204   G   R        FALSE  1
896   B.1 nsp3:F106F nsp3 F106F 106   F   F         TRUE  B
897   B.1 RdRp:P323L RdRp P323L 323   P   L        FALSE  B
898   B.1    S:D614G    S D614G 614   D   G        FALSE  B
# B.1.1.529 and branches do not have any of the defining mutations of B.1.1;

I have uploaded the code on GitHub:
https://github.com/discoleo/R/blob/master/Stat/Infx/Cov2.Variants.R

1.) Does anyone have a better picture of what is going on?
The sub-variants should carry the mutations of their parent lineage 
cumulatively. That is the logic of the sub-lineage naming, and it is also 
what I deduce from the data posted on the Pango GitHub page:
https://github.com/cov-lineages/pango-designation/issues/361
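
To make the expected "cumulative" logic concrete, here is a minimal sketch 
of how a cumulative list could be rebuilt from a per-lineage list. It assumes 
a data frame with the columns V (lineage) and Mutation, as in the output 
above. The helpers lineage.ancestors, cumulative.mutations and 
missing.from.child are hypothetical names, not part of the code on GitHub, 
and the B.1.1.529 row in the toy data is only a placeholder, not a row taken 
from the NCBI file. Lineage names must already be in their full dotted form 
(no aliases such as BA.2).

```r
# Hypothetical helper: the lineage itself plus all its ancestors,
# derived purely from the dotted name (assumes no aliases).
lineage.ancestors <- function(x) {
	parts <- strsplit(x, ".", fixed = TRUE)[[1]]
	vapply(rev(seq_along(parts)),
		function(n) paste(parts[seq_len(n)], collapse = "."),
		character(1))
}

# Cumulative mutations: union over the lineage and all its ancestors.
cumulative.mutations <- function(lineage, data) {
	anc <- lineage.ancestors(lineage)
	unique(data$Mutation[data$V %in% anc])
}

# Mutations listed for the parent but absent from the child's own list;
# an empty result means the child's list looks cumulative.
missing.from.child <- function(child, parent, data) {
	setdiff(data$Mutation[data$V == parent],
		data$Mutation[data$V == child])
}

# Toy data: B.1 and B.1.1 rows as shown above;
# the B.1.1.529 row is a placeholder for illustration only.
z.toy <- data.frame(
	V = c(rep("B.1", 3), rep("B.1.1", 6), "B.1.1.529"),
	Mutation = c(
		"nsp3:F106F", "RdRp:P323L", "S:D614G",
		"nsp3:F106F", "RdRp:P323L", "S:D614G",
		"N:R203K", "N:R203R", "N:G204R",
		"S:N501Y"))

lineage.ancestors("B.1.1.529")
missing.from.child("B.1.1", "B.1", z.toy)       # empty: cumulative
missing.from.child("B.1.1.529", "B.1.1", z.toy) # non-empty: not cumulative
cumulative.mutations("B.1.1.529", z.toy)
```

This only restores mutations along the naming hierarchy; it cannot tell 
whether a mutation was genuinely lost again in a sub-lineage.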


2.) Cumulative List

It may be that NCBI kept only the new mutations of each lineage, as the 
number of mutations increased.


Does anyone know if there is a full cumulative list?

Alternatively, there might be a list or a package with the full lineage 
encoding. There is such a list in the Pango GitHub project, but I was 
hoping to skip at least this step; the synonyms in the NCBI file seem even 
uglier to process.
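
If the encoding has to be expanded by hand, a rough sketch could look like 
the one below. It assumes that "full lineage encoding" means replacing an 
alias prefix with its full dotted name (BA is the alias of B.1.1.529, so 
BA.2 becomes B.1.1.529.2). The helper expand.alias is a hypothetical name, 
and the alias table here contains a single entry; a complete table would 
have to be read from the Pango project files (for example with 
jsonlite::fromJSON, if the table is available as JSON, which is an 
assumption on my side).

```r
# Minimal alias table; only the BA entry is filled in here.
alias <- c(BA = "B.1.1.529")

# Hypothetical helper: replace the leading alias with its full prefix.
# Names without a known alias are returned unchanged.
expand.alias <- function(x, alias) {
	prefix <- sub("\\..*$", "", x)            # text before the first dot
	rest   <- substring(x, nchar(prefix) + 1) # remainder, incl. leading dot
	full   <- ifelse(prefix %in% names(alias), alias[prefix], prefix)
	unname(paste0(full, rest))
}

expand.alias(c("BA.2", "BA.5", "B.1.1"), alias)
```

Once the names are expanded, BA.2 and BA.5 can be compared with B.1.1 and 
B.1 through the ordinary dotted hierarchy.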


Note:

This question may be more oriented towards Bioconductor; but I haven't 
found any real Covid packages on Bioconductor.


Thank you very much for any help.


Sincerely,


Leonard