Hi all,
I have a large data set (a small part is given below) in which the V1 column contains repeated values: rs941873, rs12307687, ... each appear many times. For each distinct value in V1, I need to keep only one SNP (identified in the first column, rs): the one with the smallest Pvalue within that V1 group.
Your help is truly appreciated,
rs          Chr    V6         A1  A2  Freq    Effect   StdErr  Pvalue    V1          Gene
rs941873    chr10  81139462   a   g   0.4117  -0.0541  0.0103  1.52E-07  rs941873    no_value
rs634552    chr11  75282052   t   g   0.3735  0.0159   0.0099  1.08E-01  rs941873    SERPINH1
rs11107175  chr12  94161719   t   c   0.0896  -0.0386  0.0176  2.85E-02  rs941873    CRADD
rs12307687  chr12  47175866   a   t   0.7379  -0.0208  0.0135  1.23E-01  rs12307687  SLC38A4
rs3917155   chr14  76444685   c   g   0.0495  0.0153   0.0371  6.80E-01  rs941873    TGFB3
rs1600640   chr15  84603034   t   g   0.1791  -0.0448  0.0123  2.75E-04  rs12307687  ADAMTSL3
rs2871865   chr15  99194896   c   g   0.5515  0.0191   0.0106  7.09E-02  rs12307687  IGF1R
rs2955250   chr17  61959740   t   c   0.6945  0.0277   0.0129  3.17E-02  rs12307687  GH2
rs228758    chr17  42148205   t   c   0.1222  -0.0265  0.015   7.72E-02  rs12307687  G6PC3
rs224333    chr20  34023962   a   g   0.8606  0.0568   0.0246  2.10E-02  rs10071837  GDF5
rs4681725   chr3   56692321   t   g   0.2362  0.0386   0.011   4.45E-04  rs10071837  C3orf63
rs7652177   chr3   171969077  c   g   0.1478  -0.0458  0.0134  6.34E-04  rs10071837  FNDC3B
rs925098    chr4   17919811   a   g   0.6529  -0.0563  0.0097  5.55E-09  rs925098    LCORL
rs1662837   chr4   82168889   t   c   0.2728  -0.0411  0.0105  8.66E-05  rs925098    no_value
rs10071837  chr5   33381581   t   c   0.424   -0.0324  0.0094  5.74E-04  rs925098    no_value
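
To make the request concrete, here is a rough sketch of the selection I have in mind, written in Python with only the standard library. It is an illustration, not a tested solution for the full data set: the function name best_snp_per_group is made up, and it assumes the columns are whitespace-separated and that Pvalue always parses as a number.

```python
# Sample rows copied from the data above (whitespace-separated columns).
SAMPLE = """\
rs Chr V6 A1 A2 Freq Effect StdErr Pvalue V1 Gene
rs941873 chr10 81139462 a g 0.4117 -0.0541 0.0103 1.52E-07 rs941873 no_value
rs634552 chr11 75282052 t g 0.3735 0.0159 0.0099 1.08E-01 rs941873 SERPINH1
rs11107175 chr12 94161719 t c 0.0896 -0.0386 0.0176 2.85E-02 rs941873 CRADD
rs12307687 chr12 47175866 a t 0.7379 -0.0208 0.0135 1.23E-01 rs12307687 SLC38A4
rs3917155 chr14 76444685 c g 0.0495 0.0153 0.0371 6.80E-01 rs941873 TGFB3
rs1600640 chr15 84603034 t g 0.1791 -0.0448 0.0123 2.75E-04 rs12307687 ADAMTSL3
rs2871865 chr15 99194896 c g 0.5515 0.0191 0.0106 7.09E-02 rs12307687 IGF1R
rs2955250 chr17 61959740 t c 0.6945 0.0277 0.0129 3.17E-02 rs12307687 GH2
rs228758 chr17 42148205 t c 0.1222 -0.0265 0.015 7.72E-02 rs12307687 G6PC3
rs224333 chr20 34023962 a g 0.8606 0.0568 0.0246 2.10E-02 rs10071837 GDF5
rs4681725 chr3 56692321 t g 0.2362 0.0386 0.011 4.45E-04 rs10071837 C3orf63
rs7652177 chr3 171969077 c g 0.1478 -0.0458 0.0134 6.34E-04 rs10071837 FNDC3B
rs925098 chr4 17919811 a g 0.6529 -0.0563 0.0097 5.55E-09 rs925098 LCORL
rs1662837 chr4 82168889 t c 0.2728 -0.0411 0.0105 8.66E-05 rs925098 no_value
rs10071837 chr5 33381581 t c 0.424 -0.0324 0.0094 5.74E-04 rs925098 no_value
"""


def best_snp_per_group(text):
    """Return {V1 value: row dict} keeping the row with the smallest Pvalue.

    Hypothetical helper for illustration; assumes a header line followed by
    whitespace-separated rows with no embedded spaces in any field.
    """
    lines = text.strip().splitlines()
    header = lines[0].split()
    best = {}
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        key = row["V1"]
        # Keep the first row seen for a group, then replace it only when a
        # strictly smaller Pvalue turns up (ties keep the earlier row).
        if key not in best or float(row["Pvalue"]) < float(best[key]["Pvalue"]):
            best[key] = row
    return best


result = best_snp_per_group(SAMPLE)
for v1, row in result.items():
    print(v1, row["rs"], row["Pvalue"])
```

For the sample above, this keeps one row per V1 value; for example, the group rs12307687 is represented by rs1600640 (Pvalue 2.75E-04).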