      tDate tTime    O3   No2 Temp Sun Wspeed Wdirect Hum Indicator
1  19980101  2400 0.065 0.036 31.4 765    9.9     351  NA         1
2  19980102  2400 0.053 0.025 31.8 624    7.7     351  NA         1
3  19980103  2400 0.027 0.033 31.5 852    8.8     331  NA         2
4  19980104  2400 0.034 0.023 30.7 679    7.0     338  NA         2
5  19980105  2400 0.019 0.016 28.1 376    9.6     354  NA         1
6  19980106  2400 0.021 0.018 29.9 603    9.3     356  NA         1
7  19980107  2400 0.026 0.047 31.2 857   10.7     336  NA         1
8  19980108  2400 0.024 0.014 31.1 635    7.8     330  NA         1
9  19980109  2400 0.058 0.033 32.5 742   10.7     334  NA         1
10 19980110  2400 0.026 0.032 33.9 923   10.6     347  NA         2
11 19980111  2400 0.064 0.034 32.5 751    6.3     355  NA         2
12 19980112  2400 0.066 0.034 33.3 697    8.5     319  NA         1
13 19980113  2400 0.026 0.030 33.4 992   12.5     341  NA         1
14 19980114  2400 0.101 0.028 33.8 705    8.7     349  NA         1
15 19980115  2400 0.069 0.030 33.3 718   11.4     348  NA         1
16 19980116  2400 0.054 0.026 33.4 639   10.9     354  NA         1
17 19980117  2400 0.090 0.039 33.1 653   13.2     342  NA         2
18 19980118  2400 0.048 0.017 33.2 825   10.8     323  NA         2
19 19980119  2400 0.038 0.027 33.7 984   10.3     353  NA         1
20 19980120  2400 0.026 0.032 34.2 994   15.0     357  NA         1
21 19980121  2400 0.065 0.044 33.8 999   17.5     343  NA         1
22 19980122  2400 0.046 0.024 33.5 931   10.1     332  NA         1
23 19980123  2400 0.050 0.041 33.9 881   11.3     353  NA         1
24 19980124  2400 0.036 0.027 33.8 877    9.1     328  NA         2
25 19980125  2400 0.043 0.021 33.2 777   10.5     340  NA         2
26 19980126  2400 0.029 0.016 33.1 999   14.1     341  NA         1
27 19980127  2400 0.033 0.030 33.9 943   12.9     344  NA         1
28 19980128  2400 0.040 0.022 33.7 805   12.6     354  NA         1
29 19980129  2400 0.029 0.015 30.2 512    7.4     356  NA         1
30 19980130  2400 0.027 0.013 31.7 656   13.9     349  NA         1

Given data like this, how can I randomly remove values in O3, NO2, sun, temp and wspeed (that is, create missing values in those rows and columns)?

--
View this message in context: http://r.789695.n4.nabble.com/HELP-how-to-remove-10-of-data-randomly-in-R-tp4647879p4647994.html
Sent from the R help mailing list archive at Nabble.com.
HELP!! how to remove 10% of data randomly in R
5 messages · Eugenie, Nordlund, Dan (DSHS/RDA), David Winsemius +1 more
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Eugenie
Sent: Wednesday, October 31, 2012 5:42 AM
To: r-help at r-project.org
Subject: Re: [R] HELP!! how to remove 10% of data randomly in R

if given data like this,how to remove the data in O3,NO2,sun,temp,wspeed randomly??(missing values in these rows & columns)
If your data frame is called dat, then something like this may do what you want, but since you haven't given an example of what you want the output to look like, I am only guessing.

dat[sample(1:30,5),3:8] <- NA

Hope this is helpful,

Dan

Daniel J. Nordlund
Olympia, WA 98504-5204
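A slightly more general sketch of the same idea, assuming the data frame is called dat as in the reply. The names target_cols, n_drop and drop_rows, the seed, and the made-up stand-in data are illustrative assumptions, not part of the thread. It selects the columns by name (3:8 would also blank Wdirect, which was not asked for) and derives the number of rows from the 10% in the subject line rather than hard-coding 5 of 30.

```r
# Toy stand-in for the posted data: the values are made up, only the
# column names follow the original post.
set.seed(42)  # arbitrary seed, used only so the example is repeatable
dat <- data.frame(
  tDate     = 19980101:19980130,
  tTime     = 2400,
  O3        = round(runif(30, 0.01, 0.10), 3),
  No2       = round(runif(30, 0.01, 0.05), 3),
  Temp      = round(runif(30, 28, 35), 1),
  Sun       = round(runif(30, 300, 1000)),
  Wspeed    = round(runif(30, 6, 18), 1),
  Wdirect   = round(runif(30, 319, 357)),
  Hum       = NA,
  Indicator = sample(1:2, 30, replace = TRUE)
)

# Columns to blank, spelled exactly as in the data (R is case-sensitive).
target_cols <- c("O3", "No2", "Temp", "Sun", "Wspeed")

# 10% of the rows, rounded to a whole number: 3 rows out of 30.
n_drop <- round(0.10 * nrow(dat))

# Pick row indices without replacement and set the target cells to NA.
drop_rows <- sample(nrow(dat), n_drop)
dat[drop_rows, target_cols] <- NA

# Count the missing values per target column.
colSums(is.na(dat[target_cols]))
```

Here the same rows go missing in all five columns; Wdirect and the other columns are left untouched.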
On Oct 31, 2012, at 5:42 AM, Eugenie wrote:
if given data like this,how to remove the data in O3,NO2,sun,temp,wspeed randomly??(missing values in these rows & columns)
It is not clear whether those entries are to be set to NA or whether you wanted a reduced-size data frame. Perhaps:
is.na(dfrm[ sample(1:NROW(dfrm), 0.1*NROW(dfrm)), c('O3','No2','Temp','Sun','Wspeed') ]) <- TRUE
Note that the column names in your data (No2, Temp, Sun, Wspeed) are not spelled the same as the targets you listed (NO2, temp, sun, wspeed). R names are case-sensitive, so there is a further problem with your problem specification.
David Winsemius, MD Alameda, CA, USA
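For the other reading of the question, a reduced-size data frame, a minimal sketch could look like the following. The toy dfrm and the names n_drop, drop_rows, keep_rows and reduced are assumptions made for illustration only.

```r
# Toy data frame standing in for the posted data (made-up values).
set.seed(1)  # arbitrary seed, used only so the example is repeatable
dfrm <- data.frame(id = 1:30, O3 = round(runif(30, 0.01, 0.10), 3))

# Number of rows to drop: 10% of 30, rounded, is 3.
n_drop <- round(0.10 * nrow(dfrm))

# Rows to drop, then the complementary set of rows to keep.
# Using setdiff() rather than dfrm[-drop_rows, ] avoids the trap where
# an empty drop_rows would select zero rows instead of all rows.
drop_rows <- sample(nrow(dfrm), n_drop)
keep_rows <- setdiff(seq_len(nrow(dfrm)), drop_rows)

reduced <- dfrm[keep_rows, , drop = FALSE]
nrow(reduced)   # 27
```

The original dfrm is left unchanged, so the dropped rows can still be inspected with dfrm[drop_rows, ].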
Hi,

Maybe this helps.

dat1 <- read.table(text="
      TDate TTime    O3   No2 Temp Sun Wspeed Wdirect Hum Indicator
1  19980101  2400 0.065 0.036 31.4 765    9.9     351  NA         1
2  19980102  2400 0.053 0.025 31.8 624    7.7     351  NA         1
3  19980103  2400 0.027 0.033 31.5 852    8.8     331  NA         2
4  19980104  2400 0.034 0.023 30.7 679    7.0     338  NA         2
5  19980105  2400 0.019 0.016 28.1 376    9.6     354  NA         1
6  19980106  2400 0.021 0.018 29.9 603    9.3     356  NA         1
7  19980107  2400 0.026 0.047 31.2 857   10.7     336  NA         1
8  19980108  2400 0.024 0.014 31.1 635    7.8     330  NA         1
9  19980109  2400 0.058 0.033 32.5 742   10.7     334  NA         1
10 19980110  2400 0.026 0.032 33.9 923   10.6     347  NA         2
11 19980111  2400 0.064 0.034 32.5 751    6.3     355  NA         2
12 19980112  2400 0.066 0.034 33.3 697    8.5     319  NA         1
13 19980113  2400 0.026 0.030 33.4 992   12.5     341  NA         1
14 19980114  2400 0.101 0.028 33.8 705    8.7     349  NA         1
15 19980115  2400 0.069 0.030 33.3 718   11.4     348  NA         1
16 19980116  2400 0.054 0.026 33.4 639   10.9     354  NA         1
17 19980117  2400 0.090 0.039 33.1 653   13.2     342  NA         2
18 19980118  2400 0.048 0.017 33.2 825   10.8     323  NA         2
19 19980119  2400 0.038 0.027 33.7 984   10.3     353  NA         1
20 19980120  2400 0.026 0.032 34.2 994   15.0     357  NA         1
21 19980121  2400 0.065 0.044 33.8 999   17.5     343  NA         1
22 19980122  2400 0.046 0.024 33.5 931   10.1     332  NA         1
23 19980123  2400 0.050 0.041 33.9 881   11.3     353  NA         1
24 19980124  2400 0.036 0.027 33.8 877    9.1     328  NA         2
25 19980125  2400 0.043 0.021 33.2 777   10.5     340  NA         2
26 19980126  2400 0.029 0.016 33.1 999   14.1     341  NA         1
27 19980127  2400 0.033 0.030 33.9 943   12.9     344  NA         1
28 19980128  2400 0.040 0.022 33.7 805   12.6     354  NA         1
29 19980129  2400 0.029 0.015 30.2 512    7.4     356  NA         1
30 19980130  2400 0.027 0.013 31.7 656   13.9     349  NA         1
", sep="", header=TRUE, stringsAsFactors=FALSE)

# Create NA for 10% of the rows in the specified columns (a variant of David's method).
is.na(dat1[sample(1:nrow(dat1), 0.1*nrow(dat1)), 3:7]) <- TRUE
tail(dat1)
#      TDate TTime    O3   No2 Temp Sun Wspeed Wdirect Hum Indicator
#25 19980125  2400    NA    NA   NA  NA     NA     340  NA         2
#26 19980126  2400 0.029 0.016 33.1 999   14.1     341  NA         1
#27 19980127  2400 0.033 0.030 33.9 943   12.9     344  NA         1
#28 19980128  2400 0.040 0.022 33.7 805   12.6     354  NA         1
#29 19980129  2400 0.029 0.015 30.2 512    7.4     356  NA         1
#30 19980130  2400 0.027 0.013 31.7 656   13.9     349  NA         1

# If you need to create NA for individual columns randomly:
# sample 10% of the values from each of columns 3 to 7 ...
res <- do.call(cbind, lapply(lapply(dat1[,3:7], function(x) data.frame(x)), function(x) x[sample(1:nrow(x), 0.1*nrow(x)),]))
# ... then set the entries that match those sampled values to NA, column by column.
dat1[,3][dat1[,3] %in% res[,1]] <- NA
dat1[,4][dat1[,4] %in% res[,2]] <- NA
dat1[,5][dat1[,5] %in% res[,3]] <- NA
dat1[,6][dat1[,6] %in% res[,4]] <- NA
dat1[,7][dat1[,7] %in% res[,5]] <- NA
head(dat1)
#     TDate TTime    O3   No2 Temp Sun Wspeed Wdirect Hum Indicator
#1 19980101  2400 0.065 0.036 31.4 765    9.9     351  NA         1
#2 19980102  2400 0.053 0.025 31.8 624    7.7     351  NA         1
#3 19980103  2400 0.027 0.033 31.5 852    8.8     331  NA         2
#4 19980104  2400    NA    NA 30.7 679    7.0     338  NA         2
#5 19980105  2400 0.019 0.016 28.1 376    9.6     354  NA         1
#6 19980106  2400 0.021 0.018 29.9 603     NA     356  NA         1

A.K.
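One caveat about the per-column code in the reply: it matches by value with %in%, so every row that shares a sampled value is blanked too (0.026, for instance, occurs several times in O3), and a column can lose more than 10% of its entries. A minimal sketch that samples row positions instead, one independent draw per column, might look like this. The toy dat1 (with deliberately repeated O3 values), the seed and the name n_drop are assumptions for illustration.

```r
# Toy stand-in with only the five target columns (made-up values).
# O3 deliberately repeats the same three values to show that duplicates
# do not matter when sampling by position.
set.seed(123)  # arbitrary seed, used only so the example is repeatable
dat1 <- data.frame(
  O3     = rep(c(0.026, 0.065, 0.033), 10),
  No2    = round(runif(30, 0.01, 0.05), 3),
  Temp   = round(runif(30, 28, 35), 1),
  Sun    = round(runif(30, 300, 1000)),
  Wspeed = round(runif(30, 6, 18), 1)
)

# 10% of the rows, rounded to a whole number: 3 of 30.
n_drop <- round(0.10 * nrow(dat1))

# For each column, draw its own set of row positions and blank them.
for (col in names(dat1)) {
  dat1[sample(nrow(dat1), n_drop), col] <- NA
}

# Each column now has exactly n_drop missing values.
colSums(is.na(dat1))
```

Because each column gets its own draw, the missing cells generally fall in different rows from one column to the next.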
Thanks, all. Your help is much appreciated, and it's useful.