Message-ID: <1351709319.78380.YahooMailNeo@web142601.mail.bf1.yahoo.com>
Date: 2012-10-31T18:48:39Z
From: arun
Subject: HELP!! how to remove 10% of data randomly in R
In-Reply-To: <1351687340585-4647994.post@n4.nabble.com>

Hi,

Maybe this helps.


dat1<-read.table(text="
  TDate TTime    O3  No2 Temp Sun Wspeed Wdirect Hum Indicator
1    19980101  2400 0.065 0.036 31.4 765    9.9    351  NA        1
2    19980102  2400 0.053 0.025 31.8 624    7.7    351  NA        1
3    19980103  2400 0.027 0.033 31.5 852    8.8    331  NA        2
4    19980104  2400 0.034 0.023 30.7 679    7.0    338  NA        2
5    19980105  2400 0.019 0.016 28.1 376    9.6    354  NA        1
6    19980106  2400 0.021 0.018 29.9 603    9.3    356  NA        1
7    19980107  2400 0.026 0.047 31.2 857  10.7    336  NA        1
8    19980108  2400 0.024 0.014 31.1 635    7.8    330  NA        1
9    19980109  2400 0.058 0.033 32.5 742  10.7    334  NA        1
10  19980110  2400 0.026 0.032 33.9 923  10.6    347  NA        2
11  19980111  2400 0.064 0.034 32.5 751    6.3    355  NA        2
12  19980112  2400 0.066 0.034 33.3 697    8.5    319  NA        1
13  19980113  2400 0.026 0.030 33.4 992  12.5    341  NA        1
14  19980114  2400 0.101 0.028 33.8 705    8.7    349  NA        1
15  19980115  2400 0.069 0.030 33.3 718  11.4    348  NA        1
16  19980116  2400 0.054 0.026 33.4 639  10.9    354  NA        1
17  19980117  2400 0.090 0.039 33.1 653  13.2    342  NA        2
18  19980118  2400 0.048 0.017 33.2 825  10.8    323  NA        2
19  19980119  2400 0.038 0.027 33.7 984  10.3    353  NA        1
20  19980120  2400 0.026 0.032 34.2 994  15.0    357  NA        1
21  19980121  2400 0.065 0.044 33.8 999  17.5    343  NA        1
22  19980122  2400 0.046 0.024 33.5 931  10.1    332  NA        1
23  19980123  2400 0.050 0.041 33.9 881  11.3    353  NA        1
24  19980124  2400 0.036 0.027 33.8 877    9.1    328  NA        2
25  19980125  2400 0.043 0.021 33.2 777  10.5    340  NA        2
26  19980126  2400 0.029 0.016 33.1 999  14.1    341  NA        1
27  19980127  2400 0.033 0.030 33.9 943  12.9    344  NA        1
28  19980128  2400 0.040 0.022 33.7 805  12.6    354  NA        1
29  19980129  2400 0.029 0.015 30.2 512    7.4    356  NA        1
30  19980130  2400 0.027 0.013 31.7 656  13.9    349  NA        1
",sep="",header=TRUE,stringsAsFactors=FALSE)

# Set 10% of the rows to NA in the specified columns (a variant of David's method).
is.na(dat1[sample(1:nrow(dat1),0.1*nrow(dat1)),3:7])<-TRUE
tail(dat1)
#      TDate TTime    O3   No2 Temp Sun Wspeed Wdirect Hum Indicator
#25 19980125  2400    NA    NA   NA  NA     NA     340  NA         2
#26 19980126  2400 0.029 0.016 33.1 999   14.1     341  NA         1
#27 19980127  2400 0.033 0.030 33.9 943   12.9     344  NA         1
#28 19980128  2400 0.040 0.022 33.7 805   12.6     354  NA         1
#29 19980129  2400 0.029 0.015 30.2 512    7.4     356  NA         1
#30 19980130  2400 0.027 0.013 31.7 656   13.9     349  NA         1
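
If the draw needs to be repeatable, the row-wise idea above can be sketched roughly as follows. The data frame `toy`, its column names, and the seed value are made up for illustration and are not from the original post.

```r
# Hypothetical toy data (not the poster's data): 20 rows, 3 measurement columns
set.seed(42)                       # arbitrary seed, only so the draw can be repeated
toy <- data.frame(id = 1:20,
                  a  = rnorm(20),
                  b  = runif(20),
                  c  = rpois(20, 5))
meas <- c("a", "b", "c")           # columns that should receive missing values

# Pick 10% of the row indices without replacement; floor() makes the rounding explicit
n_drop <- floor(0.1 * nrow(toy))
rows   <- sample(nrow(toy), n_drop)

# Blank the same rows in every measurement column
toy[rows, meas] <- NA

# Check: each measurement column now has exactly n_drop missing values
colSums(is.na(toy[, meas]))
```

Assigning `NA` directly through `[<-` has the same effect here as the `is.na(...) <- TRUE` form; which one to use is mostly a matter of taste.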

# If you need to create NAs in each column independently at random
res<-do.call(cbind,lapply(lapply(dat1[,3:7],function(x) data.frame(x)),function(x) x[sample(1:nrow(x),0.1*nrow(x)),]))
dat1[,3][dat1[,3]%in%res[,1]]<-NA
dat1[,4][dat1[,4]%in%res[,2]]<-NA
dat1[,5][dat1[,5]%in%res[,3]]<-NA
dat1[,6][dat1[,6]%in%res[,4]]<-NA
dat1[,7][dat1[,7]%in%res[,5]]<-NA
head(dat1)
#     TDate TTime    O3   No2 Temp Sun Wspeed Wdirect Hum Indicator
#1 19980101  2400 0.065 0.036 31.4 765    9.9     351  NA         1
#2 19980102  2400 0.053 0.025 31.8 624    7.7     351  NA         1
#3 19980103  2400 0.027 0.033 31.5 852    8.8     331  NA         2
#4 19980104  2400    NA    NA 30.7 679    7.0     338  NA         2
#5 19980105  2400 0.019 0.016 28.1 376    9.6     354  NA         1
#6 19980106  2400 0.021 0.018 29.9 603     NA     356  NA         1
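
One caveat with matching by value: `%in%` blanks every occurrence of a sampled value, so a column with repeats (0.026 occurs four times in the original O3 column) can end up with more than 10% missing. A possible alternative, assuming the goal is exactly 10% missing cells per column, is to sample row positions separately for each column. This is only a sketch; `toy`, `cols`, and the seed are illustrative names and values, not from the original post.

```r
# Hypothetical toy data with deliberately repeated values in column x
set.seed(1)                        # arbitrary seed for a repeatable draw
toy <- data.frame(id = 1:30,
                  x  = rep(c(0.1, 0.2, 0.3), 10),
                  y  = rnorm(30),
                  z  = rnorm(30))
cols   <- c("x", "y", "z")         # columns that should receive missing values
n_drop <- floor(0.1 * nrow(toy))   # 10% of the rows, rounded down

# Draw a fresh set of row positions for each column, so the NAs
# land in different rows from one column to the next
for (j in cols) {
  toy[sample(nrow(toy), n_drop), j] <- NA
}

# Check: every column has exactly n_drop NAs, despite the repeated values in x
colSums(is.na(toy[, cols]))
```

Because positions rather than values are sampled, duplicates in a column and NAs that are already present in other columns do not change the count.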

A.K.

----- Original Message -----
From: Eugenie <leemeanwei at hotmail.com>
To: r-help at r-project.org
Cc: 
Sent: Wednesday, October 31, 2012 8:42 AM
Subject: Re: [R] HELP!! how to remove 10% of data randomly in R

  tDate tTime    O3   No2 Temp Sun Wspeed Wdirect Hum Indicator
1    19980101  2400 0.065 0.036 31.4 765    9.9     351  NA         1
2    19980102  2400 0.053 0.025 31.8 624    7.7     351  NA         1
3    19980103  2400 0.027 0.033 31.5 852    8.8     331  NA         2
4    19980104  2400 0.034 0.023 30.7 679    7.0     338  NA         2
5    19980105  2400 0.019 0.016 28.1 376    9.6     354  NA         1
6    19980106  2400 0.021 0.018 29.9 603    9.3     356  NA         1
7    19980107  2400 0.026 0.047 31.2 857   10.7     336  NA         1
8    19980108  2400 0.024 0.014 31.1 635    7.8     330  NA         1
9    19980109  2400 0.058 0.033 32.5 742   10.7     334  NA         1
10   19980110  2400 0.026 0.032 33.9 923   10.6     347  NA         2
11   19980111  2400 0.064 0.034 32.5 751    6.3     355  NA         2
12   19980112  2400 0.066 0.034 33.3 697    8.5     319  NA         1
13   19980113  2400 0.026 0.030 33.4 992   12.5     341  NA         1
14   19980114  2400 0.101 0.028 33.8 705    8.7     349  NA         1
15   19980115  2400 0.069 0.030 33.3 718   11.4     348  NA         1
16   19980116  2400 0.054 0.026 33.4 639   10.9     354  NA         1
17   19980117  2400 0.090 0.039 33.1 653   13.2     342  NA         2
18   19980118  2400 0.048 0.017 33.2 825   10.8     323  NA         2
19   19980119  2400 0.038 0.027 33.7 984   10.3     353  NA         1
20   19980120  2400 0.026 0.032 34.2 994   15.0     357  NA         1
21   19980121  2400 0.065 0.044 33.8 999   17.5     343  NA         1
22   19980122  2400 0.046 0.024 33.5 931   10.1     332  NA         1
23   19980123  2400 0.050 0.041 33.9 881   11.3     353  NA         1
24   19980124  2400 0.036 0.027 33.8 877    9.1     328  NA         2
25   19980125  2400 0.043 0.021 33.2 777   10.5     340  NA         2
26   19980126  2400 0.029 0.016 33.1 999   14.1     341  NA         1
27   19980127  2400 0.033 0.030 33.9 943   12.9     344  NA         1
28   19980128  2400 0.040 0.022 33.7 805   12.6     354  NA         1
29   19980129  2400 0.029 0.015 30.2 512    7.4     356  NA         1
30   19980130  2400 0.027 0.013 31.7 656   13.9     349  NA         1



Given data like this, how can I randomly remove values in the O3, No2, Sun, Temp
and Wspeed columns (that is, create missing values in these rows and columns)?



--
View this message in context: http://r.789695.n4.nabble.com/HELP-how-to-remove-10-of-data-randomly-in-R-tp4647879p4647994.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.