coding logic and syntax in R
4 messages · Pravin, Brian Ripley, Eric Lecoutre +1 more
On Wed, 24 Dec 2003, Pravin wrote:
I am a beginner in R programming and recently heard about this mailing list. Currently, I am stuck on a simple problem for which I just can't find a solution. I have a huge dataset (~81,000 observations) that has been
BTW, that is quite a small dataset these days: not even 10 million is `huge'.
analyzed and the final result is in the form of 0s and 1s (one column). I need to write code to process this column in a slightly complicated way. These 81,000 observations are actually 9,000 sets (81,000/9). In each set, once a zero appears, all the remaining observations become zero. For example, if the column has:
111110111111011111111111111111111....
the output should look like:
111110000111000000111111111111111...
Let me see if I understand you. This was really
111110111
111011111
111111111
111111...
and you want
111110000
111000000
111111111
111111...
So let's treat it as a matrix (extending to 4 complete sets):
x <- as.numeric(strsplit("111110111111011111111111111111111011", NULL)[[1]])
xx <- matrix(x, ncol=9, byrow=TRUE)
Then a simple loop
for(i in 2:9) xx[,i] <- xx[,i] & xx[,i-1]
gives me the second matrix, which I can read out as a vector as
as.vector(t(xx))
[1] 1 1 1 1 1 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0
or in what I understand as your format
paste(t(xx), collapse="")
[1] "111110000111000000111111111111111000"
Doing this with 81000 random 0/1's took a fraction of a second.
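The steps above can be wrapped into one function and timed. This is only a sketch: the helper name `collapse_sets`, its `set_size` argument, and the simulated 0/1 vector (90% ones) are assumptions made for illustration, not part of the original post.

```r
# Hypothetical helper wrapping the matrix-and-loop approach shown above.
# Assumes length(x) is a multiple of set_size and set_size >= 2.
collapse_sets <- function(x, set_size = 9) {
  xx <- matrix(x, ncol = set_size, byrow = TRUE)   # one set per row
  # a column stays 1 only if it and every earlier column in the row are 1
  for (i in 2:set_size) xx[, i] <- xx[, i] & xx[, i - 1]
  as.vector(t(xx))                                 # back to the original order
}

# The 36-value example from the message
x <- as.numeric(strsplit("111110111111011111111111111111111011", NULL)[[1]])
paste(collapse_sets(x), collapse = "")
# [1] "111110000111000000111111111111111000"

# Simulated stand-in for the 81,000 observations (assumed 90% ones)
big <- rbinom(81000, 1, 0.9)
system.time(res <- collapse_sets(big))
```

The loop runs over the 9 columns, not over the 9,000 sets, which is why it stays fast as the number of sets grows.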
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
In R, always start by getting the result on a small unit.
Begin by writing a function that makes the replacements for ONE vector (of size 9):
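FillWith <- function(vec, SearchForOne = 0, ReplaceNextValues = 0)
{
  # positions at which the searched value occurs
  pp <- which(vec == SearchForOne)
  # fill from the first occurrence to the end; pp[1] avoids a warning when there are several matches
  if (length(pp) > 0) vec[pp[1]:length(vec)] <- ReplaceNextValues
  return(vec)
}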
Verify it works:
> FillWith(c(1,1,0,1,1))
[1] 1 1 0 0 0
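A few more cases are worth checking before applying the function to real data. The sketch below restates the function so it runs on its own; using `pp[1]` explicitly is an adjustment made here so that a set containing several zeros does not make `:` warn that only the first element is used.

```r
# Restated so the snippet is self-contained; pp[1] is an explicit
# "first match only" adjustment, not the original wording.
FillWith <- function(vec, SearchForOne = 0, ReplaceNextValues = 0) {
  pp <- which(vec == SearchForOne)
  if (length(pp) > 0) vec[pp[1]:length(vec)] <- ReplaceNextValues
  vec
}

FillWith(c(1, 1, 1))        # no zero: returned unchanged
# [1] 1 1 1
FillWith(c(1, 0, 1, 0, 1))  # several zeros: filled from the first one
# [1] 1 0 0 0 0
FillWith(c(0, 1, 1))        # zero in first position: everything becomes 0
# [1] 0 0 0
```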
Then try to apply it to your data, using one of the ?apply functions.
Here, tapply seems to be adequate.
> data=c(rep(1,9),rep(1,4),0,rep(1,4))
> data
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
> data=cbind(data,groups=((1:length(data)-1)%/%9))
> data
data groups
[1,] 1 0
[2,] 1 0
[3,] 1 0
[4,] 1 0
[5,] 1 0
[6,] 1 0
[7,] 1 0
[8,] 1 0
[9,] 1 0
[10,] 1 1
[11,] 1 1
[12,] 1 1
[13,] 1 1
[14,] 0 1
[15,] 1 1
[16,] 1 1
[17,] 1 1
[18,] 1 1
> tapply(data[,1],data[,2],FUN=FillWith)
$"0"
[1] 1 1 1 1 1 1 1 1 1
$"1"
[1] 1 1 1 1 0 0 0 0 0
And then come back to a vector with unlist().
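Put together, the whole path from raw vector to processed vector might look like the sketch below. The variable names `groups` and `res` are illustrative, and `use.names = FALSE` is an optional choice made here to drop the group labels from the result.

```r
# Restated helper (first-match variant) so the snippet runs on its own
FillWith <- function(vec, SearchForOne = 0, ReplaceNextValues = 0) {
  pp <- which(vec == SearchForOne)
  if (length(pp) > 0) vec[pp[1]:length(vec)] <- ReplaceNextValues
  vec
}

data   <- c(rep(1, 9), rep(1, 4), 0, rep(1, 4))   # two sets of 9, as above
groups <- (seq_along(data) - 1) %/% 9             # 0 for the first set, 1 for the second

# tapply returns one processed vector per set; unlist flattens them in set order
res <- unlist(tapply(data, groups, FillWith), use.names = FALSE)
res
# [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
```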
Eric
At 08:27 24/12/2003, Pravin wrote:
Hello,

I am a beginner in R programming and recently heard about this mailing list. Currently, I am stuck on a simple problem for which I just can't find a solution. I have a huge dataset (~81,000 observations) that has been analyzed and the final result is in the form of 0s and 1s (one column).

I need to write code to process this column in a slightly complicated way. These 81,000 observations are actually 9,000 sets (81,000/9). In each set, once a zero appears, all the remaining observations become zero.

For example, if the column has:
111110111111011111111111111111111....
the output should look like:
111110000111000000111111111111111...

I hope this makes sense.

Thank you in anticipation,
Pravin

Pravin Jadhav
--------------------------------------------------
To err is certainly human, but a real disaster requires one or two computers.
Anonymous quote
--------------------------------------------------
Eric Lecoutre
Computer Scientist/Statistician
Institut de Statistique / UCL
TEL (+32)(0)10473050
lecoutre at stat.ucl.ac.be
URL http://www.stat.ucl.ac.be/ISpersonnel/lecoutre
______________________________________________ R-help at stat.math.ethz.ch mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Here is an example:

set.seed(101)
v <- sample(c(0, 1), size = 36, replace = TRUE, prob = c(.05, .95))
L <- length(v) / 9
idx <- rep(seq(L), each = 9)
fn <- function(x){
  ok <- FALSE
  for(i in seq(length(x))){
    if(x[i] == 0) ok <- TRUE
    x[i] <- if(ok) 0 else 1
    }
  x
  }
cbind(idx, v, recod = unlist(tapply(v, idx, fn)))
   idx v recod
11   1 1     1
12   1 1     1
13   1 1     1
14   1 1     1
15   1 1     1
16   1 1     1
17   1 1     1
18   1 1     1
19   1 1     1
21   2 1     1
22   2 1     1
23   2 1     1
24   2 1     1
25   2 1     1
26   2 1     1
27   2 1     1
28   2 1     1
29   2 1     1
31   3 1     1
32   3 1     1
33   3 1     1
34   3 0     0
35   3 1     0
36   3 1     0
37   3 1     0
38   3 1     0
39   3 1     0
41   4 1     1
42   4 1     1
43   4 1     1
44   4 1     1
45   4 1     1
46   4 1     1
47   4 1     1
48   4 1     1
49   4 1     1

Merry Christmas!

Renaud

--
Dr Renaud Lancelot
Veterinary epidemiologist
Ambassade de France - SCAC
BP 834 Antannarivo 101
Madagascar
Tel. +261 (0)32 04 824 55 (cell)
     +261 (0)20 22 494 37 (home)
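For 0/1 data, the explicit loop inside `fn` could also be replaced by a cumulative product within each set, since the running product stays 1 until the first 0 and is 0 from then on. The sketch below is an alternative, not the method posted above; it uses `ave()` from the stats package to apply `cumprod` per group and checks the result against the loop version. The names `recod_cumprod` and `recod_loop` are illustrative.

```r
set.seed(101)
v   <- sample(c(0, 1), size = 36, replace = TRUE, prob = c(.05, .95))
idx <- rep(seq_len(length(v) / 9), each = 9)   # set number for each observation

# Alternative: cumulative product within each set, returned in the original order
recod_cumprod <- ave(v, idx, FUN = cumprod)

# Loop version from the message, restated for comparison
fn <- function(x) {
  ok <- FALSE
  for (i in seq_along(x)) {
    if (x[i] == 0) ok <- TRUE
    x[i] <- if (ok) 0 else 1
  }
  x
}
recod_loop <- unlist(tapply(v, idx, fn), use.names = FALSE)

all(recod_cumprod == recod_loop)
# [1] TRUE
```

This relies on the values being exactly 0 or 1; with any other coding the loop version and the cumulative product would no longer agree.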