3 questions regarding matrix copy/shuffle/compares

12 messages · Esmail, David Winsemius, Hadley Wickham

#
Hello all,

I have the following function call to create a matrix of POP_SIZE rows
and fill it with bit strings of size LEN:

    pop=create_pop_2(POP_SIZE, LEN)

I have 3 questions:

(1) If I did

     keep_pop[1:POP_SIZE] == pop[1:POP_SIZE]

     to keep a copy of the original data structure before manipulating
     'pop' potentially, would this make a deep copy or just shallow? Ie
     if I change something in 'pop' would it be reflected in 'keep_pop'
     too? (I don't think so, but just wanted to check). I would like
     two independent copies.

(2) If I wanted to change the order of *rows* in my matrix 'pop', is there
     an easy way to shuffle these? I don't want to change anything in the
     columns, just the order of the complete rows. (E.g., in Python I could
     just say something like shuffle(pop), assuming pop is a list of lists.)
     Is there an equivalent for R?

(3) I would like to compare the contents of 'keep_pop' with 'pop'. Though
     the order of rows may be different it should not matter as long as
     the same rows are present. Again, in Python this would be simply

     if sorted(keep_pop) == sorted(pop):
        print 'they are equal'
     else:
        print 'they are not equal'

     Is there an equivalent R code segment?

Thanks,

Esmail

--------------- the code called above -------------


####################################################
# create a binary vector of size "len"
#
create_bin_Chromosome <- function(len)
{
   sample(0:1, len, replace=T)
}



############## create_population ###################
# create population of chromosomes of length len
# the matrix contains twice as much space as popsize
#
create_pop_2 <- function(popsize, len)
{
   datasize=len*popsize
   print(datasize)
   npop <- matrix(0, popsize*2, len, byrow=T)

   for(i in 1:popsize)
     npop[i,] = create_bin_Chromosome(len)

   npop
}
#
On Apr 26, 2009, at 12:28 AM, Esmail wrote:

            
Are you constructing a vector or a matrix? What are the dimensions of
your matrix?

"==" is not an assignment operator in R, so the answer is that it would
do neither. "<-" and "=" can do assignment. In neither case would it be
a "deep copy". You can get a value from a matrix by using the indexing
construction. But your terminology is confusing. Is pop a matrix or a
list?

?"["
?order
... and perhaps ?sample if you wanted a random permutation of the rows.

I am going to refrain from posting speculation until you provide valid
R code that will create an object that can be the subject of operations.

Depends on what you want to do and what you are doing it on. You could
look at:

?"%in%"
?merge

The code below creates a "bit vector" but then only makes exact
multiples of it in the first row and zeros in the second row. Was that
what was desired?

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
#
Hello David,

Let me try again, I don't think that was the best post I've ever made :-)
Hopefully this is clearer; otherwise I may break this up into
three separate simple queries, as this may be too long.


 > "==" is not an assignment operator in R, so the answer is that it
 > would do neither.  "<-" and "=" can do assignment. In neither case
 > would it be a "deep copy".

It was late when I posted the code, I made a mistake with regard to
the assignment operator and used the boolean compare instead -- thanks
for catching that.

It should have been:

     keep_pop[1:POP_SIZE] = pop[1:POP_SIZE]


-------- Here's an edited and clearer version I hope:


The basic idea is that I am trying to keep track of a number of bitstrings.

Therefore I am creating a matrix (named 'pop') whose rows are made up
of bit vectors (i.e., my bitstrings).  I only initialize half of the rows
with my bitstrings of random 1s and 0s; the rest of the rows are set
to all zeros.

So I use following function call to create a matrix and fill it with
bit strings:

    pop=create_pop_2(POP_SIZE, LEN)

where

    POP_SIZE refers to the number of rows
    LEN to the columns (length of my bitstrings)



This is the code I call:

####################################################
# create a random binary vector of size "len"
#
create_bin_Chromosome <- function(len)
{
   sample(0:1, len, replace=T)
}



############## create_population ###################
# create population of chromosomes of length len
# the matrix contains twice as much space as popsize
#
create_pop_2 <- function(popsize, len)
{
   datasize=len*popsize
   print(datasize)
   npop <- matrix(0, popsize*2, len, byrow=T)

   for(i in 1:popsize)
     npop[i,] = create_bin_Chromosome(len)

   npop
}


My 3 questions:

(1) If I did

     keep_pop[1:POP_SIZE] = pop[1:POP_SIZE]

     to keep a copy of the original data structure before manipulating
     'pop' potentially, would this make a deep copy or just shallow? Ie
     if I change something in pop would keep_pop change too? I would
     like two independent copies so that 'keep_pop' stays intact while
     'pop' may change.

     > "<-" and "=" can do assignment. In neither case would it be a
     > "deep copy".

     Is there a deepcopy operator, or would I have to have two nested
     loops and iterate through them? Or is there a nice R-idiomatic way
     to do this?


(2) If I wanted to change the order of rows in my matrix 'pop', is
     there an easy way to shuffle these?  I.e., I don't want to change
     any of the bitstring vectors/rows, just the order of the rows in the
     matrix 'pop'. (E.g., in Python I could just say something like
     shuffle(pop)) - is there an equivalent for R?

     So if pop is [ [0, 0, 0]
                    [1, 1, 1]
                    [1, 1, 0] ]

     after the shuffle it may look like

                  [ [1, 1, 0]    (originally at index 2)
                    [1, 1, 1]    (originally at index 1)
                    [0, 0, 0] ]  (originally at index 0)

     the rows themselves remain intact, just their order changes.
     This is a tiny example; in my case I may have 100 rows (POP_SIZE)
     and rows of LEN 200.


(3) I would like to compare the contents of 'keep_pop' (a copy of the
     original 'pop') with the current 'pop'. Though the order of rows
     may be different between the two, it should not matter as long as
     the same rows are present.  So for the example given above, the
     comparison should return True.

     For instance, in Python this would be simply

     if sorted(keep_pop) == sorted(pop):
        print 'they are equal'
     else:
        print 'they are not equal'

     Is there an equivalent R code segment?


I hope this post is clearer than my original one. Thank you David for
pointing out some of the shortcomings of my earlier post.

Thanks,

Esmail
#
On Apr 26, 2009, at 7:48 AM, Esmail wrote:

            
Not that I know of, although my knowledge of R's depths is not
encyclopedic. You might get the desired sort of effect by creating a
copy inside a function, working on it inside the function in the
manner desired, and then comparing the output to the original. There
might be other strategies to get certain effects by creating specific
environments.

Yes. As I said before, "I am going to refrain from posting speculation
until you provide valid R code that will create an object that can be
the subject of operations."

If you created a random index vector that was used to sort the rows
for display or computational purposes only, you could maintain the
original ordering so that row-wise comparisons could be done.
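
The external-index idea above can be sketched roughly as follows. This is only an illustration under assumptions: the small random matrix stands in for 'pop', and the names `shuffled` and `restored` are made up for the example. The point is that `order(dx)` gives the inverse of the permutation `dx`, so the original row order can be recovered for a row-wise comparison.

```r
# Hypothetical small matrix standing in for 'pop'
set.seed(1)                        # only so the sketch is repeatable
pop <- matrix(sample(0:1, 24, replace = TRUE), nrow = 6)

dx <- sample(nrow(pop))            # random permutation of the row indices
shuffled <- pop[dx, , drop = FALSE]

# order(dx) is the inverse permutation of dx, so indexing the shuffled
# matrix with it puts the rows back in their original positions
restored <- shuffled[order(dx), , drop = FALSE]

identical(restored, pop)           # row-wise comparison against the original
```

Keeping `dx` around costs one integer vector, and the original matrix never has to be modified.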
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
#
David Winsemius wrote:
The code I have provided works, here is a run that may prove helpful:

POP_SIZE = 6
LEN = 8

pop=create_pop_2(POP_SIZE, LEN)

print(pop)
       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
  [1,]    0    1    0    1    1    0    0    1
  [2,]    0    0    0    0    0    0    0    0
  [3,]    1    1    0    0    1    0    0    0
  [4,]    0    0    0    0    0    0    0    1
  [5,]    0    0    1    1    0    0    1    0
  [6,]    1    0    0    0    0    0    1    0
  [7,]    0    0    0    0    0    0    0    0
  [8,]    0    0    0    0    0    0    0    0
  [9,]    0    0    0    0    0    0    0    0
[10,]    0    0    0    0    0    0    0    0
[11,]    0    0    0    0    0    0    0    0
[12,]    0    0    0    0    0    0    0    0

I want to (1) create a deep copy of pop, (2) be able to shuffle
the rows only, and (3) be able to compare two copies of these objects
for equality and have it return True if only the rows have been shuffled.
#
On Apr 26, 2009, at 9:43 AM, Esmail wrote:

            
I have already said *I* do not know how to create a "deep copy" in R.
I have suggested shuffling by way of a random selection of an
external index:

 > pop=create_pop_2(POP_SIZE, LEN)
[1] 48
 > pop
       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
  [1,]    1    1    0    0    1    0    1    1
  [2,]    1    0    1    0    0    0    1    0
  [3,]    1    1    0    1    0    1    0    0
  [4,]    0    0    0    0    1    0    0    0
  [5,]    1    0    0    1    1    1    1    1
  [6,]    1    1    0    0    0    0    0    0
  [7,]    0    0    0    0    0    0    0    0
  [8,]    0    0    0    0    0    0    0    0
  [9,]    0    0    0    0    0    0    0    0
[10,]    0    0    0    0    0    0    0    0
[11,]    0    0    0    0    0    0    0    0
[12,]    0    0    0    0    0    0    0    0

 > dx <- sample(1:nrow(pop), nrow(pop) )
 > dx
  [1] 12 10  8  9  3  1  6 11  5  7  4  2
 > pop[dx,]
       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
  [1,]    0    0    0    0    0    0    0    0
  [2,]    0    0    0    0    0    0    0    0
  [3,]    0    0    0    0    0    0    0    0
  [4,]    0    0    0    0    0    0    0    0
  [5,]    1    1    0    1    0    1    0    0
  [6,]    1    1    0    0    1    0    1    1
  [7,]    1    1    0    0    0    0    0    0
  [8,]    0    0    0    0    0    0    0    0
  [9,]    1    0    0    1    1    1    1    1
[10,]    0    0    0    0    0    0    0    0
[11,]    0    0    0    0    1    0    0    0
[12,]    1    0    1    0    0    0    1    0

I see two possible questions, the first easier (for me) than the
second. Do you want to work on a copy with a known permutation of
rows... or on a copy with an unknown ordering? In the first case I am
unclear why you would not create an original and a copy, work on the
copy, and compare with the original that is also sorted by the
external index.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
#
Creating a deep copy is easy, because all copies are "deep" copies.
You need to try very hard to create a reference in R.

Hadley
#
My understanding of the OP's request was for some sort of copy which  
did change when entries in the original were changed; the sort of  
behavior that might be seen  in a spreadsheet that had a copy "by  
reference".
On Apr 26, 2009, at 11:28 AM, hadley wickham wrote:

            
#
David,

Good news! It seems that R has deep copy by default. I ran this simplified
test and it seems I can change 'pop' without changing the saved version.

POP_SIZE = 4
LEN = 8
pop=create_pop_2(POP_SIZE, LEN)
cat('printing original pop\n')
print(pop)

keep_pop = pop
pop[1,1] = 99

cat('printing changed pop\n')
print(pop)
cat('printing keep_pop\n')
print(keep_pop)



-----------

 > source('mat.R')
[1] 32
printing original pop
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    0    1    1    0    1    0    0    1
[2,]    1    0    1    0    0    0    1    1
[3,]    0    1    0    1    1    1    0    1
[4,]    0    0    0    1    0    1    0    0
[5,]    0    0    0    0    0    0    0    0
[6,]    0    0    0    0    0    0    0    0
[7,]    0    0    0    0    0    0    0    0
[8,]    0    0    0    0    0    0    0    0


printing changed pop
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]   99    1    1    0    1    0    0    1
[2,]    1    0    1    0    0    0    1    1
[3,]    0    1    0    1    1    1    0    1
[4,]    0    0    0    1    0    1    0    0
[5,]    0    0    0    0    0    0    0    0
[6,]    0    0    0    0    0    0    0    0
[7,]    0    0    0    0    0    0    0    0
[8,]    0    0    0    0    0    0    0    0


printing keep_pop
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    0    1    1    0    1    0    0    1
[2,]    1    0    1    0    0    0    1    1
[3,]    0    1    0    1    1    1    0    1
[4,]    0    0    0    1    0    1    0    0
[5,]    0    0    0    0    0    0    0    0
[6,]    0    0    0    0    0    0    0    0
[7,]    0    0    0    0    0    0    0    0
[8,]    0    0    0    0    0    0    0    0


Re Shuffle

I tried using sample based on your earlier post, but your example
really helped, thanks!  That solves the shuffling issue.

dx <- sample(1:POP_SIZE, POP_SIZE)
cat('shuffled index:')
print(dx)
print(pop[dx,])

cat('shuffled pop')
pop[1:POP_SIZE,] = pop[dx,]
print(pop)


re compare:

 > I am unclear why you would not create an original and a copy,

Well .. that I wanted to do from the start (hence my question about
deep copy :-)

 > work on the copy, and compare with the original that is also sorted
 > by the external index.

That's a great idea, hadn't thought of keeping the index around for
this, I'll give this a try.

Final question, how do I compare these two structures so that I get
one result, true or false? Right now

keep == pop yields all these individual comparisons:

 > pop==keep

       [,1] [,2]  [,3] [,4]  [,5]
[1,] FALSE TRUE FALSE TRUE FALSE
[2,] FALSE TRUE FALSE TRUE FALSE
[3,]  TRUE TRUE  TRUE TRUE  TRUE
[4,]  TRUE TRUE  TRUE TRUE  TRUE
[5,]  TRUE TRUE  TRUE TRUE  TRUE
[6,]  TRUE TRUE  TRUE TRUE  TRUE
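
One possible way to collapse such a comparison into a single TRUE or FALSE, sketched here under assumptions rather than taken from the thread: `all()` reduces the elementwise result, `identical()` compares the whole objects, and sorting the rows of both matrices first makes the test ignore row order. The helper names `sort_rows` and `same_rows` are hypothetical, and the 3 x 3 matrices reuse the tiny example from the earlier post.

```r
# Hypothetical helper: sort the rows of a matrix lexicographically
# (by column 1, ties broken by column 2, and so on)
sort_rows <- function(m) {
  cols <- lapply(seq_len(ncol(m)), function(j) m[, j])
  m[do.call(order, cols), , drop = FALSE]
}

# Hypothetical helper: TRUE if both matrices hold the same rows,
# regardless of the order in which the rows appear
same_rows <- function(a, b) {
  identical(dim(a), dim(b)) && identical(sort_rows(a), sort_rows(b))
}

keep <- matrix(c(0, 0, 0,
                 1, 1, 1,
                 1, 1, 0), nrow = 3, byrow = TRUE)
pop <- keep[c(3, 2, 1), ]          # same rows, different order

all(keep == pop)                   # FALSE: elementwise, order-sensitive
identical(keep, pop)               # FALSE: also order-sensitive
same_rows(keep, pop)               # TRUE: row order is ignored

changed <- pop
changed[1, 1] <- 0                 # alter one bit
same_rows(keep, changed)           # FALSE: the set of rows now differs
```

Note that `identical()` is strict about storage type, so an integer matrix and a double matrix with the same values compare as different; `isTRUE(all.equal(a, b))` is a more tolerant alternative.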

Thanks for the help, much appreciated.

Esmail
#
In that case, you would want a shallow copy, and you'd need to jump
through a lot of hoops to do that in R.
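
To illustrate the difference, here is a minimal sketch (the variable names are made up for the example). Ordinary assignment gives two independent values, while environments are one of the few objects in R that behave like references, which is one way to get the shared, "by reference" behaviour.

```r
# Ordinary assignment: the two names behave as independent values
a <- matrix(0, 2, 2)
b <- a
b[1, 1] <- 99
a[1, 1]              # still 0, 'a' is unaffected by the change to 'b'

# Environments have reference semantics: assigning one to a new name
# does not copy it
e1 <- new.env()
e1$pop <- matrix(0, 2, 2)
e2 <- e1             # e2 refers to the same environment as e1
e2$pop[1, 1] <- 99
e1$pop[1, 1]         # 99, the change is visible through both names
```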

Hadley

On Sun, Apr 26, 2009 at 10:35 AM, David Winsemius
<dwinsemius at comcast.net> wrote:

  
    
#
hadley wickham wrote:
Hi Hadley

Right you are .. I discovered this now too. It's really confusing to
go back and forth between different languages. I have been programming
in Python for the last 2 months and everything there is a reference .. so
I have to worry about deep copy etc.

Thanks!
Esmail
#
David Winsemius
You misunderstood (my phrasing probably wasn't the best), but I was
clear about wanting two independent copies.

 From my earlier post:

(1) If I did

     keep_pop[1:POP_SIZE] = pop[1:POP_SIZE]

     to keep a copy of the original data structure before manipulating
     'pop' potentially, would this make a deep copy or just shallow? Ie
     if I change something in 'pop' would it be reflected in 'keep_pop'
     too? (I don't think so, but just wanted to check). I would like
     two independent copies.

Regardless, the net outcome was new knowledge, so this is a good outcome.

Esmail