
Extracting portions of one vector to create others

8 messages · Paul Zachos, Jeffrey Dick, Eric Berger +4 more

#
Dear Colleagues,

I have a vector that indicates the membership of subjects in one of 5 classes:

Beth$CLASS
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[37] 2 2 2 2 2 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 7 7 7 7 7 7 7 7 7 7 7
[73] 7 7 7 7 7 7 7 7 7 9 9 9 9 9 9 9 9 9 9 9 9 9 9 1

For purposes of an analysis (using linear models based on Ward and Jennings), I would like to create 5 new vectors:

The values in vector CLASS1 will be '1' if the corresponding value in Beth$CLASS is equal to '1'; '0' otherwise.

The values in vector CLASS2 will be '1' if the corresponding value in Beth$CLASS is equal to '2'; '0' otherwise.

The values in vector CLASS4 will be '1' if the corresponding value in Beth$CLASS is equal to '4'; '0' otherwise.

The values in vector CLASS7 will be '1' if the corresponding value in Beth$CLASS is equal to '7'; '0' otherwise.

The values in vector CLASS9 will be '1' if the corresponding value in Beth$CLASS is equal to '9'; '0' otherwise.

How would I go about this using R?

Thank you
_________________
Paul Zachos, PhD
Director, Research and Evaluation
Association for the Cooperative Advancement of Science and Education (ACASE)
110 Spring Street  Saratoga Springs, NY 12866  |  
paz at acase.org  |  www.acase.org
#
Hi Paul,

Here's a way that creates new vectors as elements of a list.

Beth <- list(
  CLASS = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
            2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
            2, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
            4, 4, 4, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,
            7, 7, 7, 7, 7, 7, 7, 7, 7, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 1)
)

classes <- unique(Beth$CLASS)

# Loop over `classes` to operate on a single `class` in each iteration
result <- lapply(classes, function(class) {
  # Create a vector with 1 where Beth$CLASS equals class, 0 elsewhere
  as.numeric(Beth$CLASS == class)
})

names(result) <- paste0("CLASS", classes)

with the result:
$CLASS1
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0
0
[83] 0 0 0 0 0 0 0 0 0 0 0 0 0 1

$CLASS2
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0
0
[83] 0 0 0 0 0 0 0 0 0 0 0 0 0 0

[list truncated ...]
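Because every subject belongs to exactly one class, a quick sanity check on a list like `result` is that the five indicator vectors add up to 1 at every position, and that each one sums to its class size. A minimal sketch, assuming the same 96 values as Beth$CLASS above, written compactly with rep(); the loop variable is named `cl` here only to avoid masking base R's class():

```r
# Same 96 values as Beth$CLASS above, written compactly with rep()
Beth <- list(CLASS = c(rep(1, 20), rep(2, 21), rep(4, 20), rep(7, 20), rep(9, 14), 1))

classes <- unique(Beth$CLASS)
result <- lapply(classes, function(cl) as.numeric(Beth$CLASS == cl))
names(result) <- paste0("CLASS", classes)

# Add the five indicator vectors element by element: each position should be exactly 1
total <- Reduce(`+`, result)
all(total == 1)          # TRUE

# Each indicator sums to the number of subjects in that class
sapply(result, sum)      # CLASS1 21, CLASS2 21, CLASS4 20, CLASS7 20, CLASS9 14
```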

Regards,
Jeff
On Tue, Sep 2, 2025 at 2:44 PM Paul Zachos <paz at acase.org> wrote:
#
On 9/1/2025 3:09 PM, Paul Zachos wrote:
Hello,

Here is a way. It creates a matrix whose columns are the vectors you ask for.
It doesn't create 5 separate vectors; it keeps them all in a single 
object, a matrix.



CLASS <-
   c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
     1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
     2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L,
     4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 7L, 7L,
     7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L,
     7L, 7L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 1L)
Beth <- data.frame(CLASS)

eq <- c(1, 2, 4, 7, 9)
res <- sapply(eq, \(x) as.integer(x == Beth$CLASS))
colnames(res) <- paste0("CLASS", eq)
head(res)
#>      CLASS1 CLASS2 CLASS4 CLASS7 CLASS9
#> [1,]      1      0      0      0      0
#> [2,]      1      0      0      0      0
#> [3,]      1      0      0      0      0
#> [4,]      1      0      0      0      0
#> [5,]      1      0      0      0      0
#> [6,]      1      0      0      0      0
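Since the indicators are columns of one matrix, checks and extraction are one-liners. A short sketch, assuming the same data as above (rebuilt compactly with rep()) and using function(x) instead of \(x) so it also runs on R versions before 4.1:

```r
CLASS <- c(rep(1L, 20), rep(2L, 21), rep(4L, 20), rep(7L, 20), rep(9L, 14), 1L)
Beth <- data.frame(CLASS)

eq <- c(1, 2, 4, 7, 9)
res <- sapply(eq, function(x) as.integer(x == Beth$CLASS))
colnames(res) <- paste0("CLASS", eq)

# Column sums are the class sizes; they should match a frequency table of CLASS
colSums(res)
all(colSums(res) == as.vector(table(Beth$CLASS)))   # TRUE

# Any single indicator vector can be pulled out by column name
CLASS4 <- res[, "CLASS4"]
```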


If you really want 5 different objects in the global environment, you 
can use ?list2env.
This is not good practice: you will have related but loose objects in the 
global environment, making your code harder to debug. Keep it simple.


as.data.frame(res) |>
   list2env(envir = .GlobalEnv)
#> <environment: R_GlobalEnv>

CLASS1
#>  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0
#> [39] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0
#> [77] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
CLASS2
#>  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 
1 1 1 1 1 1
#> [39] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0
#> [77] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
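In the keep-it-simple spirit, one alternative to list2env() is to keep the indicator columns in the data frame that already holds CLASS, so they travel with the data and can be named in a model formula. A sketch under the same assumptions as above (data rebuilt compactly with rep()):

```r
CLASS <- c(rep(1L, 20), rep(2L, 21), rep(4L, 20), rep(7L, 20), rep(9L, 14), 1L)
Beth <- data.frame(CLASS)

eq <- c(1, 2, 4, 7, 9)
res <- sapply(eq, function(x) as.integer(x == Beth$CLASS))
colnames(res) <- paste0("CLASS", eq)

# Bind the matrix columns onto the data frame instead of creating loose global objects
Beth <- cbind(Beth, res)
names(Beth)   # "CLASS" "CLASS1" "CLASS2" "CLASS4" "CLASS7" "CLASS9"
```

With a response variable in Beth (say a hypothetical y, not part of the posted data), a formula such as y ~ CLASS2 + CLASS4 + CLASS7 + CLASS9 can then refer to these columns via lm(..., data = Beth).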


Hope this helps,

Rui Barradas
#
Your request is equivalent to "how to do one-hot encoding in R". You can do
this search yourself.

Just for fun, I did that search and the following is a short solution:

A <- model.matrix(~ as.character(Beth$CLASS) - 1)
colnames(A) <- paste0("CLASS", c(1, 2, 4, 7, 9))

A is now a matrix whose columns are the vectors that you want. To see this:

head(A)

  CLASS1 CLASS2 CLASS4 CLASS7 CLASS9
1      1      0      0      0      0
2      1      0      0      0      0
3      1      0      0      0      0
4      1      0      0      0      0
5      1      0      0      0      0
6      1      0      0      0      0
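One caution about the hard-coded labels: model.matrix() orders the columns by the levels of as.character(Beth$CLASS), which are sorted as text. That happens to match c(1, 2, 4, 7, 9) here, but a class such as 10 would sort between 1 and 2 and the names would no longer line up. A sketch that instead makes a factor from the numeric values (so the levels sort numerically) and names the columns from those levels; `cls` and `A` are just illustrative names, and the data are rebuilt compactly with rep():

```r
Beth <- data.frame(CLASS = c(rep(1, 20), rep(2, 21), rep(4, 20), rep(7, 20), rep(9, 14), 1))

# Text sorting vs numeric sorting of the same labels
sort(as.character(c(2, 10, 1)))   # "1"  "10" "2"
levels(factor(c(2, 10, 1)))       # "1"  "2"  "10"

# factor() on the numeric vector: levels in numeric order
cls <- factor(Beth$CLASS)
A <- model.matrix(~ cls - 1)
# Column names taken from the levels, so they always match the column order
colnames(A) <- paste0("CLASS", levels(cls))
head(A)
```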
On Tue, Sep 2, 2025 at 10:07 AM Jeffrey Dick <j3ffdick at gmail.com> wrote:
#
On 9/2/2025 8:38 AM, Rui Barradas wrote:
Hello,

Here is another way with ?model.matrix.


res2 <- model.matrix(~ 0 + factor(CLASS), data = Beth)
colnames(res2) <- sub("factor\\((.*)\\)", "\\1", colnames(res2))



This creates the matrix res2 with extra attributes ("assign" and 
"contrasts"), not just dim and dimnames. To get rid of those, you can run


attr(res2, "assign") <- NULL
attr(res2, "contrasts") <- NULL
attributes(res2)
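To confirm what is left after removing those two attributes, names(attributes()) can be compared before and after. A sketch using the same data (rebuilt compactly with rep()):

```r
Beth <- data.frame(CLASS = c(rep(1L, 20), rep(2L, 21), rep(4L, 20), rep(7L, 20), rep(9L, 14), 1L))

res2 <- model.matrix(~ 0 + factor(CLASS), data = Beth)
# "factor(CLASS)1" -> "CLASS1", etc.
colnames(res2) <- sub("factor\\((.*)\\)", "\\1", colnames(res2))
names(attributes(res2))   # dim and dimnames, plus "assign" and "contrasts"

attr(res2, "assign") <- NULL
attr(res2, "contrasts") <- NULL
names(attributes(res2))   # only "dim" and "dimnames" remain
```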


Hope this helps,

Rui Barradas
#
This is an extremely common transformation in regression, and R has a dedicated function (model.matrix) for it. It does presume a bit of familiarity with R's formula type, which is created using the tilde character (e.g. ~ CLASS). Also, since including an intercept term in a regression is the default, if you only want the columns you described, you need to tell the function to leave the intercept out (~ CLASS - 1). It doesn't apply this discrete transformation unless the referenced variable is a factor (or character) variable, so in the short example below I convert it from numeric to character.

Beth <- data.frame(
  CLASS = c(
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2
    , 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4, 4, 4
    , 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7
    , 7, 7, 7, 7, 7, 7, 7, 7, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 1))
Beth$CLASS <- as.character(Beth$CLASS)
result <- model.matrix( ~ CLASS - 1, Beth)
result

Note that the result is a matrix, not a data frame, so to access one column at a time you have to use brackets with an empty row index:

result[, "CLASS1"]

Refer to the help page for more on this function:
?model.matrix
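To see what the "- 1" changes, compare the column names with and without the intercept. With the default intercept, the first level (CLASS1) becomes the reference and gets no column of its own; the intercept column is exactly the sum of the five indicators, which is why both cannot appear in the same model. A sketch using the same data (rebuilt compactly with rep()):

```r
Beth <- data.frame(CLASS = c(rep(1, 20), rep(2, 21), rep(4, 20), rep(7, 20), rep(9, 14), 1))
Beth$CLASS <- as.character(Beth$CLASS)

# Default formula: intercept plus indicators for the non-reference levels
with_int <- model.matrix(~ CLASS, Beth)
colnames(with_int)   # "(Intercept)" "CLASS2" "CLASS4" "CLASS7" "CLASS9"

# "- 1" removes the intercept, so every level gets its own indicator column
no_int <- model.matrix(~ CLASS - 1, Beth)
colnames(no_int)     # "CLASS1" "CLASS2" "CLASS4" "CLASS7" "CLASS9"

# The five indicators add up to the intercept column in every row
all(rowSums(no_int) == with_int[, "(Intercept)"])   # TRUE
```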
On September 1, 2025 7:09:56 AM PDT, Paul Zachos <paz at acase.org> wrote:
#
The solutions offered are fine, but perhaps not quite what was asked for.

A careful reading suggests Paul simply wants a way to index the data.frame in multiple ways by creating a set of vectors containing zero or one, and he seems to want a solution for this specific case. That can be done fairly easily with ifelse(), as in:

CLASS1 <- ifelse(Beth$CLASS==1, 1, 0)
CLASS2 <- ifelse(Beth$CLASS==2, 1, 0)
CLASS4 <- ifelse(Beth$CLASS==4, 1, 0)
CLASS7 <- ifelse(Beth$CLASS==7, 1, 0)
CLASS9 <- ifelse(Beth$CLASS==9, 1, 0)

Of course, a more general solution could handle arbitrary factors and either create custom names for the resulting vectors, or create a list of such vectors, or another data structure such as the matrix others have provided.
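A sketch of such a more general version: a small helper (the name make_indicators and its prefix argument are hypothetical, not from any package) that builds one 0/1 vector per distinct value, in sorted order, and returns them as a named list. The data are rebuilt compactly with rep():

```r
# Hypothetical helper: one 0/1 integer vector per distinct value of x
make_indicators <- function(x, prefix) {
  values <- sort(unique(x))   # sorted distinct values (sort() drops any NA)
  out <- lapply(values, function(v) as.integer(x == v))
  names(out) <- paste0(prefix, values)
  out
}

Beth <- data.frame(CLASS = c(rep(1, 20), rep(2, 21), rep(4, 20), rep(7, 20), rep(9, 14), 1))
ind <- make_indicators(Beth$CLASS, prefix = "CLASS")
names(ind)   # "CLASS1" "CLASS2" "CLASS4" "CLASS7" "CLASS9"

# Works the same way on a character vector
names(make_indicators(c("b", "a", "b"), prefix = "grp"))   # "grpa" "grpb"
```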

And, of course, sometimes you do not need such a "Boolean vector" to use as an index at all, but can use standard R (or packages like dplyr) to select the needed rows with a query.
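One caveat if the 0/1 vectors are meant to be used as an index: inside [ a numeric vector is read as row positions, not as a yes/no mask, so a 0/1 vector must be turned back into TRUE/FALSE before it selects rows. A minimal sketch on the same data (rebuilt compactly with rep()), contrasting the two and showing subset() as the query-style alternative:

```r
Beth <- data.frame(CLASS = c(rep(1, 20), rep(2, 21), rep(4, 20), rep(7, 20), rep(9, 14), 1))
CLASS9 <- ifelse(Beth$CLASS == 9, 1, 0)

# Numeric index: each 0 selects nothing and each 1 selects row 1 again
wrong <- Beth[CLASS9, , drop = FALSE]
nrow(wrong)          # 14, all copies of row 1 (CLASS 1)

# Logical index: TRUE where CLASS9 is 1
right <- Beth[CLASS9 == 1, , drop = FALSE]
unique(right$CLASS)  # 9

# Query without building an indicator at all
same <- subset(Beth, CLASS == 9)
```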

-----Original Message-----
From: R-help <r-help-bounces at r-project.org> On Behalf Of Jeff Newmiller via R-help
Sent: Tuesday, September 2, 2025 4:16 AM
To: r-help at r-project.org
Subject: Re: [R] Extracting portions of one vector to create others

#
To do exactly what you asked for,
CLASS1 <- rep('0', length.out = length(Beth$CLASS))
because there is NO element in Beth$CLASS that is '1'.
Assuming that you did not mean ANY of the single quotes to be taken seriously,
CLASS1 <- as.integer(Beth$CLASS == 1)
Beth$CLASS == 1 will give you TRUE where Beth$CLASS has 1, FALSE elsewhere.
as.integer(Beth$CLASS == 1) will convert TRUE to 1 and FALSE to 0.
If you really want those 1s and 0s to be '1's and '0's,
as.character(as.integer(Beth$CLASS == 1))
will do the trick.  What you might find to be clearer is
CLASS2 <- ifelse(Beth$CLASS == 2, '1', '0')

For linear modelling, you are probably interested in factors rather than strings.
To avoid confusion, you might want to avoid using '0' and '1'.
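To illustrate the factor route: once CLASS is a factor, R already knows how to code it, and with R's default contrasts option contrasts() shows the 0/1 columns that lm() would build from it (treatment coding, with the first level as the reference). A sketch on the same data (rebuilt compactly with rep()); an actual lm() fit would also need a response variable, which is not part of this example:

```r
Beth <- data.frame(CLASS = c(rep(1, 20), rep(2, 21), rep(4, 20), rep(7, 20), rep(9, 14), 1))
Beth$CLASS <- factor(Beth$CLASS)

levels(Beth$CLASS)      # "1" "2" "4" "7" "9"
table(Beth$CLASS)       # 21 21 20 20 14 subjects per class

# Default coding: one row per level, one 0/1 column per non-reference level
cm <- contrasts(Beth$CLASS)
cm
```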
On Tue, 2 Sept 2025 at 18:44, Paul Zachos <paz at acase.org> wrote: