R code for overlapping variables -- count

4 messages · Leonard Mada, Rui Barradas

#
Dear Shadee,

If you have a data.frame with the following columns:

n = 100; # population size
x = data.frame(
    Sex = sample(c("M","F"), n, T),
    Country = sample(c("AA", "BB", "US"), n, T),
    Income  = as.factor(sample(1:3, n, T))
)

# Dummy variable
ONE = rep(1, nrow(x))

r = aggregate(ONE ~ Sex + Income + Country, length, data = x)
r = r[, c("Country", "Income", "Sex")]
print(r)

It is possible to write simpler code if you need only the particular combination of variables that you specified in your mail, but this is the more general approach.

Note: you may want to use "sum" instead of "length", e.g. if you have a column specifying the number of individuals in that category.
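
To illustrate the note above, here is a minimal sketch that assumes the data are already tallied, with one row per category and a count column. The data frame `agg`, the column name `N`, and the values are made up for illustration; they are not from the original question.

```r
# Hypothetical pre-tallied data: each row is a category,
# N is the number of individuals in that row (values are made up)
agg = data.frame(
    Sex     = c("M", "F", "M", "F"),
    Country = c("US", "US", "US", "AA"),
    Income  = as.factor(c(1, 1, 1, 2)),
    N       = c(10, 5, 7, 3)
)

# sum() adds up the individuals per combination;
# length() would only count how many rows fall into each combination
r.sum = aggregate(N ~ Sex + Income + Country, sum, data = agg)
print(r.sum)
```

Here the two "M / US / 1" rows are merged into a single row with N = 17, whereas "length" would report 2 for that combination.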


Hope this helps,

Leonard
#
At 18:34 on 02/06/2024, Leo Mada via R-help wrote:
Hello,

The following is simpler.


r2 <- xtabs(~ ., x) |> as.data.frame()
r2[-4L] # or r2[names(r2) != "Freq"]


Hope this helps,

Rui Barradas
#
Correcting a small glitch - see new code.
#
At 18:40 on 02/06/2024, Rui Barradas wrote:
Hello,

This is the same solution, but the code that keeps only the columns of the 
original data set is better. It is also an MRE (minimal reproducible example).


n <- 100; # population size
x <- data.frame(
   Sex = sample(c("M","F"), n, T),
   Country = sample(c("AA", "BB", "US"), n, T),
   Income  = as.factor(sample(1:3, n, T))
)

r2 <- xtabs(~ ., x) |> as.data.frame()
# no need for constants, find the columns
# to keep from the data
r2[names(r2) %in% names(x)]
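
If the counts themselves are wanted, a possible variation is to keep the Freq column. Note that xtabs() tabulates every combination of the factor levels, including those that never occur, while aggregate() returns only the observed ones. The sketch below uses a small fixed data set (the name x.small and its values are made up for illustration) so that the result is reproducible.

```r
# Small fixed data set, made up for illustration
x.small <- data.frame(
   Sex = c("M", "F", "M", "M"),
   Country = c("US", "US", "AA", "US"),
   Income  = as.factor(c(1, 1, 2, 1))
)

# all 2 x 2 x 2 = 8 combinations, with their counts in Freq
r2 <- xtabs(~ ., x.small) |> as.data.frame()

# keep the counts, drop the combinations that never occur
r3 <- r2[r2$Freq > 0, ]
print(r3)
```

This leaves the three observed combinations, with Freq summing to the number of rows in x.small.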


Hope this helps,

Rui Barradas