On March 8, 2017 7:27:08 AM PST, G.Maubach at weinwolf.de wrote:
>Hi All,
>
>today I have a more general question about how to store the different
>values that result from the analysis of multiple variables.
>
>My task is to compare distributions in a universe with the distributions
>among the respondents, using a whole bunch of variables. The comparison
>is to be done on relative frequencies (proportions).
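
For the proportions themselves, base R's table() and prop.table() are probably enough. A minimal sketch, assuming the raw data are available as one vector of categories per group. The vectors `universe` and `respondents` and the helper `rel_freq()` are made up for illustration and are not taken from your data:

```r
# Hypothetical raw data: one category value per unit.
universe    <- c("Bayern", "Bayern", "Berlin", "Hessen", "Hessen", "Hessen",
                 "Berlin", "Bayern", "Hessen", "Bayern")
respondents <- c("Bayern", "Berlin", "Berlin", "Hessen", "Bayern")

# Use a common set of levels so both tables have the same categories,
# even if one group has no cases in a category.
lv <- sort(union(universe, respondents))

# Hypothetical helper: counts per category as percentages, one decimal.
rel_freq <- function(x, levels) {
  round(100 * prop.table(table(factor(x, levels = levels))), 1)
}

p_universe    <- rel_freq(universe, lv)
p_respondents <- rel_freq(respondents, lv)

# Side-by-side comparison in percent.
comparison <- data.frame(
  label       = lv,
  universe    = as.numeric(p_universe),
  respondents = as.numeric(p_respondents),
  stringsAsFactors = FALSE
)
print(comparison)
```

Fixing the factor levels up front matters mostly for sparse categories, which would otherwise drop out of one of the two tables.
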
>
>I was thinking about the structure I should store the results in and came
>up with the following:
>
>-- cut --
>
>library(ggplot2)
>library(stringi)
>
># Result data frame
># Some sort of tidy data set where
># each value is stored as an identity.
># This way all values for all variables could be stored in
># one unique data structure.
># If an additional variable is added for the name of the
># study, one could also build a result data set across
># surveys.
># Values for measure could be "number" for 'raw' values or
># "freq" for frequencies/counts.
># Values for unit could be "n" for 'numbers' and
># "%" for percentages.
>d_test <- data.frame(
>  group = rep(c("Universe", "Respondents"), each = 16),
>  variable = rep("State", 32),
>  value = rep(c(11.3, 12.7, 3.3, 5, 0.6, 8.1, 6.2, 5.8,
>                6.4, 14.5, 8.3, 0.3, 3.8, 2.5, 8.1, 3), 2),
>  label = rep(c("Baden-Wuerttemberg",
>                "Bayern",
>                "Berlin",
>                "Brandenburg",
>                "Bremen",
>                "Hamburg",
>                "Hessen",
>                "Mecklenburg-Vorpommern",
>                "Niedersachsen",
>                "Nordrhein-Westfalen",
>                "Rheinland-Pfalz",
>                "Saarland",
>                "Sachsen",
>                "Sachsen-Anhalt",
>                "Schleswig-Holstein",
>                "Thueringen"), 2),
>  measure = rep("freq", 32),
>  unit = rep("%", 32),
>  stringsAsFactors = FALSE
>)
>
># This way the variables can be selected using simple
># value selection from Base R functionality.
>data <- d_test[d_test$variable == "State" ,]
>
># And plot results for every variable.
>ggplot(
>  data = data,
>  aes(
>    x = label,
>    y = value,
>    fill = group)) +
>  geom_bar(stat = "identity", position = "dodge") +
>  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
>  # Legend title: name of the first column in title case ("Group").
>  scale_fill_discrete(name = stringi::stri_trans_totitle(names(data)[1])) +
>  scale_x_discrete(name = data$variable[1]) +
>  # value is numeric, so the y axis needs a continuous scale.
>  scale_y_continuous(name = data$unit[1])
>
>-- cut --
>
>The reporting / presentation is done in R Markdown. I would load the
>result data set once at the beginning and run the comparisons as plots
>for each variable named in the results data set under "variable".
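
Plotting every variable from one long result table could look roughly like this. It is only a sketch: `plot_variable()` is a name I made up, the small data frame `d_res` stands in for your `d_test` with invented values, and I assume the columns are named as in your example.

```r
library(ggplot2)

# Small stand-in for the result data set (made-up values).
d_res <- data.frame(
  group    = rep(c("Universe", "Respondents"), each = 2, times = 2),
  variable = rep(c("State", "Gender"), each = 4),
  value    = c(60, 40, 55, 45, 48, 52, 51, 49),
  label    = c("Bayern", "Berlin", "Bayern", "Berlin",
               "female", "male", "female", "male"),
  unit     = "%",
  stringsAsFactors = FALSE
)

# Hypothetical helper: one dodged bar chart for a single variable.
plot_variable <- function(d, var) {
  d_var <- d[d$variable == var, ]
  ggplot(d_var, aes(x = label, y = value, fill = group)) +
    geom_col(position = "dodge") +
    labs(x = var, y = d_var$unit[1], fill = "Group")
}

# One plot per distinct variable, stored in a named list.
vars  <- unique(d_res$variable)
plots <- lapply(vars, function(v) plot_variable(d_res, v))
names(plots) <- vars
```

In an R Markdown chunk, something like `for (p in plots) print(p)` should render them one after another. The explicit print() is needed because ggplot objects are not auto-printed inside a loop.
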
>
>If I follow this approach for my customer relationship survey, do you
>think I would face drawbacks or run into serious trouble?
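
One thing I would watch for: in the long layout the two values of each comparison pair sit in different rows, so computing the gap between respondents and universe needs a reshape or merge first. A rough sketch with base merge(), again on made-up numbers and assuming your column names:

```r
# Made-up long-format results for one variable.
d_res <- data.frame(
  group    = rep(c("Universe", "Respondents"), each = 3),
  variable = "State",
  label    = rep(c("Bayern", "Berlin", "Hessen"), 2),
  value    = c(40, 20, 40, 35, 30, 35),
  stringsAsFactors = FALSE
)

u <- d_res[d_res$group == "Universe",    c("variable", "label", "value")]
r <- d_res[d_res$group == "Respondents", c("variable", "label", "value")]

# Match rows on variable and label; suffixes keep the two value columns apart.
wide <- merge(u, r, by = c("variable", "label"),
              suffixes = c("_universe", "_respondents"))

# Percentage-point difference: positive means the category is
# over-represented among the respondents.
wide$diff <- wide$value_respondents - wide$value_universe
print(wide)
```

This is not a reason to abandon the long layout, which suits the plotting well. It just means the difference step is a separate transformation.
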
>
>I am interested in your opinion and open to other approaches and
>suggestions.
>
>Kind regards
>
>Georg
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.