Raw index scores

Standardized scores

Or as a violin chart:

Check equality of distributions

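library(tidyverse)  # dplyr, tidyr, purrr, forcats, ggplot2
library(broom)      # tidy() for the htest objects returned by ks.test()

# Keep ID plus every column whose name starts with "index"
psm_indexes <- psm %>%
  select(ID, starts_with("index"))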

# Null = distributions are the same
# If p is small, groups came from populations with different distributions
# https://www.graphpad.com/guides/prism/7/statistics/index.htm?interpreting_results_kolmogorov-smirnov_test.htm
ks_tests <- tribble(
  ~var1, ~var2, ~results,
  "Perry", "International", ks.test(psm_indexes$index_perry_z, psm_indexes$index_intl_z),
  "Perry", "Grant", ks.test(psm_indexes$index_perry_z, psm_indexes$index_grant_z),
  "Perry", "MSP85", ks.test(psm_indexes$index_perry_z, psm_indexes$index_msp_z),
  "MSP85", "International", ks.test(psm_indexes$index_msp_z, psm_indexes$index_intl_z),
  "MSP85", "Grant", ks.test(psm_indexes$index_msp_z, psm_indexes$index_grant_z),
  "Grant", "International", ks.test(psm_indexes$index_grant_z, psm_indexes$index_intl_z)
) %>%
  # Convert each htest object to a one-row data frame (statistic, p.value, ...)
  mutate(tidied = map(results, tidy)) %>%
  unnest(tidied)

# Diagonal cells (each index compared with itself); data_frame() is deprecated
ks_blanks <- tibble(var1 = c("Perry", "MSP85", "Grant", "International")) %>%
  mutate(var2 = var1,
         statistic = 0)

star.labs <- c("***", "**", "*", "")
star.nums <- c("p < 0.001", "p < 0.01", "p < 0.05", "p > 0.05")

ks_long <- bind_rows(ks_tests, ks_blanks) %>%
  # funs() is deprecated; across() applies the same factor conversion
  mutate(across(c(var1, var2), ~ factor(.x, levels = ks_blanks$var1, ordered = TRUE))) %>%
  mutate(stars = as.character(symnum(p.value, 
                                     cutpoints = c(0, 0.001, 0.01, 0.05, 1),
                                     symbols = star.labs)),
         stars = ifelse(stars == "?", NA, stars),
         stars = factor(stars, levels = star.labs, ordered = TRUE),
         label = ifelse(!is.na(stars), paste(round(statistic, 2), stars), ""))

ggplot(ks_long, aes(x = fct_rev(var2), y = fct_rev(var1), fill = stars)) +
  geom_tile() +
  geom_text(aes(label = label),
            family = "Roboto Condensed", fontface = "plain") +
  scale_fill_manual(values = rev(c("#feedde", "#fdbe85", "#fd8d3c", "#d94701")),
                    breaks = star.labs, labels = star.nums, name = NULL,
                    drop = FALSE, na.value = "grey95") +
  labs(x = NULL, y = NULL, title = "Kolmogorov-Smirnov statistics",
       subtitle = "Pairwise comparison between standardized distributions") +
  coord_equal() +
  theme_psm() +
  theme(panel.grid.major = element_blank(),
        legend.position = "bottom")

The Grant index's distribution differs from each of the other three; the Perry, MSP85, and International indexes are not distinguishable from one another.
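
For intuition about the number in each tile: the two-sample KS statistic is the largest vertical gap between the two empirical CDFs. The sketch below checks this by hand against ks.test(). It uses simulated vectors as stand-ins (x and y are assumptions, not the PSM indexes).

```r
# Simulated stand-ins for two standardized indexes (not the real PSM data)
set.seed(1234)
x <- rnorm(200)
y <- rnorm(200, mean = 0.5)

# Evaluate both empirical CDFs at every observed value
grid <- sort(c(x, y))
d_manual <- max(abs(ecdf(x)(grid) - ecdf(y)(grid)))

# ks.test() reports the same quantity as its test statistic
d_ks <- unname(ks.test(x, y)$statistic)

all.equal(d_manual, d_ks)
```

A statistic near 0 means the two CDFs nearly overlap; a statistic near 1 means they barely overlap at all.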

Check divergence of distributions

Instead of hypothesis testing, we can measure how much entropic divergence there is between distributions. Kullback-Leibler (KL) divergence is particularly useful for comparing two probability distributions. This blog post is really helpful for getting the intuition behind KL divergence, and this post and this video (starting at 47:52) explain why it’s asymmetric and what that actually means (though good luck interpreting it in any non-information-theoretic way).
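
The asymmetry is easy to see with two small discrete distributions. For distributions on the same support, KL(P || Q) is the sum of p * log(p / q), so each term is weighted by the first distribution, and swapping the arguments changes the weights. A minimal sketch, where kl_div() is a hypothetical helper and p and q are made-up probabilities:

```r
# KL divergence between two discrete distributions on the same support
# (hypothetical helper; assumes no zero probabilities)
kl_div <- function(p, q) sum(p * log(p / q))

p <- c(0.8, 0.1, 0.1)   # concentrated on the first outcome
q <- c(1/3, 1/3, 1/3)   # uniform

kl_pq <- kl_div(p, q)   # divergence of q from p
kl_qp <- kl_div(q, p)   # divergence of p from q

c(p_to_q = kl_pq, q_to_p = kl_qp)
```

The two directions give different values, and kl_div(p, p) is exactly zero.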

Calculating the KL divergence statistic is surprisingly convoluted in R. There are a ton of different packages that do it (entropy, philentropy, flexmix, FNN, and LaplacesDemon, to name a few), and they all have different syntaxes. Additionally, some are designed for discrete random variables rather than continuous ones, so they require discretized vectors to work. With entropy, for example, you have to run KL.empirical(discretize(x1, numBins = N), discretize(x2, numBins = N)), and numBins has a huge effect on the divergence if the vectors aren’t huge.
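
To illustrate the sensitivity to bin count without depending on any of those packages, here is a base R sketch. kl_binned() is a hypothetical helper, the data are simulated, and the pseudo-count that keeps empty bins from producing log(0) is an assumption of this sketch rather than any package's behavior.

```r
set.seed(1234)
x1 <- rnorm(150)
x2 <- rnorm(150, mean = 0.5)

# Hypothetical helper: bin both vectors on the SAME breaks, then compute KL.
# The pseudo-count is an assumption that avoids log(0) in empty bins.
kl_binned <- function(a, b, n_bins, pseudo = 0.5) {
  breaks <- seq(min(a, b), max(a, b), length.out = n_bins + 1)
  count_a <- table(cut(a, breaks, include.lowest = TRUE)) + pseudo
  count_b <- table(cut(b, breaks, include.lowest = TRUE)) + pseudo
  p <- as.numeric(count_a / sum(count_a))
  q <- as.numeric(count_b / sum(count_b))
  sum(p * log(p / q))
}

# Same two vectors, different bin counts
bins <- c(5, 10, 20, 50)
kl_by_bins <- sapply(bins, function(n) kl_binned(x1, x2, n))
setNames(round(kl_by_bins, 3), bins)
```

With only 150 observations per vector, the estimate moves noticeably as the number of bins changes, which is the problem described above.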

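Fortunately, flexmix::KLdiv() handles continuous random variables, and it will output a matrix of all pairwise comparisons if you feed it a matrix with multiple columns. One caveat: as I read its documentation, the matrix method expects each column to hold density values evaluated on a common grid rather than raw observations, so it’s worth checking the input before relying on the numbers.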
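
One way to estimate KL divergence for continuous data without choosing bins is to compare kernel density estimates evaluated on a shared grid. The base R sketch below does that for every pair of columns. It is not flexmix's implementation; the simulated columns, the grid size, and the eps floor are all assumptions.

```r
set.seed(1234)
scores <- cbind(a = rnorm(300),
                b = rnorm(300, mean = 0.3),
                c = rexp(300) - 1)

# Evaluate a kernel density estimate for every column on one shared grid
grid_range <- range(scores)
dens <- apply(scores, 2, function(col) {
  density(col, from = grid_range[1], to = grid_range[2], n = 512)$y
})

# Floor tiny densities (eps is an arbitrary choice) so log() stays finite,
# then rescale each column to sum to 1
eps <- 1e-4
dens[dens < eps] <- eps
dens <- sweep(dens, 2, colSums(dens), "/")

# Pairwise KL: entry [i, j] is KL(column i || column j)
k <- ncol(dens)
kl <- matrix(0, k, k, dimnames = list(colnames(dens), colnames(dens)))
for (i in seq_len(k)) {
  for (j in seq_len(k)) {
    kl[i, j] <- sum(dens[, i] * (log(dens[, i]) - log(dens[, j])))
  }
}
round(kl, 3)
```

The diagonal is zero by construction, and the matrix is not symmetric, so the row and column order matters when reading it.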

In general, the smaller the number, the less divergence there is. There’s no magic critical value—just that values closer to zero mean the distributions are more similar.

       perry  msp    grant  intl
perry  0      0.136  1.176  0.13
msp    0.133  0      0.626  0.014
grant  1.478  0.812  0      0.763
intl   0.144  0.015  0.608  0
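
The matrix is easier to read as a sorted list of pairs. The sketch below re-enters the values reported above and ranks the off-diagonal cells. The names row and col are just labels for the table's rows and columns, since the direction of each comparison isn't stated here.

```r
# Values copied from the matrix above
labs <- c("perry", "msp", "grant", "intl")
kl <- matrix(
  c(0,     0.136, 1.176, 0.130,
    0.133, 0,     0.626, 0.014,
    1.478, 0.812, 0,     0.763,
    0.144, 0.015, 0.608, 0),
  nrow = 4, byrow = TRUE, dimnames = list(row = labs, col = labs)
)

# Long format, one row per cell, without the diagonal
kl_pairs <- as.data.frame(as.table(kl), responseName = "kl")
kl_pairs <- kl_pairs[as.character(kl_pairs$row) != as.character(kl_pairs$col), ]

# Largest divergences first
kl_pairs <- kl_pairs[order(kl_pairs$kl, decreasing = TRUE), ]
top6 <- head(kl_pairs, 6)
top6
```

The six largest values all involve the Grant index, which lines up with the KS results.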
