Note: For now, this is all based on the BHL version of the simulation with landscape_fitness_linked as the outcome variable. That can all be changed, though.


Constraint importance

Here we use a random forest to measure how important each constraint is for explaining the outcome. We don't need to include all the interaction terms (e.g., create_network + select + create_network * select) because the trees in a random forest can capture interactions through successive splits on different variables.

According to Liz Dinsdale:

The mean decrease in Gini coefficient is a measure of how each variable contributes to the homogeneity of the nodes and leaves in the resulting random forest. Each time a particular variable is used to split a node, the Gini coefficient for the child nodes are calculated and compared to that of the original node. The Gini coefficient is a measure of homogeneity from 0 (homogeneous) to 1 (heterogeneous). The changes in Gini are summed for each variable and normalized at the end of the calculation. Variables that result in nodes with higher purity have a higher decrease in Gini coefficient.

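Basically, the larger the decrease in node impurity, the more important the variable is in explaining the outcome. Because landscape_fitness_linked is continuous, this is a regression forest: the IncNodePurity column below (the randomForest package's name for this measure) scores impurity with the residual sum of squares rather than the Gini index, but the interpretation is the same.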
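To make the idea concrete, here is a minimal base-R sketch of the impurity decrease from a single split on a TRUE/FALSE constraint column. The helpers (`gini_impurity()`, `rss()`, `impurity_decrease()`) and the toy data are hypothetical illustrations, not the code behind the table below. It computes both the Gini version described in the quote and the residual-sum-of-squares version used for regression trees; roughly speaking, a random forest adds up decreases like these over every split on a variable and averages them over the trees.

```r
# Gini impurity of a vector of class labels: 1 - sum(p_k^2).
# 0 means the node is pure; larger values mean a more mixed node.
gini_impurity <- function(labels) {
  p <- table(labels) / length(labels)
  1 - sum(p^2)
}

# Residual sum of squares: the impurity measure for regression trees
rss <- function(y) sum((y - mean(y))^2)

# Decrease in impurity when a node is split on a logical column.
# Gini: child impurities are weighted by each child's share of the cases.
# RSS: already a sum over cases, so the children's values are simply added.
impurity_decrease <- function(y, split, measure = c("rss", "gini")) {
  measure <- match.arg(measure)
  if (measure == "rss") {
    rss(y) - (rss(y[split]) + rss(y[!split]))
  } else {
    w <- mean(split)
    gini_impurity(y) -
      (w * gini_impurity(y[split]) + (1 - w) * gini_impurity(y[!split]))
  }
}

# Hypothetical toy data: the outcome tracks toy_select, not toy_disperse
toy_select    <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
toy_disperse  <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
toy_fitness   <- c(9, 10, 11, 1, 2, 3)
fitness_class <- ifelse(toy_fitness > 6, "high", "low")

impurity_decrease(toy_fitness, toy_select)            # 96
impurity_decrease(toy_fitness, toy_disperse)          # about 10.67
impurity_decrease(fitness_class, toy_select, "gini")  # 0.5
impurity_decrease(fitness_class, toy_disperse, "gini")# about 0.056
```

Splitting on the informative column leaves nearly pure children, so its decrease is much larger; summed over a whole forest, that is what pushes select and compete to the top of the table.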

Constraint        IncNodePurity
select                    748.6
compete                   453.9
catastrophe               33.86
create_network            16.69
disperse                  2.953
selectfor_d              0.6772

Power of different combinations of constraints

THIS IS MAGICAL.

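library(tidyverse)

# Create columns for every combination of constraints in the data (e.g. select &
# disperse, select & disperse & create_network). A combination column is TRUE
# only when every constraint in it is TRUE.
#
# By the inimitable Vincent Arel-Bundock
make_combinations <- function(df, m = 5) {
  # Every m-way combination of the constraint columns (column 1 is runnum);
  # simplify = FALSE returns a list of name vectors instead of an unnamed matrix
  com <- colnames(df)[-1] %>%
    combn(m, simplify = FALSE)
  out <- com %>%
    map(~ df[.]) %>%
    # All m logical columns are TRUE exactly when their row sum equals m
    map(~ rowSums(.) == ncol(.)) %>%
    setNames(map_chr(com, paste, collapse = " + ")) %>%
    as_tibble()
  return(out)
}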
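The rule inside `make_combinations()` is that a combination column is TRUE only when every constraint in it is TRUE, which `rowSums(.) == ncol(.)` checks because TRUE counts as 1. Here is a minimal base-R sketch of the same rule on a hypothetical three-constraint toy data frame; `toy_flags` and `combos_base()` are illustrative names, not part of the analysis.

```r
# Hypothetical toy data: a run number plus three logical constraint columns
toy_flags <- data.frame(
  runnum      = 1:4,
  select      = c(TRUE,  TRUE,  FALSE, FALSE),
  disperse    = c(TRUE,  FALSE, TRUE,  FALSE),
  catastrophe = c(TRUE,  TRUE,  FALSE, FALSE)
)

# Base-R analogue of make_combinations(): one logical column per m-way
# combination, TRUE only when all m constraints are TRUE in that row
combos_base <- function(df, m) {
  cols <- combn(names(df)[-1], m, simplify = FALSE)
  out <- lapply(cols, function(cc) rowSums(df[cc]) == length(cc))
  names(out) <- vapply(cols, paste, character(1), collapse = " + ")
  as.data.frame(out, check.names = FALSE)
}

combos_base(toy_flags, m = 2)
```

For this toy data, select + disperse is TRUE only for run 1 (all three constraints on), while select + catastrophe is TRUE for runs 1 and 2.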

# Select just the run number and the *_constraint TRUE/FALSE columns
constraint_combinations <- sim_results %>%
  select(runnum, ends_with("_constraint")) %>%
  # Shrink names by removing the "_constraint" suffix
  rename_with(~ str_remove(., "_constraint$"), ends_with("_constraint"))

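# Find all combinations of constraints (m = number of items in a combination;
# m = 2 means pairs, m = 3 means triplets, etc.), from pairs up to all of the
# constraints at once. The single-constraint columns are already in
# constraint_combinations.
n_constraint_cols <- ncol(constraint_combinations) - 1  # every column but runnum
all_constraint_combos <- map(2:n_constraint_cols,
                             ~ make_combinations(constraint_combinations, m = .)) %>%
  bind_cols(constraint_combinations, .)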
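As a quick sanity check on the size of all_constraint_combos, assuming the six constraints in the importance table are the only *_constraint columns, the combinations run from pairs (m = 2) up to all six at once (m = 6). The variable names here are illustrative.

```r
# The six constraints in the importance table above
n_constraint_types <- 6
combos_per_size <- choose(n_constraint_types, 2:n_constraint_types)
combos_per_size        # 15 20 15 6 1
sum(combos_per_size)   # 57 combination columns added by the combination step
# Together with the 6 single-constraint columns, that is every non-empty
# subset of the constraints: 2^6 - 1 = 63 indicator columns per run
sum(combos_per_size) + n_constraint_types == 2^n_constraint_types - 1  # TRUE
```

Because every non-empty subset gets its own column, each run has exactly one combination that matches its full set of active constraints, which is what the de-duplication step relies on.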

# Select the outcome variables we care about (for now just
# landscape_fitness_linked) and join the constraint combinations
constraint_combo_outcomes <- sim_results %>%
  select(runnum, n_constraints, landscape_fitness_linked) %>% 
  right_join(all_constraint_combos, by = "runnum")

# Don't double count runs. If a run has two constraints like select and
# disperse turned on, it will also have select + disperse set to TRUE. In that
# case we want to count it only under select + disperse, not under just select
# or just disperse
constraint_combo_outcomes_nested <- constraint_combo_outcomes %>%
  select(-n_constraints) %>%
  # Make long: one row per run per constraint combination
  pivot_longer(-c(runnum, landscape_fitness_linked),
               names_to = "constraint_combo", values_to = "value") %>%
  # Count how many constraints are in each combination based on + signs
  mutate(n = str_count(constraint_combo, "\\+") + 1) %>%
  # Only keep rows where the combination is turned on
  filter(value) %>%
  # Nest all the active constraint combinations within each run
  group_by(runnum) %>%
  nest()

# Only keep the combination where n == max(n) for that run, i.e. the largest
# combination that is turned on
constraint_combo_outcomes_filtered <- constraint_combo_outcomes_nested %>% 
  mutate(filtered = data %>% map(~filter(., n == max(.$n)))) %>% 
  select(-data) %>% 
  unnest(filtered)
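The per-run filter can be illustrated in base R on a tiny hypothetical long table (`long_toy` and `keep` are illustrative names): run 1 had select and disperse on, so it has three TRUE combinations, and only the largest one survives. In dplyr the same step could also be written as `group_by(runnum) %>% filter(n == max(n))` without nesting.

```r
# Hypothetical long-format toy data: only the combinations that are TRUE
long_toy <- data.frame(
  runnum = c(1, 1, 1, 2),
  constraint_combo = c("select", "disperse", "select + disperse", "compete"),
  stringsAsFactors = FALSE
)

# Number of constraints in each combination = number of "+" signs + 1
long_toy$n <- nchar(gsub("[^+]", "", long_toy$constraint_combo)) + 1

# Keep only the largest combination within each run
keep <- long_toy$n == ave(long_toy$n, long_toy$runnum, FUN = max)
long_toy[keep, ]
```

Run 1 keeps only select + disperse (n = 2) and run 2 keeps compete (n = 1).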

# Keeping only TRUE values dropped every run where n_constraints == 0 (no
# combination is TRUE), so add those runs back in under "No constraints"
no_constraints <- constraint_combo_outcomes %>% 
  filter(n_constraints == 0) %>% 
  mutate(constraint_combo = "No constraints", n = 0) %>% 
  select(runnum, landscape_fitness_linked, constraint_combo, n)

constraint_combo_outcomes_done <- bind_rows(constraint_combo_outcomes_filtered,
                                            no_constraints) %>% 
  select(-value)
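Because the combination columns cover every non-empty subset of the constraints, the largest combination that is TRUE for a run is exactly the set of constraints turned on in that run. That suggests a shorter cross-check for constraint_combo_outcomes_done, sketched here in base R on hypothetical toy data. `label_constraints()` and `toy_runs` are illustrative names, and the sketch assumes, as in constraint_combinations, that the first column is runnum and the rest are the logical constraint columns. Names are pasted in column order, which matches the order combn() uses when make_combinations() builds its labels.

```r
# Label each run with the full set of constraints that are turned on,
# pasted in column order with " + ", or "No constraints" if none are on
label_constraints <- function(df) {
  flags <- as.matrix(df[-1])
  labels <- apply(flags, 1, function(on) {
    paste(colnames(flags)[on], collapse = " + ")
  })
  labels[labels == ""] <- "No constraints"
  data.frame(runnum = df$runnum, constraint_combo = labels,
             n = rowSums(flags), stringsAsFactors = FALSE)
}

# Hypothetical toy data with three constraint columns
toy_runs <- data.frame(
  runnum   = 1:4,
  select   = c(TRUE, TRUE, FALSE, FALSE),
  compete  = c(TRUE, FALSE, FALSE, FALSE),
  disperse = c(TRUE, FALSE, TRUE, FALSE)
)
label_constraints(toy_runs)
```

Joining this result to the outcomes by runnum should give the same constraint_combo and n for every run as the nest-and-filter pipeline above.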

