This analysis estimates the effect of information richness* by looking at how much more money is allocated to a scenario, on average, when that scenario is conveyed using an information-rich (as opposed to an information-poor) visualization. The analysis has been pre-registered at […] (also see the plan/ directory).

(*) The estimated effect also includes the use of anthropomorphic marks, but we refer only to information richness for simplicity.

1 Setup

rm(list = ls())
library(tidyverse)

# Load custom helper functions
source("helpers.R")
source("CI-helpers.R")

# Completion code used to determine valid submissions
completion_code = "64CABB60"

# Participant IDs used for preview and testing (those will be discarded)
exclude_participant_ids = c("5d8cca80897af7001a72156a", "test")

2 Loading data

We read and combine four data files:

# Load the four data files and only keep the useful columns
prolific_export = read_csv("data/prolific_export.csv") %>%
  select(participant_id,
         entered_code,
         time_taken,
         age,
         country = `Current Country of Residence`,
         sex = Sex)

results = read_csv("data/results.csv") %>%
  select(participant_id,
         fund_allocation_southeast_asia = fundAllocation_UNHCR,
         fund_allocation_middle_east = fundAllocation_IOM,
         donation_allocation_southeast_asia = donationAllocation) %>%
  mutate(fund_allocation_difference = fund_allocation_southeast_asia - fund_allocation_middle_east)

excluded = read_csv("data/excluded.csv") %>%
  select(participant_id,
         failed_attention_check,
         reloaded) %>%
  mutate(participant_id = as.character(participant_id)) # needed when the column is all NA (it is then read as logical)

agreed = read_csv("data/all_who_agreed.csv") %>%
  select(participant_id,
         condition)

# Join the four datasets and clean up a bit
data_all = agreed %>%
  left_join(excluded, by = "participant_id") %>%
  left_join(prolific_export, by = "participant_id") %>%
  full_join(results, by = "participant_id") %>%
  filter(!participant_id %in% exclude_participant_ids) %>%
  mutate(failed_attention_check = coalesce(failed_attention_check, 0), # replace NAs with zeros
         reloaded = coalesce(reloaded, 0))

# Clean up possibly serious anomalies in the data and issue warnings if necessary
data_all_original = data_all
data_all = data_all %>%
  drop_na(participant_id, condition) %>%
  distinct(participant_id, .keep_all = TRUE)
if (nrow(data_all) != nrow(data_all_original)) {
  cat("WARNING -- Data has duplicate participants, missing participant IDs, or missing conditions:\n")
  anti_join(data_all_original, data_all)
}

# Create another dataset with completed submissions only
data = data_all %>% filter(entered_code == completion_code)

# Split datasets according to the condition
data_all_rich_first = data_all %>% filter(condition == "richFirst")
data_all_poor_first = data_all %>% filter(condition == "poorFirst")
data_rich_first = data %>% filter(condition == "richFirst")
data_poor_first = data %>% filter(condition == "poorFirst")
## WARNING -- Data has duplicate participants, missing participant IDs, or missing conditions:

3 Attrition analysis

As a sanity check, we start by looking at attrition, to make sure participants do not drop out considerably more often in one condition than in the other. See Zhou, H., & Fishbach, A. (2016). The pitfall of experimenting on the web: How unattended selective attrition leads to surprising (yet false) research conclusions. Journal of Personality and Social Psychology, 111(4), 493.

# Count participants who agreed to the consent form
Nall = nrow(data_all)
Nall_r = nrow(data_all_rich_first)
Nall_p = nrow(data_all_poor_first)

# Count participants who completed the job
N = nrow(data)
N_r = nrow(data_rich_first)
N_p = nrow(data_poor_first)

cat("Number of participants who agreed to the consent form:", Nall, "\n")
cat("Number of valid submissions:                          ", N, "\n")
## Number of participants who agreed to the consent form: 144 
## Number of valid submissions:                           128
N_dropped = Nall - N
N_dropped_r = Nall_r - N_r
N_dropped_p = Nall_p - N_p
diff_ci = 100 * diffpropCI(N_dropped_r, Nall_r, N_dropped_p, Nall_p)

cat("Attrition rate: ", round(N_dropped / Nall * 100), "%\n", sep="")
cat("  Rich first:  ", round(N_dropped_r / Nall_r * 100), "%\n", sep="")
cat("  Poor first:  ", round(N_dropped_p / Nall_p * 100), "%\n", sep="")
cat("  Difference: ", formatCI(diff_ci, unit = "%", digits = 2, plot = F), "\n", sep="")
## Attrition rate: 11%
##   Rich first:  9%
##   Poor first:  14%
##   Difference: -4.9%, 95% CI [-16%, 5.8%]
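The diffpropCI() helper comes from CI-helpers.R and is not shown here. For illustration only, a difference-in-proportions CI could be sketched as follows; the actual helper may use a different method:

# Illustrative sketch only -- the actual diffpropCI() is defined in
# CI-helpers.R and may use a different method. Here we take the
# normal-approximation CI on p1 - p2 from prop.test().
diffpropCI_sketch = function(x1, n1, x2, n2, conf.level = 0.95) {
  test = prop.test(c(x1, x2), c(n1, n2), conf.level = conf.level, correct = FALSE)
  c(x1 / n1 - x2 / n2, test$conf.int)  # point estimate, lower bound, upper bound
}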
N_failedcheck = sum(data_all$failed_attention_check)
N_failedcheck_r = sum(data_all_rich_first$failed_attention_check)
N_failedcheck_p = sum(data_all_poor_first$failed_attention_check)
diff_ci = 100 * diffpropCI(N_failedcheck_r, N_dropped_r, N_failedcheck_p, N_dropped_p)

cat("Among those who dropped after agreeing to the consent form:\n")

cat("\nFailed attention check: ", round(N_failedcheck / N_dropped * 100), "%\n", sep="")
cat("  Rich first:  ", round(N_failedcheck_r / N_dropped_r * 100), "%\n", sep="")
cat("  Poor first:  ", round(N_failedcheck_p / N_dropped_p * 100), "%\n", sep="")
cat("  Difference: ", formatCI(diff_ci, unit = "%", digits = 2, plot = F), "\n", sep="")

N_reloaded = sum(data_all$reloaded)
N_reloaded_r = sum(data_all_rich_first$reloaded)
N_reloaded_p = sum(data_all_poor_first$reloaded)
diff_ci = 100 * diffpropCI(N_reloaded_r, N_dropped_r, N_reloaded_p, N_dropped_p)

cat("\nAttempted page reload: ", round(N_reloaded / N_dropped * 100), "%\n", sep="")
cat("  Rich first:  ", round(N_reloaded_r / N_dropped_r * 100), "%\n", sep="")
cat("  Poor first:  ", round(N_reloaded_p / N_dropped_p * 100), "%\n", sep="")
cat("  Difference: ", formatCI(diff_ci, unit = "%", digits = 2, plot = F), "\n", sep="")

N_other = N_dropped - N_failedcheck - N_reloaded
N_other_r = N_dropped_r - N_failedcheck_r - N_reloaded_r
N_other_p = N_dropped_p - N_failedcheck_p - N_reloaded_p
diff_ci = 100 * diffpropCI(N_other_r, N_dropped_r, N_other_p, N_dropped_p)

cat("\nReason unknown: ", round(N_other / N_dropped * 100), "%\n", sep="")
cat("  Rich first:  ", round(N_other_r / N_dropped_r * 100), "%\n", sep="")
cat("  Poor first:  ", round(N_other_p / N_dropped_p * 100), "%\n", sep="")
cat("  Difference: ", formatCI(diff_ci, unit = "%", digits = 2, plot = F), "\n", sep="")
## Among those who dropped after agreeing to the consent form:
## 
## Failed attention check: 50%
##   Rich first:  67%
##   Poor first:  40%
##   Difference:  27%, 95% CI [-24%,  65%]
## 
## Attempted page reload: 12%
##   Rich first:  17%
##   Poor first:  10%
##   Difference: 6.7%, 95% CI [-30%,  50%]
## 
## Reason unknown: 38%
##   Rich first:  17%
##   Poor first:  50%
##   Difference: -33%, 95% CI [-68%,  18%]

If all the CIs of differences above clearly include zero, we will conclude that there is no clear evidence of unbalanced attrition in our data and move on. Note that we should not be surprised if one of these intervals excludes zero, since we have four 95% CIs here that are not corrected for multiplicity. Assuming for simplicity that the four effects are independent, even with perfectly balanced attrition, at least one of the four CIs will exclude zero with probability 1-(1-0.05)^4 = 19%. If we run four such experiments, that probability becomes 1-(1-0.05)^16 = 56%. So we should only worry if a CI is very far from zero, or if it consistently excludes zero across experiments.
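For reference, these probabilities are easy to check directly in R:

# Probability that at least one of k independent 95% CIs excludes zero,
# even when all the underlying effects are truly zero
p_at_least_one = function(k) 1 - (1 - 0.05)^k
p_at_least_one(4)   # four CIs in one experiment: ~0.19
p_at_least_one(16)  # four CIs in each of four experiments: ~0.56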

For clarity, we may report all the values above in a table, or only report overall estimates and differences (skipping per-condition estimates).

4 Sample description

Here we report basic information about our sample (sample size, demographics, median completion time). From now on, only participants who successfully completed the job are included in the analyses.

cat("Total sample size:", N, "\n")
cat("       Rich first:", N_r, "\n")
cat("       Poor first:", N_p, "\n")
## Total sample size: 128 
##        Rich first: 64 
##        Poor first: 64
t = median(data$time_taken) / 60
t_r = median(data_rich_first$time_taken) / 60
t_p = median(data_poor_first$time_taken) / 60
diff_ci = diffMedianCI.bootstrap(data_rich_first$time_taken, data_poor_first$time_taken)

cat("Median completion time:", round(t), "min\n")
cat("            Rich first:", round(t_r), "min\n")
cat("            Poor first:", round(t_p), "min\n")
cat("            Difference: ", formatCI(diff_ci, unit = " sec", digits = 2, plot = F), "\n", sep="")
## Median completion time: 8 min
##             Rich first: 8 min
##             Poor first: 9 min
##             Difference: -56 sec, 95% CI [-1.6e+02 sec,  61 sec]
freqPlot(data, "country")
histPlot(data, "age")

## Mean age is 30 (min = 18, max = 68)
freqPlot(data, "sex")

Note that we do not look at differences in demographics between the two conditions, because people were randomly assigned to conditions, and thus the null hypothesis is true by definition. See Mutz, D. C., & Pemantle, R. (2011). The perils of randomization checks in the analysis of experiments. Annual Meeting of the Society for Political Methodology.

5 Overview of responses

Descriptive plots will provide an overview of responses to the three questions (distributions, means), for each of the two experimental groups (richFirst and poorFirst).

These plots will not be used to draw inferences, so their code is not included in this plan.

# Insert code for descriptive plots here.
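For illustration only, one such plot could be sketched as follows (using ggplot2, already loaded via tidyverse); this sketch is not part of the pre-registered plan:

# Illustrative sketch, not part of the pre-registered plan: distribution of
# fund allocation differences in each condition, with group means in red.
data %>%
  ggplot(aes(x = condition, y = fund_allocation_difference)) +
  geom_jitter(width = 0.1, alpha = 0.4) +
  stat_summary(fun = mean, geom = "point", size = 3, color = "red") +
  labs(x = "Condition", y = "Fund allocation difference (SouthEast Asia - Middle East)")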

6 Effect sizes

This final part of the analysis directly addresses our research questions.

The three effect sizes estimated in this section (one primary, two secondary) capture, in three different ways, the average increase in money allocated to SouthEast Asia when its data is shown with an information-rich visualization. Each effect corresponds to a different way of measuring the money allocated to SouthEast Asia, which we will call DV1, DV2, and DV3 (DV for dependent variable).

We will analyze and interpret our results using estimation statistics. See:

All effects in this section are estimated using BCa bootstrap confidence intervals. These provide good interval estimates without distributional assumptions for sample sizes of about 20 or more (we have 64 and 64). See Kirby, K. N., & Gerlanc, D. (2013). BootES: An R package for bootstrap confidence intervals on effect sizes. Behavior research methods, 45(4), 905-927.
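The diffMeanCI.bootstrap() helper comes from CI-helpers.R and is not shown here. As a rough illustration, a BCa bootstrap CI on a difference in means could be computed with the boot package as sketched below; the actual helper may differ (for instance, it could rely on the bootES package cited above):

# Illustrative sketch only -- the actual diffMeanCI.bootstrap() is defined
# in CI-helpers.R. BCa bootstrap CI on the difference between two group means.
library(boot)
diffMeanCI_sketch = function(x, y, R = 10000, conf = 0.95) {
  df = data.frame(value = c(x, y),
                  group = factor(rep(c("x", "y"), c(length(x), length(y)))))
  stat = function(d, i) {
    d = d[i, ]
    mean(d$value[d$group == "x"]) - mean(d$value[d$group == "y"])
  }
  b = boot(df, stat, R = R, strata = df$group)  # resample within each group
  ci = boot.ci(b, conf = conf, type = "bca")
  c(b$t0, ci$bca[4], ci$bca[5])  # point estimate, lower bound, upper bound
}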

Since we identify a single primary outcome in this analysis, no adjustment for multiplicity is required. We will interpret results for the secondary outcomes as tentative and exploratory, especially if the primary outcome is inconclusive. See:

6.1 Primary effect

DV1 = Additional funds allocated to SouthEast Asia / UNHCR compared to the Middle East / IOM (negative if less money is allocated)

  • It is the participant’s response to the first question minus their response to the second question. Note that the choice of SouthEast Asia as baseline is arbitrary and does not matter.
  • Unit: percent difference. Range: -100, 100.

The reported effect of interest is then the difference in mean DV1 between the richFirst and the poorFirst groups. In other words, it is the interaction between the scenario condition (SouthEast Asia vs. Middle East, within-subjects) and the visualization order condition (richFirst vs. poorFirst, between-subjects). If information richness has no effect, there should be no interaction. Conversely, if information richness has an effect, the interaction should be non-zero (i.e., positive if information richness prompts people to allocate more money, negative otherwise).

effect1 = Difference in mean DV1 between the richFirst and the poorFirst groups.

  • Unit: percent difference. Range: -200, 200.
  • Role: primary outcome, used to answer the research question (chosen because it is likely the DV with the highest statistical power)
# Set random seed for bootstrapping
set.seed(0)

DV1_rich = data_rich_first$fund_allocation_difference
DV1_poor = data_poor_first$fund_allocation_difference
effect1 = diffMeanCI.bootstrap(DV1_rich, DV1_poor)
cat("Effect 1 = ", formatCI(effect1))
## Effect 1 =  3.8, 95% CI [-1.7, 9.3]

6.2 Secondary effects

DV2 = Percentage of global UNHCR funds allocated to SouthEast Asia (the rest being allocated to other parts of the world)

  • It is the response to the first question only. This is the between-subjects analysis, taking into account only the first stimulus.
  • Unit: percent. Range: 0, 100.

The effect reported here is the difference in mean DV2 between the richFirst and the poorFirst groups.

  • Unit: percent difference. Range: -100, 100.
  • Role: secondary outcome (an alternative measure that better ensures participants are naïve to the purpose of the experiment)
DV2_rich = data_rich_first$fund_allocation_southeast_asia
DV2_poor = data_poor_first$fund_allocation_southeast_asia
effect2 = diffMeanCI.bootstrap(DV2_rich, DV2_poor)
cat("Effect 2 = ", formatCI(effect2))
## Effect 2 =  2.7, 95% CI [-2.3, 7.9]

DV3 = Percentage of personal money donated to SouthEast Asia / SEARAC (the rest being donated to the Middle East / World Relief)

  • It is the response to the third question.
  • Unit: percent. Range: 0, 100.

The effect reported here is the difference in mean DV3 between the richFirst and the poorFirst groups.

  • Unit: percent difference. Range: -100, 100.
  • Role: secondary outcome (an alternative to the primary effect; will help us determine whether it is a better way of asking the question)
DV3_rich = data_rich_first$donation_allocation_southeast_asia
DV3_poor = data_poor_first$donation_allocation_southeast_asia
effect3 = diffMeanCI.bootstrap(DV3_rich, DV3_poor)
cat("Effect 3 = ", formatCI(effect3))
## Effect 3 =  -1.8, 95% CI [-7.1, 3.1]

Plot all effect size CIs. We put them in the same plot here only for convenience: the three effect sizes are not on the same scale and cannot be directly compared, so we will probably not combine them in a single plot in the paper.

plotAllCIs(lim = c(0, 25), xlabel = "Percent increase in donations for information-rich visualizations")

ggsave(filename = "../../paper-figures/exp1-effects.pdf", dpi = 300, width = 6, height = 2)

Note that effect 1 is on a different scale from effect 2, and because of the way it is calculated we should expect it to be about twice as large: DV1 subtracts the Middle East allocation from the SouthEast Asia allocation, so (assuming richness increases allocations) information richness inflates the first term in the richFirst group and the second term in the poorFirst group, and the between-group difference counts the effect twice. When deciding which measure is the most sensitive (e.g., to inform the design of our next experiment), we should not look at how large each effect is, but at how far its CI is from zero.