If you are worried about using a Normal (or \(t\)) distribution for hypothesis tests on a particular dataset, one thing you can do is take the dataset and generate the sampling distribution under the sharp null of no effect. You can then see how close to normal that distribution is and whether the \(p\) value indeed rejects the null of no effect 5% of the time.
We start off with a smallish \(N\) and extremely asymmetric data:
library(DeclareDesign)
library(tidyverse)
library(knitr)
set.seed(1)
data <- data.frame(Y = c(rnorm(180, 0), rnorm(20, 10000))/425)  # 180 draws near 0 plus 20 extreme outliers
data %>% ggplot(aes(Y)) + geom_histogram()
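Before turning to DeclareDesign, the logic can be illustrated by hand. Under the sharp null, every unit's outcome is fixed regardless of assignment, so the randomization distribution of the difference in means comes entirely from reshuffling the treatment vector. A minimal base-R sketch (not part of the original analysis; the number of replicates here is illustrative):

```r
# Under the sharp null of no effect, outcomes are fixed; only Z varies.
Y <- data$Y
diffs <- replicate(2000, {
  Z <- sample(rep(0:1, each = length(Y) / 2))  # complete random assignment
  mean(Y[Z == 1]) - mean(Y[Z == 0])            # difference in means
})
hist(diffs, breaks = 50)
```

The histogram of `diffs` is the randomization distribution that the Normal approximation is meant to mimic.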
Then we generate the sampling distribution of ATE estimates under the sharp null of no effect and assess the associated distribution of estimates, \(p\) values, and confidence intervals.
The code below can be modified for other problems. Be sure, though, that the assignment and estimation declarations reflect those used in the study you are interested in. Note that the number of simulations below is very large because the underlying distribution creates great volatility in the estimates; the resulting diagnosands, however, are quite precise.
design <-
  declare_model(data) +                      # fixed dataset; no effect is added, so the sharp null holds
  declare_assignment(Z = complete_ra(N)) +   # complete random assignment
  declare_inquiry(ATE = 0) +                 # the true estimand is zero by construction
  declare_estimator(Y ~ Z)                   # difference in means
simulations <- simulate_design(design, sims = 200000)
simulations %>% ggplot(aes(estimate)) + geom_histogram()
diagnose_design(simulations) %>% reshape_diagnosis() %>% kable()
| Design | Inquiry | Estimator | Term | N Sims | Bias | RMSE | Power | Coverage | Mean Estimate | SD Estimate | Mean SE | Type S Rate | Mean Estimand |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| design | ATE | estimator | Z | 200000 | -0.00 | 1.00 | 0.03 | 0.97 | -0.00 | 1.00 | 1.00 | 1.00 | 0.00 |
| | | | | | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) | (0.00) |
The rejection rate of 0.03 under the null (below the nominal 0.05) and the coverage of 0.97 suggest a small divergence from normality with conservative implications.
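These two diagnosands can also be read off the simulations directly; a small check, assuming the standard column names (`p.value`, `conf.low`, `conf.high`) returned by `simulate_design`:

```r
simulations %>%
  summarise(
    rejection_rate = mean(p.value <= 0.05),                 # near 0.05 if the Normal approximation is accurate
    coverage       = mean(conf.low <= 0 & conf.high >= 0)   # share of CIs covering the true ATE of 0
  )
```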