2 Simulation example
2.1 Data-generating process
The data-generating process is described via the directed acyclic graph (DAG) in Fig. 2.1. In this DAG, glare perception \((\text{G})\) is directly influenced by social interaction \((\text{I})\) and window state \((\text{W})\), and indirectly by occupancy \((\text{O})\) through social interaction \((\text{I})\). Additionally, occupancy \((\text{O})\) and window state \((\text{W})\) directly influence CO2 concentration \((\text{C})\).
Code
dag_coords.ex2 <-
data.frame(name = c('I', 'O', 'C', 'G', 'W'),
x = c(1, 1, 3.5, 6, 6),
y = c(1, 3, 2, 1, 3))
DAG.ex2 <-
dagify(I ~ O,
C ~ O,
C ~ W,
G ~ W,
G ~ I + W,
coords = dag_coords.ex2)
node_labels <- c(
I = 'bold(I)',
O = 'bold(O)',
C = 'bold(C)',
G = 'bold(G)',
W = 'bold(W)'
)
ggplot(data = DAG.ex2, aes(x = x, y = y, xend = xend, yend = yend)) +
geom_dag_text(aes(label = node_labels[name]),
parse = TRUE,
colour = 'black',
size = 10,
family = 'mono') +
geom_dag_edges(arrow_directed = grid::arrow(length = grid::unit(10, 'pt'), type = 'open'),
edge_colour = 'black',
family = 'mono',
fontface = 'bold') +
annotate('text', x = 1, y = 0.7, label = 'social interaction',
size = 4, hjust = 0.5, colour = 'grey50') +
annotate('text', x = 3.5, y = 1.7, label = 'CO2',
size = 4, hjust = 0.5, colour = 'grey50') +
annotate('text', x = 1, y = 3.3, label = 'occupancy',
size = 4, hjust = 0.5, colour = 'grey50') +
annotate('text', x = 6, y = 3.3, label = 'window state',
size = 4, hjust = 0.5, colour = 'grey50') +
annotate('text', x = 6, y = 0.7, label = 'glare perception',
size = 4, hjust = 0.5, colour = 'grey50') +
coord_cartesian(xlim = c(0.5, 6.5), ylim = c(0.8, 3.2)) +
theme_dag()
The DAG in Fig. 2.1 can be written as:
- \(I \sim f_{I}(O)\), read as ‘social interaction \((\text{I})\) is some function of occupancy \((\text{O})\)’.
- \(C \sim f_{C}(O, W)\), read as ‘CO2 concentration \((\text{C})\) is some function of occupancy \((\text{O})\) and window \((\text{W})\)’.
- \(G \sim f_{G}(I, W)\), read as ‘glare perception \((\text{G})\) is some function of social interaction \((\text{I})\) and window \((\text{W})\)’.
2.2 Synthetic data set
To generate synthetic data, we defined the custom function data.sim_ex2(). This function takes as inputs the sample size n and generates synthetic data according to the DAG in Fig. 2.1.
data.sim_ex2 <- function(n) {
b_socint.glare = -2 #direct causal effect of I on G
b_win.glare = c(0, 20) #direct causal effect of W on G
#Simulate occupancy
O <- factor(sample(c('low', 'high'), size = n, replace = TRUE))
#Simulate window state
W <- factor(sample(c('closed', 'open'), size = n, replace = TRUE))
#Simulate CO2
C <- rnorm(n = n, mean = 400 + ifelse(O == 'low', 0, 50) + ifelse(W == 'closed', 0, 30), sd = 10)
#Simulate social interaction
I <- rpois(n = n, lambda = 7 + ifelse(O == 'low', 0, 3))
#Simulate glare perception
G <- rnorm(n = n, mean = 50 + b_socint.glare * I + ifelse(W == 'closed', b_win.glare[1], b_win.glare[2]), sd = 5)
#Return tibble with simulated values
return(tibble(O, I, C, W, G))
}From this data generation mechanism, we simulated the target population, which consists of one million observations.
set.seed(2025) #set random number for reproducibility
#Simulate the population
ex2_population <- data.sim_ex2(n = 1e6)
#Set occupancy reference category to 'low'
ex2_population$O <- relevel(ex2_population$O, ref = 'low')
#Set window state reference category to 'closed'
ex2_population$W <- relevel(ex2_population$W, ref = 'closed')
#View the data frame
ex2_population# A tibble: 1,000,000 × 5
O I C W G
<fct> <int> <dbl> <fct> <dbl>
1 low 5 422. closed 41.3
2 high 6 474. closed 42.5
3 high 13 473. open 35.9
4 high 11 461. open 42.1
5 low 2 391. closed 43.8
6 low 12 404. closed 18.4
7 high 16 452. closed 24.0
8 low 10 416. open 54.2
9 high 15 496. open 33.4
10 low 5 415. open 50.4
# ℹ 999,990 more rows
From this population, we obtained one data set of five thousand observations using simple random sampling.
set.seed(ex2_random.seed[1])
#Sample one data set of 5,000 observations
ex2_sample.random <-
ex2_population %>%
slice_sample(n = 5e3) #take a simple random sample of size 5,000
#View the data frame
ex2_sample.random# A tibble: 5,000 × 5
O I C W G
<fct> <int> <dbl> <fct> <dbl>
1 high 11 463. closed 32.9
2 low 11 404. closed 27.3
3 low 2 393. closed 40.5
4 high 10 481. open 45.5
5 low 7 398. closed 38.2
6 low 6 422. open 62.4
7 high 11 472. open 47.9
8 low 5 424. open 58.3
9 high 13 465. open 47.2
10 low 6 417. open 64.5
# ℹ 4,990 more rows
2.3 Data analysis
In this example, the target of our analysis is the total average causal effect, ACE (also known as total average treatment effect, ATE) of social interaction \((\text{I})\) on glare perception \((\text{G})\), which stands for the expected increase of \(\text{G}\) in response to a unit increase in \(\text{I}\) due to an intervention. The causal effect of interest is visualized in Fig. 2.2.
Code
ggplot(data = DAG.ex2, aes(x = x, y = y, xend = xend, yend = yend)) +
#visualise causal effect path
geom_segment(x = 1, xend = 6, y = 1, yend = 1,
linewidth = 14, lineend = 'round', colour = '#009E73', alpha = 0.05) +
geom_dag_text(aes(label = node_labels[name]),
parse = TRUE,
colour = 'black',
size = 10,
family = 'mono') +
geom_dag_edges(arrow_directed = grid::arrow(length = grid::unit(10, 'pt'), type = 'open'),
edge_colour = 'black',
family = 'mono',
fontface = 'bold') +
annotate('text', x = 1, y = 0.7, label = 'social interaction',
size = 4, hjust = 0.5, colour = 'grey50') +
annotate('text', x = 3.5, y = 1.7, label = 'CO2',
size = 4, hjust = 0.5, colour = 'grey50') +
annotate('text', x = 1, y = 3.3, label = 'occupancy',
size = 4, hjust = 0.5, colour = 'grey50') +
annotate('text', x = 6, y = 3.3, label = 'window state',
size = 4, hjust = 0.5, colour = 'grey50') +
annotate('text', x = 6, y = 0.7, label = 'glare perception',
size = 4, hjust = 0.5, colour = 'grey50') +
#causal effect number
annotate('text', x = 3.5, y = 1.2, label = '-2',
size = 4.5, hjust = 0.5, colour = 'black', parse = TRUE) +
coord_cartesian(xlim = c(0.5, 6.5), ylim = c(0.8, 3.2)) +
theme_dag()
2.3.1 Identification
The first step to answer the causal question of interest is identification. Identification answers a ‘theoretical’ question by determining whether a causal effect can, in principle, be estimated from observed data. The backdoor criterion and its generalization, the adjustment criterion, allow us to understand whether our causal effect of interest can be identified and, if so, which variables we should (or should not) statistically adjust for (i.e., the adjustment set) to estimate the causal effect from the data.
Given its simplicity, we will first apply the backdoor criterion to identify valid adjustment sets to estimate the causal effect of interest. If the backdoor criterion is not applicable, we will apply its generalization, the adjustment criterion.
Backdoor criterion
Applying the backdoor criterion revealed the absence of any backdoor path (i.e., a non-causal path) from social interaction \((\text{I})\) to glare perception \((\text{G})\). As a result, there is no confounding and no adjustment is required.
Given the DAG in Fig. 2.2, we can use the adjustmentSets() function to identify the adjustment set algorithmically. It is essential to note that this function applies the adjustment criterion and not the backdoor criterion. As such, the adjustmentSets() function can find adjustment sets even when the backdoor criterion is not applicable.
adjustmentSets(DAG.ex2,
exposure = 'I', #social interaction
outcome = 'G', #glare perception
type = 'all',
effect = 'total',
max.results = Inf) {}
{ O }
{ C, O }
{ W }
{ C, W }
{ O, W }
{ C, O, W }
As expected, the resulting adjustment set includes the empty set (i.e., no adjustment is required). However, the backdoor criterion reveals that there are six other possible adjustment sets to get the correct estimate of the total average causal effect of \(\text{I}\) on \(\text{G}\). Specifically:
- adjust for occupancy \((\text{O})\)
- adjust for CO2 concentration \((\text{C})\) and occupancy \((\text{O})\)
- adjust for window state \((\text{W})\)
- adjust for CO2 concentration \((\text{C})\) and window state \((\text{W})\)
- adjust for occupancy \((\text{O})\) and window state \((\text{W})\)
- adjust for CO2 concentration \((\text{C})\), occupancy \((\text{O})\) and window state \((\text{W})\)
Therefore, to get the correct estimate of the total average causal effect, any of the aforementioned adjustment sets will work; failing to do so will lead to bias.
2.3.2 Estimation
Following the identification step is the estimation step. This step addresses a statistical question by determining how the causal effect identified in the previous step can be estimated. To perform this step, we used a parametric (model-based) estimator, specifically, linear regression. This was possible because we designed the illustrative examples to be simple and with a linear relationship between the variables. This way, we limited the complexity of the examples themselves and shifted the focus to the application of the backdoor criterion to define ‘correct’ adjustment sets.
For transparency and understanding, all (implicit) assumptions used for this illustrative example are (explicitly) provided in Table 2.1.
| Research question | Total average causal effect (ACE) of social interaction (I) on glare perception (G). |
| Assumptions | Random sample (simple random sampling): everyone in the population has an equal chance of being selected into the sample. |
| Limited random variability: large sample size. | |
| Independence of observations: each observation represents independent bits of information. | |
| No confounding: the DAG includes all shared causes among the variables. | |
| No model error: perfect functional form specification. | |
| No measurement error: all variables are measured perfectly. | |
| Variables | Social interaction (I): discrete (count) variable [unit: -] |
| Occupancy (O): categorical variable ['low'; 'high'] | |
| CO2 concentration (C): continuous variable [unit: ppm] | |
| Window state (W): categorical variable ['closed'; 'open'] | |
| Glare perception (G): continuous variable [unit: -] |
To carry out the estimation step, we utilized linear regression within both the frequentist and Bayesian frameworks. Specifically, we will run three regression models:
-
Model.1will include only social interaction \((\text{I})\) as predictor; -
Model.2will include social interaction \((\text{I})\) and CO2 concentration \((\text{C})\) as predictors. -
Model.3will include social interaction \((\text{I})\), CO2 concentration \((\text{C})\) and window state \((\text{W})\) as predictors.
The results of the fitted statistical models (i.e., Model.1, Model.2 and Model.3) are presented here.
Frequentist framework
Click to expand
#Fit the linear regression model with I and G (Model 1)
ex2_Model.1 <-
lm(formula = G ~ I,
data = ex2_sample.random)
#View of the model summary
summary(ex2_Model.1)
Call:
lm(formula = G ~ I, data = ex2_sample.random)
Residuals:
Min 1Q Median 3Q Max
-27.8784 -9.9755 -0.2005 10.0889 25.5743
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 59.29801 0.43856 135.2 <2e-16 ***
I -1.92618 0.04756 -40.5 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 11.21 on 4998 degrees of freedom
Multiple R-squared: 0.2471, Adjusted R-squared: 0.2469
F-statistic: 1640 on 1 and 4998 DF, p-value: < 2.2e-16
#Fit the linear regression model with I, C and G (Model 2)
ex2_Model.2 <-
lm(formula = G ~ I + C,
data = ex2_sample.random)
#View of the model summary
summary(ex2_Model.2)
Call:
lm(formula = G ~ I + C, data = ex2_sample.random)
Residuals:
Min 1Q Median 3Q Max
-29.1718 -7.1808 -0.1675 7.4538 28.7898
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.57969 2.04883 -8.58 <2e-16 ***
I -2.62628 0.04568 -57.49 <2e-16 ***
C 0.18872 0.00494 38.21 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9.862 on 4997 degrees of freedom
Multiple R-squared: 0.4173, Adjusted R-squared: 0.4171
F-statistic: 1789 on 2 and 4997 DF, p-value: < 2.2e-16
#Fit the linear regression model with I, C, W and G (Model 3)
ex2_Model.3 <-
lm(formula = G ~ I + C + W,
data = ex2_sample.random)
#View of the model summary
summary(ex2_Model.3)
Call:
lm(formula = G ~ I + C + W, data = ex2_sample.random)
Residuals:
Min 1Q Median 3Q Max
-18.1535 -3.4904 0.0251 3.4565 18.6750
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 50.759671 1.198330 42.359 <2e-16 ***
I -1.975151 0.024053 -82.115 <2e-16 ***
C -0.002258 0.003002 -0.752 0.452
Wopen 20.079323 0.169566 118.416 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.055 on 4996 degrees of freedom
Multiple R-squared: 0.8469, Adjusted R-squared: 0.8468
F-statistic: 9214 on 3 and 4996 DF, p-value: < 2.2e-16
The estimated coefficients for the three models are then plotted in Fig. 2.3.
Code
b_socint.glare = -2
data.frame(model = c('Model.1', 'Model.2', 'Model.3'),
estimate = c(coef(ex2_Model.1)['I'],
coef(ex2_Model.2)['I'],
coef(ex2_Model.3)['I']),
lower.95.CI = c(confint(ex2_Model.1, level = 0.95, type = 'Wald')['I', 1],
confint(ex2_Model.2, level = 0.95, type = 'Wald')['I', 1],
confint(ex2_Model.3, level = 0.95, type = 'Wald')['I', 1]),
upper.95.CI = c(confint(ex2_Model.1, level = 0.95, type = 'Wald')['I', 2],
confint(ex2_Model.2, level = 0.95, type = 'Wald')['I', 2],
confint(ex2_Model.3, level = 0.95, type = 'Wald')['I', 2])) %>%
ggplot(aes(x = estimate, y = model, xmin = lower.95.CI, xmax = upper.95.CI)) +
geom_vline(xintercept = b_socint.glare, alpha = 0.8, linetype = 'dashed', colour = 'blue') +
geom_linerange() +
geom_point(shape = 21, size = 2, fill = 'white', stroke = 1) +
scale_x_continuous('Estimate',
breaks = seq(from = -5, to = 5, by = 0.25),
limits = c(-3, -1.5)) +
theme(panel.grid = element_blank(),
panel.background = element_blank(),
panel.border = element_rect(colour = 'black', fill = NA),
axis.title.y = element_blank())
Fig. 2.3 shows the estimates (point estimate and 95% confidence interval) of the coefficient for social interaction for Model.1, Model.2 and Model.3.
For Model.1 we found a negative coefficient between social interaction \((\text{I})\) and glare perception \((\text{G})\) equal to -1.926 with 95% confidence interval (CI) [-2.019, -1.833]. Since the 95% CI excludes zero, the regression coefficient is statistically significantly different from zero at the 0.05 level (p-value = 2.35e-310). Additionally, since the 95% CI includes the data-generating parameter for social interaction (i.e., b_socint.glare = -2), we can deduce that the estimated coefficient for social interaction is not statistically significantly different from -2 at the 0.05 level. We can test this more formally by using the linearHypothesis() function. Specifically,
#Test for I = -2
car::linearHypothesis(ex2_Model.1, 'I = -2') # car:: access the function from the car package without loading it in the environment
Linear hypothesis test:
I = - 2
Model 1: restricted model
Model 2: G ~ I
Res.Df RSS Df Sum of Sq F Pr(>F)
1 4999 628314
2 4998 628012 1 302.67 2.4088 0.1207
The resulting p-value is 0.121, which indicates that we fail to reject the null hypothesis (i.e., I = -2) at the 0.05 level. This result suggests that the regression coefficient for social interaction (i.e., -1.926) is not statistically significantly different from -2. Since the estimated causal effect from Model.1 is unbiased, we would correctly conclude that an increase of unit in social interaction causes a decrease in glare perception by -1.926 units (95% CI [-2.019, -1.833]).
For Model.2 we found a negative coefficient between coefficient between social interaction \((\text{I})\) and glare perception \((\text{G})\) equal to -2.626 with 95% CI [-2.716, -2.537]. Since the 95% CI excludes zero, the regression coefficient is statistically significantly different from zero at the 0.05 level (p-value = 0). Additionally, since the 95% CI does not include the data-generating parameter for social interaction (i.e., b_socint.glare = -2), we can deduce that the estimated coefficient for social interaction is statistically significantly different from -2 at the 0.05 level (although this will be the case for all numbers within the 95% confidence interval). We can test this more formally by using the linearHypothesis() function. Specifically,
#Test for I = -2
car::linearHypothesis(ex2_Model.2, 'I = -2') # car:: access the function from the car package without loading it in the environment
Linear hypothesis test:
I = - 2
Model 1: restricted model
Model 2: G ~ I + C
Res.Df RSS Df Sum of Sq F Pr(>F)
1 4998 504312
2 4997 486032 1 18280 187.94 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The resulting p-value is 5.11e-42, which indicates that we can reject the null hypothesis (i.e., I = -2) at the 0.05 level. This result suggests that the regression coefficient for social interaction (i.e., -2.626) is statistically significantly different from -2. Since the estimated causal effect from Model.2 is biased, we would erroneously conclude that an increase of unit in social interaction causes a decrease in glare perception by -2.626 units (95% CI [-2.716, -2.537]).
For Model.3 we found a negative coefficient between social interaction \((\text{I})\) and glare perception \((\text{G})\) equal to -1.975 with 95% CI [-2.022, -1.928]. Since the 95% CI excludes zero, the regression coefficient is statistically significantly different from zero at the 0.05 level (p-value = 0). Additionally, since the 95% CI includes the data-generating parameter for social interaction (i.e., b_socint.glare = -2), we can deduce that the estimated coefficient for social interaction is not statistically significantly different from -2 at the 0.05 level. We can test this more formally by using the linearHypothesis() function. Specifically,
#Test for I = -2
car::linearHypothesis(ex2_Model.3, 'I = -2') # car:: access the function from the car package without loading it in the environment
Linear hypothesis test:
I = - 2
Model 1: restricted model
Model 2: G ~ I + C + W
Res.Df RSS Df Sum of Sq F Pr(>F)
1 4997 127705
2 4996 127678 1 27.275 1.0673 0.3016
The resulting p-value is 0.302, which indicates that we fail to reject the null hypothesis (i.e., I = -2) at the 0.05 level. This result suggests that the regression coefficient for social interaction (i.e., -1.975) is not statistically significantly different from -2. Since the estimated causal effect from Model.3 is unbiased, we would correctly conclude that an increase of unit in social interaction causes a decrease in glare perception by -1.975 units (95% CI [-2.022, -1.928]).
Importantly, for Model.1, Model.2, and Model.3, the 95% confidence interval means that if we were to repeat the sampling process and calculate the interval many times, 95% of those calculated intervals would contain the true population parameter. To highlight this, we can repeat the analysis by fitting the three models to one thousand data sets randomly selected from our population.
The for-loop shown in the code below performs the following operations. First, sample (using simple random sampling) a data set of 5,000 observations from the target population. Subsequently, perform linear regression using Model.1, Model.2 and Model.3 and store the estimated coefficients for interaction, its standard error and 95% confidence interval in the data frame coefs_ex2. This operation is repeated a thousand times, resulting in the data frame coefs_ex2 containing the estimates (point estimate, standard error and confidence interval) of a thousand random samples of size 5,000 using Model.1, Model.2 and Model.3.
n_model <- c('mod.1', 'mod.2', 'mod.3')
n_row <- n_sims*length(n_model)
#Create an empty data frame
empty.df <- data.frame(matrix(NA, nrow = n_row, ncol = 7))
#Rename the data frame columns
colnames(empty.df) <- c('sim.id', 'estimate', 'se', 'CI_2.5', 'CI_97.5',
'model', 'coverage')
#Sample a thousand data sets of 5,000 observations and perform linear regression
coefs_ex2 <- empty.df #assign the empty data frame
k = 1
for (i in 1:n_sims){
set.seed(ex2_random.seed[i]) #set unique seed for each simulation
#Sample data set from population
sample.random <-
ex2_population %>%
slice_sample(n = 5e3) #take a simple random sample of size 5,000
#Fit models
for (j in 1:length(n_model)){
if (n_model[j] == 'mod.1'){
fit <- lm(formula = G ~ I,
data = sample.random)
} else if (n_model[j] == 'mod.2'){
fit <- lm(formula = G ~ I + C,
data = sample.random)
} else {
fit <- lm(formula = G ~ I + C + W,
data = sample.random)
}
#Compile matrix
coefs_ex2[k, 1] <- i #simulation ID
coefs_ex2[k, 2] <- coef(fit)['I'] #point estimate
coefs_ex2[k, 3] <- summary(fit)$coef['I','Std. Error'] #standard error
coefs_ex2[k, 4:5] <- confint(fit, level = 0.95, type = 'Wald')['I', ] #confidence interval (Wald)
coefs_ex2[k, 6] <- n_model[j] #sample size
k = k + 1
}
}
coefs_ex2 <- as_tibble(coefs_ex2)
#View the data frame
coefs_ex2# A tibble: 3,000 × 7
sim.id estimate se CI_2.5 CI_97.5 model coverage
<int> <dbl> <dbl> <dbl> <dbl> <chr> <lgl>
1 1 -1.93 0.0476 -2.02 -1.83 mod.1 NA
2 1 -2.63 0.0457 -2.72 -2.54 mod.2 NA
3 1 -1.98 0.0241 -2.02 -1.93 mod.3 NA
4 2 -2.08 0.0483 -2.17 -1.98 mod.1 NA
5 2 -2.71 0.0452 -2.80 -2.62 mod.2 NA
6 2 -2.04 0.0242 -2.09 -1.99 mod.3 NA
7 3 -2.07 0.0483 -2.16 -1.97 mod.1 NA
8 3 -2.65 0.0457 -2.74 -2.56 mod.2 NA
9 3 -1.99 0.0238 -2.03 -1.94 mod.3 NA
10 4 -2.01 0.0481 -2.10 -1.92 mod.1 NA
# ℹ 2,990 more rows
The coverage is defined by setting its value to 1 if the confidence interval overlaps the data-generating parameter for interaction (i.e., b_socint.glare = -2) and 0 otherwise.
#Calculate coverage
coefs_ex2 <-
coefs_ex2 %>%
mutate(coverage = case_when(CI_2.5 > b_socint.glare | CI_97.5 < b_socint.glare ~ 0,
CI_2.5 <= b_socint.glare & CI_97.5 >= b_socint.glare ~ 1,
.default = NA))
#View the data frame
coefs_ex2# A tibble: 3,000 × 7
sim.id estimate se CI_2.5 CI_97.5 model coverage
<int> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 1 -1.93 0.0476 -2.02 -1.83 mod.1 1
2 1 -2.63 0.0457 -2.72 -2.54 mod.2 0
3 1 -1.98 0.0241 -2.02 -1.93 mod.3 1
4 2 -2.08 0.0483 -2.17 -1.98 mod.1 1
5 2 -2.71 0.0452 -2.80 -2.62 mod.2 0
6 2 -2.04 0.0242 -2.09 -1.99 mod.3 1
7 3 -2.07 0.0483 -2.16 -1.97 mod.1 1
8 3 -2.65 0.0457 -2.74 -2.56 mod.2 0
9 3 -1.99 0.0238 -2.03 -1.94 mod.3 1
10 4 -2.01 0.0481 -2.10 -1.92 mod.1 1
# ℹ 2,990 more rows
The results are then plotted in Fig. 2.4.
Code
model_names <- c('mod.1' = 'Model.1',
'mod.2' = 'Model.2',
'mod.3' = 'Model.3')
ggplot(data = subset(coefs_ex2, sim.id <= 2e2), aes(x = sim.id, y = estimate, ymin = CI_2.5, ymax = CI_97.5, colour = as.factor(coverage))) +
geom_hline(yintercept = b_socint.glare, alpha = 0.8, linetype = 'dashed', colour = 'blue') +
geom_linerange() +
geom_point(shape = 21, size = 1, fill = 'white', stroke = 0.5) +
scale_colour_manual('', values = c('black', '#FF4040'),
breaks = c('1','0')) +
scale_y_continuous('Estimate') +
scale_x_continuous('Simulation ID') +
facet_wrap(model~.,
labeller = labeller(model = model_names),
nrow = 3,
scales = 'fixed') +
theme(legend.position = 'none',
panel.grid = element_blank(),
panel.background = element_blank(),
panel.border = element_rect(colour = 'black', fill = NA),
axis.title.y = element_blank())
Fig. 2.4 shows the first 200 estimates (point estimate and confidence interval) of the coefficient for social interaction for Model.1, Model.2 and Model.3. If all the thousand simulations are considered, the frequency of the coverage of the calculated confidence intervals (i.e., how many times the confidence intervals overlap the data-generating parameter) is 94.6%, 0.0% and 94.4% for Model.1, Model.2 and Model.3, respectively. Since the estimate of the causal effect for Model.2 are biased, the calculated confidence intervals for this model do not have the expected coverage. In fact, confidence intervals only quantify the uncertainty due to random error (i.e., sample variability), not systematic error (i.e., bias). Instead, the estimates from Model.1 and Model.3 are unbiased and the calculated 95% confidence intervals have the expected coverage (i.e., overlap the data-generating parameter 95% of the times).
We can also visualize all the 1,000 estimates of the coefficient for social interaction for Model.1, Model.2 and Model.3 (the estimates were shown as white dots in Fig. 2.4).
Code
ggplot(data = coefs_ex2, aes(x = estimate, y = ifelse(after_stat(count) > 0, after_stat(count), NA))) +
geom_histogram(binwidth = 0.02, fill = 'white', colour = 'grey50') +
geom_vline(xintercept = b_socint.glare, colour = 'blue', linetype = 'dashed') +
scale_x_continuous('Estimate',
breaks = seq(from = -5, to = 5, by = 0.25),
limits = c(-3, -1.5)) +
facet_grid(model~.,
labeller = labeller(model = model_names),
scales = 'fixed') +
theme(axis.title.y = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
panel.grid = element_blank(),
panel.background = element_blank(),
panel.border = element_rect(colour = 'black', fill = NA))
In Fig. 2.5, the estimated coefficients are visualized through a histogram. Here, we can see that the estimates for Model.2 are not clustered around the data-generating parameter (blue dashed line) having a mean estimate of -2.634 (with 0.044 standard deviation). In contrast, the estimates for Model.1 and Model.3 having a mean estimate of -1.994 (with 0.048 standard deviation) and -2.001 (with 0.025 standard deviation), respectively, are centered around the data-generating parameter.
As expected, since there is no backdoor path open, regressing glare perception \((\text{G})\) on social interaction \((\text{I})\) (i.e., using Model.1) leads to the correct estimate of the total average causal effect. However, when CO2 is included as a predictor (i.e., using Model.2) the estimates are biased. Adjusting for CO2 alone opens the backdoor path \(\text{I} \leftarrow \text{O} \rightarrow \text{C} \leftarrow \text{W} \rightarrow \text{G}\) that was previously closed. As a result, association can flow from \(\text{I}\) to \(\text{G}\) through \(\text{O}\), \(\text{C}\) and \(\text{W}\). This happens because CO2 is a collider. However, when we also include window state \((\text{W})\) as a predictor (i.e., using Model.3) the backdoor path is closed, leading to the correct estimate of the total average causal effect of social interaction \((\text{I})\) on glare perception \((\text{G})\).
This bias is known as M-bias. Generally, adjusting for an inappropriate variable (a collider on a non-causal path) opens an M-shaped backdoor path and creates spurious association.
Bayesian framework
#Fit the linear regression model with I and G (Model 1)
ex2_Model.1 <- stan_glm(formula = G ~ I,
family = gaussian(),
data = ex2_sample.random,
#Prior coefficients
prior = normal(location = 0, scale = 4),
#Prior intercept
prior_intercept = normal(location = 50, scale = 10),
#Prior sigma
prior_aux = exponential(rate = 0.5),
iter = 4000, warmup = 1000,
save_warmup = TRUE,
chains = 4, cores = 4,
seed = 2025) #for reproducibility
#View of the model
ex2_Model.1stan_glm
family: gaussian [identity]
formula: G ~ I
observations: 5000
predictors: 2
------
Median MAD_SD
(Intercept) 59.3 0.4
I -1.9 0.0
Auxiliary parameter(s):
Median MAD_SD
sigma 11.2 0.1
------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg
#Fit the linear regression model with I, C and G (Model 2)
ex2_Model.2 <- stan_glm(formula = G ~ I + C,
family = gaussian(),
data = ex2_sample.random,
#Prior coefficients
prior = normal(location = c(0, 0), scale = c(4, 1)),
#Prior intercept
prior_intercept = normal(location = 50, scale = 10),
#Prior sigma
prior_aux = exponential(rate = 0.5),
iter = 4000, warmup = 1000,
save_warmup = TRUE,
chains = 4, cores = 4,
seed = 2025) #for reproducibility
#View of the model
ex2_Model.2stan_glm
family: gaussian [identity]
formula: G ~ I + C
observations: 5000
predictors: 3
------
Median MAD_SD
(Intercept) -17.5 2.0
I -2.6 0.0
C 0.2 0.0
Auxiliary parameter(s):
Median MAD_SD
sigma 9.9 0.1
------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg
#Fit the linear regression model with I, C, W and G (Model 3)
ex2_Model.3 <- stan_glm(formula = G ~ I + C + W,
family = gaussian(),
data = ex2_sample.random,
#Prior coefficients
prior = normal(location = c(0, 0, 0), scale = c(4, 1, 40)),
#Prior intercept
prior_intercept = normal(location = 50, scale = 10),
#Prior sigma
prior_aux = exponential(rate = 0.5),
iter = 4000, warmup = 1000,
save_warmup = TRUE,
chains = 4, cores = 4,
seed = 2025) #for reproducibility
#View of the model
ex2_Model.3stan_glm
family: gaussian [identity]
formula: G ~ I + C + W
observations: 5000
predictors: 4
------
Median MAD_SD
(Intercept) 50.8 1.2
I -2.0 0.0
C 0.0 0.0
Wopen 20.1 0.2
Auxiliary parameter(s):
Median MAD_SD
sigma 5.1 0.1
------
* For help interpreting the printed output see ?print.stanreg
* For info on the priors used see ?prior_summary.stanreg
In Bayesian analysis, there are important diagnostics that have to be carried out in order to assess the convergence and efficiency of the Markov Chains. This is done by using the monitor() function which computes summaries of MCMC (Markov Chain Monte Carlo) draws and monitor convergence. Specifically, we will look at Rhat, Bulk_ESS and Tail_ESS metrics.
#Diagnostics for model 1
monitor(ex2_Model.1$stanfit) Inference for the input samples (4 chains: each with iter = 4000; warmup = 0):
Q5 Q50 Q95 Mean SD Rhat Bulk_ESS Tail_ESS
(Intercept) 58.6 59.3 60.0 59.3 0.4 1 11109 7597
I -2.0 -1.9 -1.8 -1.9 0.0 1 11198 7423
sigma 11.0 11.2 11.4 11.2 0.1 1 12198 8926
mean_PPD 42.4 42.7 43.1 42.7 0.2 1 11230 10778
log-posterior -19189.8 -19187.1 -19186.1 -19187.4 1.2 1 5159 7821
For each parameter, Bulk_ESS and Tail_ESS are crude measures of
effective sample size for bulk and tail quantities respectively (an ESS > 100
per chain is considered good), and Rhat is the potential scale reduction
factor on rank normalized split chains (at convergence, Rhat <= 1.05).
#Diagnostics for model 2
monitor(ex2_Model.2$stanfit) Inference for the input samples (4 chains: each with iter = 4000; warmup = 0):
Q5 Q50 Q95 Mean SD Rhat Bulk_ESS Tail_ESS
(Intercept) -20.9 -17.5 -14.3 -17.6 2.0 1 14345 9537
I -2.7 -2.6 -2.6 -2.6 0.0 1 11779 8439
C 0.2 0.2 0.2 0.2 0.0 1 13380 9184
sigma 9.7 9.9 10.0 9.9 0.1 1 12884 9520
mean_PPD 42.4 42.7 43.1 42.7 0.2 1 11998 10505
log-posterior -18550.3 -18547.3 -18546.0 -18547.6 1.4 1 5374 6558
For each parameter, Bulk_ESS and Tail_ESS are crude measures of
effective sample size for bulk and tail quantities respectively (an ESS > 100
per chain is considered good), and Rhat is the potential scale reduction
factor on rank normalized split chains (at convergence, Rhat <= 1.05).
#Diagnostics for model 3
monitor(ex2_Model.3$stanfit) Inference for the input samples (4 chains: each with iter = 4000; warmup = 0):
Q5 Q50 Q95 Mean SD Rhat Bulk_ESS Tail_ESS
(Intercept) 48.8 50.8 52.8 50.8 1.2 1 10783 9224
I -2.0 -2.0 -1.9 -2.0 0.0 1 9359 8154
C 0.0 0.0 0.0 0.0 0.0 1 9643 8274
Wopen 19.8 20.1 20.4 20.1 0.2 1 9978 8954
sigma 5.0 5.1 5.1 5.1 0.1 1 9898 8482
mean_PPD 42.6 42.7 42.9 42.7 0.1 1 9558 9741
log-posterior -15208.6 -15205.1 -15203.5 -15205.5 1.6 1 5197 7655
For each parameter, Bulk_ESS and Tail_ESS are crude measures of
effective sample size for bulk and tail quantities respectively (an ESS > 100
per chain is considered good), and Rhat is the potential scale reduction
factor on rank normalized split chains (at convergence, Rhat <= 1.05).
Rhat is a metric used to assess the convergence of Markov Chain Monte Carlo (MCMC) simulations. It helps determine if the MCMC chains have adequately explored the target posterior distribution. Specifically, it compares the between- and within-chain estimates for model parameters: If chains have not mixed well (i.e., the between- and within-chain estimates do not agree), R-hat is larger than 1. A general rule of thumb is to use the sample only if R-hat is less than 1.05; a larger value suggests that the chains have not mixed well, and the results might not be reliable. In our three models, all Rhat are equal to 1 indicating that the chains have mixed well and have adequately explored the target posterior distribution.
Bulk_ESS and Tail_ESS stand for ‘Bulk Effective Sample Size’ and ‘Tail Effective Sample Size,’ respectively. Since MCMC samples are not truly independent (they are correlated), these metrics assess the sampling efficiencies, that is, they help evaluate how efficiently the MCMC sampler is exploring the parameter space.
-
Bulk_ESSis a useful measure for sampling efficiency in the bulk (center) of the distribution (e.g., efficiency of mean and median estimates); -
Tail_ESSis a useful measure for sampling efficiency in the tails of the distribution (e.g., efficiency of variance and tail quantile estimates). A general rule of thumb is that bothBulk-ESSandTail-ESSshould be at least 100 (approximately) per Markov Chain in order to be reliable and indicate that estimates of respective posterior quantiles are reliable. In our three models, allBulk-ESSandTail-ESSare well above 400 (i.e., 100 multiplied by 4, the number of chains we used) indicating that estimates of posterior quantiles are reliable.
Since we have established that the posteriors are reliable, we can now explore the model estimates.
The estimated coefficients for the three models are then plotted in Fig. 2.6.
Code
#Extract draws from model 1
post_ex2_Model.1 <-
ex2_Model.1 %>%
spread_draws(I) %>% #extract draws from the fitted model
mutate(model = 'Model.1') #add a new column to specify that the model
#Extract draws from model 2
post_ex2_Model.2 <-
ex2_Model.2 %>%
spread_draws(I) %>% #extract draws from the fitted model
mutate(model = 'Model.2') #add a new column to specify that the model
#Extract draws from model 3
post_ex2_Model.3 <-
ex2_Model.3 %>%
spread_draws(I) %>% #extract draws from the fitted model
mutate(model = 'Model.3') #add a new column to specify that the model
#Combine draws
plot.post <- rbind(post_ex2_Model.1, post_ex2_Model.2, post_ex2_Model.3)
# Plot
plot.post %>%
ggplot(aes(y = model, x = I)) +
stat_slabinterval(point_interval = 'mean_hdi',
.width = c(.95)) +
geom_vline(xintercept = b_socint.glare, alpha = 0.8, linetype = 'dashed', colour = 'blue') +
scale_x_continuous('Estimate',
breaks = seq(from = -5, to = 5, by = 0.25),
limits = c(-3, -1.5)) +
theme(panel.grid = element_blank(),
panel.background = element_blank(),
panel.border = element_rect(colour = 'black', fill = NA),
axis.title.y = element_blank())
Fig. 2.6 shows the estimates (mean and 95% HDI) of the coefficient for social interaction for Model.1, Model.2 and Model.3.
For Model.1 we found a negative coefficient between social interaction \((\text{I})\) and glare perception \((\text{G})\) (mean = -1.927, 95% HDI [-2.022, -1.835]). The estimated causal effect is unbiased, leading to the correct conclusion that an increase of unit in social interaction causes a decrease in glare perception by -1.927 units, on average.
For Model.2 we found a negative coefficient between social interaction \((\text{I})\) and glare perception \((\text{G})\) (mean = -2.625, 95% HDI [-2.714, -2.537]). However, the estimated causal effect is biased, leading to the wrong conclusion that an increase of unit in social interaction causes a decrease in glare perception by -2.625 units, on average.
For Model.3 we found a negative coefficient between social interaction \((\text{I})\) and glare perception \((\text{G})\) (mean = -1.975, 95% HDI [-2.023, -1.928]). However, now the estimated causal effect is unbiased, leading to the correct conclusion that an increase of unit in social interaction causes a decrease in glare perception by -1.975 units, on average.
Importantly, for Model.1, Model.2 and Model.3, the 95% HDI is the range of parameter values within which the most credible 95% of the posterior distribution falls. Unlike a frequentist confidence interval, the Bayesian 95% HDI has a direct probabilistic meaning: every point inside the HDI has a higher probability density than any point outside the interval. Therefore, given the model, the prior and the data, we can say that there is a 95% probability that the data-generating parameter (i.e., b_socint.glare = -2) lies within the HDI. However, since Model.2 leads to a biased estimate, we will reach the wrong conclusion by stating that there is a 95% probability that the data-generating parameter lies within the [-2.714, -2.537] interval. This probability is 0%.
As expected, since there is no backdoor path open, regressing glare perception \((\text{G})\) on social interaction \((\text{I})\) (i.e., using Model.1) leads to the correct estimate of the total average causal effect. However, when CO2 is included as a predictor (i.e., using Model.2) the estimates are biased. Adjusting for CO2 alone opens the backdoor path \(\text{I} \leftarrow \text{O} \rightarrow \text{C} \leftarrow \text{W} \rightarrow \text{G}\) that was previously closed. As a result, association can flow from \(\text{I}\) to \(\text{G}\) through \(\text{O}\), \(\text{C}\) and \(\text{W}\). This happens because CO2 is a collider. However, when we also include window state \((\text{W})\) as a predictor (i.e., using Model.3) the backdoor path is closed, leading to the correct estimate of the total average causal effect of social interaction \((\text{I})\) on glare perception \((\text{G})\).
This bias is known as M-bias. Generally, adjusting for an inappropriate variable (a collider on a non-causal path) opens an M-shaped backdoor path and creates spurious association.
Session info
Version information about R, the OS and attached or loaded packages.
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 26200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.utf8
[2] LC_CTYPE=English_United Kingdom.utf8
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rstan_2.32.7 StanHeaders_2.32.10 gt_1.1.0
[4] lubridate_1.9.4 forcats_1.0.1 stringr_1.6.0
[7] dplyr_1.1.4 purrr_1.2.0 readr_2.1.6
[10] tidyr_1.3.1 tibble_3.3.0 ggplot2_4.0.1
[13] tidyverse_2.0.0 ggdist_3.3.3 tidybayes_3.0.7
[16] rstanarm_2.32.2 Rcpp_1.1.0 dagitty_0.3-4
[19] ggdag_0.2.13
loaded via a namespace (and not attached):
[1] backports_1.5.0 plyr_1.8.9 igraph_2.2.1
[4] splines_4.2.3 svUnit_1.0.8 crosstalk_1.2.2
[7] rstantools_2.5.0 inline_0.3.21 digest_0.6.38
[10] htmltools_0.5.8.1 viridis_0.6.5 magrittr_2.0.4
[13] checkmate_2.3.3 memoise_2.0.1 tzdb_0.5.0
[16] graphlayouts_1.2.2 RcppParallel_5.1.11-1 matrixStats_1.5.0
[19] xts_0.14.1 timechange_0.3.0 colorspace_2.1-2
[22] ggrepel_0.9.6 rbibutils_2.4 xfun_0.54
[25] jsonlite_2.0.0 lme4_1.1-37 survival_3.8-3
[28] zoo_1.8-14 glue_1.8.0 reformulas_0.4.2
[31] polyclip_1.10-7 gtable_0.3.6 V8_8.0.1
[34] distributional_0.5.0 car_3.1-3 pkgbuild_1.4.8
[37] abind_1.4-8 scales_1.4.0 miniUI_0.1.2
[40] viridisLite_0.4.2 xtable_1.8-4 Formula_1.2-5
[43] stats4_4.2.3 DT_0.34.0 htmlwidgets_1.6.4
[46] threejs_0.3.4 arrayhelpers_1.1-0 RColorBrewer_1.1-3
[49] posterior_1.6.1 pkgconfig_2.0.3 loo_2.8.0
[52] farver_2.1.2 sass_0.4.10 utf8_1.2.6
[55] tidyselect_1.2.1 labeling_0.4.3 rlang_1.1.6
[58] reshape2_1.4.5 later_1.4.4 tools_4.2.3
[61] cachem_1.1.0 cli_3.6.5 generics_0.1.4
[64] evaluate_1.0.5 fastmap_1.2.0 yaml_2.3.10
[67] knitr_1.50 fs_1.6.6 tidygraph_1.3.1
[70] ggraph_2.2.2 nlme_3.1-168 mime_0.13
[73] xml2_1.5.0 compiler_4.2.3 bayesplot_1.14.0
[76] shinythemes_1.2.0 rstudioapi_0.17.1 curl_7.0.0
[79] tweenr_2.0.3 stringi_1.8.7 lattice_0.22-7
[82] Matrix_1.6-5 nloptr_2.2.1 markdown_2.0
[85] shinyjs_2.1.0 tensorA_0.36.2.1 vctrs_0.6.5
[88] pillar_1.11.1 lifecycle_1.0.4 Rdpack_2.6.4
[91] httpuv_1.6.16 QuickJSR_1.8.1 R6_2.6.1
[94] promises_1.5.0 gridExtra_2.3 codetools_0.2-20
[97] boot_1.3-32 colourpicker_1.3.0 MASS_7.3-58.2
[100] gtools_3.9.5 withr_3.0.2 shinystan_2.6.0
[103] parallel_4.2.3 hms_1.1.4 grid_4.2.3
[106] coda_0.19-4.1 minqa_1.2.8 rmarkdown_2.30
[109] S7_0.2.1 carData_3.0-5 otel_0.2.0
[112] ggforce_0.5.0 shiny_1.11.1 base64enc_0.1-3
[115] dygraphs_1.1.1.6