ANOVAs as mixed effects models for randomised block designs and repeated-measures analyses.

Six functions are available for fitting linear models or linear mixed effects models. These can be used instead of simple ANOVAs or ‘mixed’ ANOVAs when there are random factors to consider.

These functions rely on `lm`

and `lmer`

(latter implemented in package `lmerTest`

).

This page examples of using linear mixed-effects models to get the:

- ANOVA table (
`mixed_anova`

,`mixed_anova_slopes`

) with F and P values - Fitted model (
`mixed_model`

,`mixed_model_slopes`

), which is useful for diagnostics and posthoc comparisons.

Both fit the full model with interaction terms when more than one fixed factors are provided. By default, variable intercepts only models are fit. For simplicity, `grafify`

currently implements one version of variable slopes model through `mixed_model_slopes`

and `mixed_anova_slopes`

. Refer to `lme4`

package for more complex analyses with random effects.

ANOVA tables show type II sum of squares (type II SS). ANOVA tables with `mixed_anova`

are generated using the `anova()`

call through `lmerTest`

that adds *P* values.
If you want Type III SS, first set contrasts using `options(contrasts=c("contr.sum", "contr.poly")`

*before fitting the model or getting the ANOVA table*. Fitting the model without this and setting `type = "III"`

will give **incorrect results**. The default contrasts in R can be obtained by running `options(contrasts=c("contr.treatment", "contr.poly"))`

.

See the data help page and ensure data table is in the long-format, and check that you have correctly converted categorical variables with `as.factor`

and columns with numeric variables are in the correct format.

We will use `data_1w_death`

and `data_2w_Festing`

in these examples. In this vignette we use ordinary linear models; see the vignette on mixed-effects models for analyses of the same data where the random factor is taken into account.

Let’s look at these data first so you have an idea of whether or not there are differences between them.

First the `data_1w_death`

(one-way ANOVA design) data from 5 independent *in vitro* experiments where death of host cells was measured (as percentage of total cells) after infection with three different strains of a pathogenic bacterium (factor Genotype).

`head(data_1w_death, n = 6) #see first 6 rows of data`

```
#> Experiment Genotype Death
#> 1 Exp_1 WT 25.012173
#> 2 Exp_1 KO_1 1.824973
#> 3 Exp_1 KO_2 14.294956
#> 4 Exp_2 WT 16.542609
#> 5 Exp_2 KO_1 2.131199
#> 6 Exp_2 KO_2 23.749439
```

A plot of these data shows that KO_1 induces very little death as compared to the other two bacterial strains.

```
plot_3d_scatterbox(data_1w_death, #data
Genotype, #fixed factor
Death, #Y variable
Experiment) #shapes
```

The `data_2w_Festing`

data from Festing et al dataset has two fixed factors (“Treatment” & “Strain”). For the present analysis, I will ignore the blocking factor (“Block”), and we use the same data set in the vignette on mixed effects models (`mixed_anova`

).

`head(data_2w_Festing, n = 8) #see first 8 rows`

```
#> Block Treatment Strain GST
#> 1 A Control NIH 444
#> 2 A Treated NIH 614
#> 3 A Control BALB/C 423
#> 4 A Treated BALB/C 625
#> 5 A Control A/J 408
#> 6 A Treated A/J 856
#> 7 A Control 129/Ola 447
#> 8 A Treated 129/Ola 719
```

The plot below of these data suggests that Treatment generally increases GST values irrespective of the mouse Strain (although the magnitude of increase appears to be different across the strains).

```
plot_4d_scatterbox(data_2w_Festing, #data
Strain, #factor 1
GST, #Y variable
Treatment, #factor 2
Block) #shapes
```

Let’s fit a linear model, use diagnostics and get ANOVA table.

In this vignette we use linear mixed-effects models; see the vignette on ordinary linear models for analyses of the same data where the random factor is ignored.

`mixed_model`

`mixed_model`

to save models for diagnostics and `posthoc_`

functions. The `mixed_model`

and `mixed_anova`

calls have identically arguments.

```
mod_1wdeath <- mixed_model(data_1w_death, #data table
"Death", #quantitative variable
"Genotype", #fixed factor
"Experiment") #random factor
#diagnostic residual plot
plot_qqmodel(mod_1wdeath)
```

```
mod_2w <- mixed_model(data_2w_Festing,
"GST",
c("Strain", "Treatment"),
"Block")
mod_2wtdeath <- mixed_model(data_2w_Tdeath,
"logit(PI/100)", #logit transformation in formula
c("Genotype", "Time"),
c("Experiment", "Subject"))
```

Check model residuals with the `plot_qqmodel`

function.

`plot_qqmodel(mod_2w) #saved above`

`plot_qqmodel(mod_2wtdeath)`

The `ggResidPanel`

package is great for model diagnostics. Check it out.

`mixed_anova`

ANOVA table with random factor taken into account.

```
mixed_anova(data_1w_death, #data table
"Death", #quantitative variable
"Genotype", #fixed factor
"Experiment") #random factor
```

```
#> Type II Analysis of Variance Table with Kenward-Roger's method
#> Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
#> Genotype 1420.2 710.1 2 8 42.659 5.401e-05 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The output indicates a huge impact of Genotype on cell death. Compare this result with the ordinary linear model.

Let’s analyse the Festing et al dataset this time including the random factor (analysis as ordinary linear model here).

```
mixed_anova(data_2w_Festing, #data table
"GST", #quantitative variable
c("Strain", "Treatment"), #fixed factors
"Block") #random factor
```

```
#> Type II Analysis of Variance Table with Kenward-Roger's method
#> Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
#> Strain 28613 9538 3 7 3.2252 0.09144 .
#> Treatment 227529 227529 1 7 76.9394 5.041e-05 ***
#> Strain:Treatment 49591 16530 3 7 5.5897 0.02832 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

The effect of Treatment looks more marked, and there may be a hint of interaction, so dig deeper!

Currently, `grafify`

implements one type of variable slopes model (`mixed_model_slopes`

and `mixed_anova_slopes`

). Refer to `lme4`

package to fit other kinds of random effects.

The two-way repeated-measures `data_2w_Tdeath`

has two fixed factors (Genotype & Time) and two random factors, Experiment and Subject. The Subject indicates repeated-measure over Time. The data table also has a column with numeric “Time2” column that could be used for graphs, remember to use the column with categorical “Time” column for analysis.

```
#> Experiment Time Subject Genotype PI Time2
#> 1 e1 t100 s1 WT 20.47120 100
#> 2 e2 t100 s2 WT 28.88967 100
#> 3 e3 t100 s3 WT 11.55061 100
#> 4 e4 t100 s4 WT 23.24516 100
#> 5 e5 t100 s5 WT 30.20904 100
```

`str(data_2w_Tdeath) #Time coded as categorical variable`

```
#> 'data.frame': 24 obs. of 6 variables:
#> $ Experiment: Factor w/ 6 levels "e1","e2","e3",..: 1 2 3 4 5 6 1 2 3 4 ...
#> $ Time : Factor w/ 2 levels "t100","t300": 1 1 1 1 1 1 1 1 1 1 ...
#> $ Subject : Factor w/ 12 levels "s1","s10","s11",..: 1 5 6 7 8 9 10 11 12 2 ...
#> $ Genotype : Factor w/ 2 levels "WT","KO": 1 1 1 1 1 1 2 2 2 2 ...
#> $ PI : num 20.5 28.9 11.6 23.2 30.2 ...
#> $ Time2 : int 100 100 100 100 100 100 100 100 100 100 ...
```

```
mixed_anova(data_2w_Tdeath, #data table
"logit(PI/100)", #logit transformation in formula
c("Genotype", "Time"), #fixed factors
c("Experiment", "Subject")) #random factors
```

```
#> Type II Analysis of Variance Table with Kenward-Roger's method
#> Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
#> Genotype 1.18171 1.18171 1 5 58.5419 0.0006071 ***
#> Time 1.84830 1.84830 1 10 91.5646 2.377e-06 ***
#> Genotype:Time 0.08288 0.08288 1 10 4.1059 0.0702377 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Results show a large effect of Genotype & Time, and questionable interaction effect.

The function `plot_lm_predict`

is handy to check how close the values predicted from the linear model are to observed (or measured) values that were used to fit the model. A good model will predict numbers close to observed/real data.

This function takes in a fitted model, and needs X and Y variables that **exactly** match those used to fit the model.

Let us plot the predictions of models fitted above.

```
#plots for Festing data set
plot_lm_predict(Model = mod_2w, #fitted model
xcol = Treatment, #one independent factor
ycol = GST, #dependent variable
ByFactor = Strain) #second factor
```