Data format requirements for graphs and ANOVAs.
grafify
Eight data sets are included for practising with the package. To view these, type data(package = "grafify"). A description of each can be obtained by typing ?<name of data set> on the console. They are all in long format, which is described further below.
All plot_ functions require data in long format. Use pivot_longer and pivot_wider from the tidyr package to convert between data table formats. Examples of how to do this are available here.
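As a minimal sketch of the conversion between formats, assume a hypothetical wide table called wide_df with one column per genotype. The table, its column names and its values are invented for illustration and are not one of the grafify data sets.

```r
library(tidyr)

# Hypothetical wide-format table: one row per experiment, one column per genotype
wide_df <- data.frame(
  Experiment = c("Exp_1", "Exp_2", "Exp_3"),
  WT   = c(25.0, 16.5, 31.1),
  KO_1 = c(1.8, 2.1, 1.9)
)

# Wide -> long: the genotype columns are stacked into a name/value pair
long_df <- pivot_longer(
  wide_df,
  cols = c(WT, KO_1),
  names_to = "Genotype",
  values_to = "Death"
)

# Long -> wide: reverses the step above
wide_again <- pivot_wider(
  long_df,
  names_from = Genotype,
  values_from = Death
)
```

In the long table each row holds a single measurement, with the grouping variables (here Experiment and Genotype) in their own columns.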
Analyses as linear models for ANOVAs (i.e., simple_model, simple_anova, mixed_model, mixed_anova, mixed_model_slopes and mixed_anova_slopes) also require data to be supplied in long format.
In addition, keep in mind whether each independent variable is categorical (nominal) or numeric.
Confirm that categorical/discrete fixed and random factors are converted into factors by using as.factor() (see here), and check the data frame with str(). Failing to do this for columns that use numbers to describe levels will lead to incorrect results (e.g., if a column describes “Experiments” as 1, 2, 3 and so on, R will treat these as numeric values when they should be analysed as categorical/discrete/nominal variables). Examples of numeric variables we might come across in biology are time, temperature, body weight (mass), lengths etc.
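The conversion and check described above can be sketched as follows. The data frame df is hypothetical, with experiments coded as numbers, and only base R functions are used.

```r
# Hypothetical data frame in which experiments are coded as numbers
df <- data.frame(
  Experiment = c(1, 1, 2, 2, 3, 3),
  Genotype   = c("WT", "KO", "WT", "KO", "WT", "KO"),
  Death      = c(25.0, 1.8, 16.5, 2.1, 31.1, 1.9)
)

str(df)  # Experiment is 'num' and Genotype is 'chr' at this point

# Convert the categorical columns to factors
df$Experiment <- as.factor(df$Experiment)
df$Genotype   <- as.factor(df$Genotype)

str(df)  # both columns should now be listed as 'Factor'
```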
ANOVA and model fitting functions do not check whether the variables are “factors”. This is because some users may actually want to fit lines through numeric variables. Unless this is what you intend to do, first convert fixed factors (also called discrete or categorical variables) to what R understands as ‘factors’.
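To illustrate why this matters, here is a small sketch that uses base R's lm() and anova() rather than the grafify functions, on a hypothetical data frame with invented values. The same column gives a different model depending on whether it is numeric or a factor.

```r
# Hypothetical data: a response measured at three doses (values invented)
dose_df <- data.frame(
  Dose     = rep(c(1, 2, 3), each = 3),
  Response = c(2.1, 1.9, 2.3, 4.2, 3.8, 4.1, 4.9, 5.2, 5.0)
)

# Dose as numeric: a single slope is fitted through the dose values
fit_numeric <- lm(Response ~ Dose, data = dose_df)

# Dose as a factor: each dose level is treated as a separate group
fit_factor <- lm(Response ~ as.factor(Dose), data = dose_df)

length(coef(fit_numeric))  # intercept plus one slope
length(coef(fit_factor))   # intercept plus one coefficient per non-reference level
anova(fit_numeric)         # Dose term uses 1 degree of freedom
anova(fit_factor)          # Dose term uses 2 degrees of freedom
```

The differing degrees of freedom in the two ANOVA tables are a quick sign of whether a variable was treated as numeric or categorical.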
It is the user’s responsibility to check data frame structure.
Here is an example of a data frame in grafify.
# first 10 rows of the data_1w_death table
head(data_1w_death, n = 10)
#>    Experiment Genotype     Death
#> 1       Exp_1       WT 25.012173
#> 2       Exp_1     KO_1  1.824973
#> 3       Exp_1     KO_2 14.294956
#> 4       Exp_2       WT 16.542609
#> 5       Exp_2     KO_1  2.131199
#> 6       Exp_2     KO_2 23.749439
#> 7       Exp_3       WT 31.125802
#> 8       Exp_3     KO_1  1.916670
#> 9       Exp_3     KO_2 21.998527
#> 10      Exp_4       WT 21.596348
# structure of the data frame
str(data_1w_death)
#> 'data.frame':    15 obs. of  3 variables:
#>  $ Experiment: Factor w/ 5 levels "Exp_1","Exp_2",..: 1 1 1 2 2 2 3 3 3 4 ...
#>  $ Genotype  : Factor w/ 3 levels "WT","KO_1","KO_2": 1 2 3 1 2 3 1 2 3 1 ...
#>  $ Death     : num  25.01 1.82 14.29 16.54 2.13 ...
Note that Experiment and Genotype are columns with the Factor attribute, whereas Death is num (numeric).