Data format

Data format requirements for graphs and ANOVAs.

Data sets in grafify

Eight data sets are included for practising with the package. To list them, type data(package = "grafify"). For a description of any one of them, type ?<name of data set> in the console. All of them are in long format, which is described further below.

Data for graphs

All plot_ functions require data in long format. Use pivot_longer and pivot_wider from the tidyr package to convert between wide and long table formats. Examples on how to do this are available here.
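As a minimal sketch of the conversion, the example below reshapes a small wide table into long format and back. The table `wide_tab`, its column names and its values are made up for illustration and are not one of the grafify data sets; only `pivot_longer` and `pivot_wider` from tidyr are assumed to be installed.

```r
library(tidyr)

# Hypothetical wide table: one row per experiment, one column per genotype
wide_tab <- data.frame(
  Experiment = c("Exp_1", "Exp_2", "Exp_3"),
  WT   = c(10, 20, 30),
  KO_1 = c(1, 2, 3)
)

# Wide to long: the genotype columns are stacked into a
# "Genotype" column (names) and a "Death" column (values)
long_tab <- pivot_longer(wide_tab,
                         cols = c(WT, KO_1),
                         names_to = "Genotype",
                         values_to = "Death")

# Long back to wide: one column per level of Genotype
wide_again <- pivot_wider(long_tab,
                          names_from = Genotype,
                          values_from = Death)
```

In the long table each row holds a single measurement together with the labels that describe it, which is the layout that the plot_ functions expect.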

Data for ANOVAs

Analyses as linear models for ANOVAs (i.e., simple_model, simple_anova, mixed_model, mixed_anova, mixed_model_slopes and mixed_anova_slopes) also require data to be supplied in long format.

In addition, keep in mind whether each independent variable is a categorical/nominal variable or a numeric variable.

Confirm that categorical/discrete fixed and random factors are converted into factors by using as.factor() (see here), and check the data frame with str(). Failing to do this for columns that use numbers to describe levels will lead to incorrect results (e.g., if a column describes “Experiments” as 1, 2, 3 and so on, R will treat these as numeric values when they should be analysed as categorical/discrete/nominal variables). Examples of numeric variables we might come across in biology include time, temperature, body weight (mass), lengths etc.

ANOVA and model fitting functions do not check whether the variables are “factors”. This is because some users may actually want to fit lines through numeric variables. Unless this is what you intend to do, first convert fixed factors (also called discrete or categorical variables) to what R understands as ‘factors’.

It is the user’s responsibility to check data frame structure.
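The check and conversion described above could look something like the sketch below. It uses only base R; the data frame `dat`, its columns and its values are hypothetical, and `lm()` stands in here for a model fit simply to show how the same column is treated differently before and after conversion.

```r
# Hypothetical data frame in which experiments are coded as numbers
dat <- data.frame(
  Experiment = rep(1:3, each = 2),
  Genotype   = rep(c("WT", "KO"), times = 3),
  Response   = c(10, 4, 12, 5, 11, 3)
)

# Experiment is reported as 'int', so it would be treated as numeric
str(dat)

# With a numeric Experiment, the model fits a single slope
fit_numeric <- lm(Response ~ Experiment, data = dat)

# Convert the grouping columns to factors
dat$Experiment <- as.factor(dat$Experiment)
dat$Genotype   <- factor(dat$Genotype, levels = c("WT", "KO"))

# Both columns are now reported as 'Factor'
str(dat)

# With a factor, the model estimates one term per additional level
fit_factor <- lm(Response ~ Experiment, data = dat)

length(coef(fit_numeric))  # 2: intercept and slope
length(coef(fit_factor))   # 3: intercept and two level contrasts
```

The differing number of coefficients shows that the two fits answer different questions, which is why the column type should be checked before any analysis.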


Here is an example of a data frame in grafify.

#10 rows of data_1w_death table
head(data_1w_death, n = 10)
#>    Experiment Genotype     Death
#> 1       Exp_1       WT 25.012173
#> 2       Exp_1     KO_1  1.824973
#> 3       Exp_1     KO_2 14.294956
#> 4       Exp_2       WT 16.542609
#> 5       Exp_2     KO_1  2.131199
#> 6       Exp_2     KO_2 23.749439
#> 7       Exp_3       WT 31.125802
#> 8       Exp_3     KO_1  1.916670
#> 9       Exp_3     KO_2 21.998527
#> 10      Exp_4       WT 21.596348
#structure of the data frame
str(data_1w_death)
#> 'data.frame':    15 obs. of  3 variables:
#>  $ Experiment: Factor w/ 5 levels "Exp_1","Exp_2",..: 1 1 1 2 2 2 3 3 3 4 ...
#>  $ Genotype  : Factor w/ 3 levels "WT","KO_1","KO_2": 1 2 3 1 2 3 1 2 3 1 ...
#>  $ Death     : num  25.01 1.82 14.29 16.54 2.13 ...

Note that the Experiment and Genotype columns are of type Factor, whereas Death is num (numeric).