How to Perform One-Way ANOVA in R (With Example Dataset)
The one-way ANOVA (Analysis of Variance) tests whether more than two groups differ significantly by comparing their group means.
The one-way ANOVA is also known as one-factor ANOVA as there is only one independent variable (factor or group variable) to analyze.
A one-way ANOVA tests the null hypothesis that all group means are equal against the alternative hypothesis that they are not all equal (i.e. the mean of at least one group differs significantly from the others).
You can use the following code to perform one-way ANOVA in R:
# model
model <- aov(y ~ x, data = df)
# view ANOVA summary
summary(model)
Where,
Parameter | Description
---|---
y | Response variable (should be a continuous variable)
x | Group variable
df | Data frame containing the group and response variables
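For a quick trial that needs no download, the same template can be run on `PlantGrowth`, a dataset that ships with base R. It is not part of this article's example: here `weight` plays the role of `y`, `group` the role of `x`, and the model name `pg_model` is only illustrative. A minimal sketch:

```r
# PlantGrowth ships with base R: 30 plants in 3 treatment groups
data(PlantGrowth)

# weight is continuous (response), group is a factor (group variable)
str(PlantGrowth)

# same template as above: y = weight, x = group, df = PlantGrowth
pg_model <- aov(weight ~ group, data = PlantGrowth)

# view ANOVA summary
summary(pg_model)
```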
The following example illustrates how to use one-way ANOVA for analyzing the group differences.
How to Perform One-Way ANOVA in R
For example, a researcher wants to analyze whether plant height differs among plant genotypes. The researcher collects plant height data for four plant genotypes.
The researcher has the following null and alternative hypotheses:
Null hypothesis: Plant height is equal among plant genotypes, i.e. the mean plant height is the same for all genotypes
Alternative hypothesis: Plant height is not equal among plant genotypes, i.e. the mean plant height of at least one genotype is significantly different
Here, the alternative hypothesis is two-sided, as plant height in one genotype can be either lower or higher than in another genotype.
The following ANOVA code shows how to perform one-way ANOVA in R:
Load and view the dataset,
# load dataset (stringsAsFactors = TRUE keeps genotype as a factor in R >= 4.0)
df <- read.csv("https://reneshbedre.github.io/assets/posts/anova/one_way_anova.csv", stringsAsFactors = TRUE)
# view first six rows of data frame
head(df)
genotype height
1 A 5
2 A 6
3 A 7
4 A 8
5 A 8
6 B 12
Check descriptive statistics (mean and variance) for each plant genotype,
# load package
library(dplyr)
# get descriptive statistics
df %>% group_by(genotype) %>% summarise(mean = mean(height), var = var(height))
# A tibble: 4 × 3
genotype mean var
<fct> <dbl> <dbl>
1 A 6.8 1.7
2 B 13.6 2.3
3 C 7 3.5
4 D 7.2 1.7
From the descriptive statistics, we can see that mean plant height is highest for genotype B and lowest for genotype A. The variance is roughly similar for all genotypes.
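As a quick sanity check, the genotype A row of the descriptive statistics can be reproduced from the five genotype A heights visible in the `head(df)` output. The vector name `height_A` is only illustrative. Note that `var()` returns the sample variance, which divides by n - 1 rather than n.

```r
# heights for genotype A, copied from the head(df) output shown earlier
height_A <- c(5, 6, 7, 8, 8)

# group mean
mean(height_A)  # 6.8

# sample variance (sum of squared deviations divided by n - 1)
var(height_A)   # 1.7
```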
Now, we will perform a one-way ANOVA to check whether these differences in plant height are statistically significant.
Perform a one-way ANOVA and summarise the results using the summary() function,
# fit model
model <- aov(height ~ genotype, data = df)
# summary statistics
summary(model)
Df Sum Sq Mean Sq F value Pr(>F)
genotype 3 163.8 54.58 23.73 3.93e-06 ***
Residuals 16 36.8 2.30
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The one-way ANOVA reports the following important statistics for interpretation,
Parameter | Value
---|---
F | 23.73
p value | 3.93e-06
Degrees of freedom | 3 and 16
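These statistics can also be pulled out of the fitted model programmatically instead of being read off the printed summary. The sketch below uses a hypothetical helper, `anova_stats()`, which is not part of R or of this article. It assumes a single-factor `aov` fit, where the first row of the summary table is the group term and the second row is the residuals. It is demonstrated on the built-in `PlantGrowth` data so that it runs without a download.

```r
# hypothetical helper: collect F, p value and degrees of freedom from an aov fit
anova_stats <- function(fit) {
  tab <- summary(fit)[[1]]          # ANOVA table as a data frame
  list(
    f_value = tab[["F value"]][1],  # first row is the group term
    p_value = tab[["Pr(>F)"]][1],
    df      = tab[["Df"]]           # c(group df, residual df)
  )
}

# demonstration on the built-in PlantGrowth data (not the article's dataset)
stats_pg <- anova_stats(aov(weight ~ group, data = PlantGrowth))
stats_pg$df
```

Applied to the model fitted in this article, `anova_stats(model)` should return the F value, p value and degrees of freedom listed in the table.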
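According to the one-way ANOVA results, the effect of genotype is statistically significant [F(3, 16) = 23.73, p = 3.93e-06, which is below the 0.05 significance level]. Hence, we reject the null hypothesis and conclude that mean plant height differs significantly among genotypes, i.e. at least one genotype differs from the others.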
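The F statistic can be cross-checked by hand from the group means and variances reported in the descriptive statistics. The sketch below assumes a balanced design with 5 plants per genotype. That group size is an assumption: it is not stated explicitly, but it is consistent with the 16 residual degrees of freedom (20 observations minus 4 groups). All variable names are illustrative.

```r
# group means and variances copied from the descriptive statistics
means <- c(A = 6.8, B = 13.6, C = 7.0, D = 7.2)
vars  <- c(A = 1.7, B = 2.3, C = 3.5, D = 1.7)
n <- 5              # assumed number of plants per genotype (balanced design)
k <- length(means)  # number of groups

# the mean of group means equals the grand mean only because group sizes are equal
grand_mean <- mean(means)

ss_between <- n * sum((means - grand_mean)^2)  # between-group sum of squares
ss_within  <- (n - 1) * sum(vars)              # within-group (residual) sum of squares

df_between <- k - 1        # 3
df_within  <- k * (n - 1)  # 16

# F is the ratio of the between-group to the within-group mean square
f_value <- (ss_between / df_between) / (ss_within / df_within)

# upper-tail probability of the F distribution gives the p value
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(c(ss_between = ss_between, ss_within = ss_within, f_value = f_value), 2)
p_value
```

Under these assumptions the sums of squares come out at 163.75 and 36.8 and F at about 23.73, in line with the `summary(model)` output.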
This work is licensed under a Creative Commons Attribution 4.0 International License