summary() Function in R: How to Use (With 6 Examples)
The summary() function is a base R generic function that returns a statistical summary of a fitted model (ANOVA, regression, etc.), data frame, vector, matrix, or factor.
For example, for a fitted regression model, the summary() function returns the model call (formula), regression coefficients, residual quantiles, F statistic, p value, and R-squared.
The basic syntax for the summary() function is:
summary(object)
In the above syntax, object can be a fitted model, data frame, data frame column, matrix, or vector.
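Because summary() is a generic function, the result depends on the class of the object passed to it. The following is a minimal sketch of that behaviour using two small made-up objects (x_num and x_fac are illustrative names, not from the examples in this article).

```r
# Two small illustrative objects of different classes
x_num <- c(2, 4, 6, 8)               # numeric vector
x_fac <- factor(c("a", "b", "a"))    # factor

# Same call, different kind of result depending on the class
num_summary <- summary(x_num)        # min, quartiles, median, mean, max
fac_summary <- summary(x_fac)        # count of each factor level

print(num_summary)
print(fac_summary)

# Class-specific methods are registered under names such as summary.factor
is.function(getS3method("summary", "factor"))
is.function(getS3method("summary", "data.frame"))
```

Running methods(summary) in your own session lists every class for which a summary method is currently available.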
The following six examples illustrate how to use the summary() function to summarise the results for various objects.
1. Summary statistics for the regression model
The summary() function is widely used for summarising the statistical results obtained from a fitted regression model.
The following example shows how to use the lm() function to fit a linear regression model and the summary() function to summarise the statistical results.
# load blood pressure example dataset
df <- read.csv("https://reneshbedre.github.io/assets/posts/reg/bp.csv")
# fit simple linear regression
model <- lm(BP ~ Age, data = df)
# get summary statistics
summary(model)
Call:
lm(formula = BP ~ Age, data = df)
Residuals:
    Min      1Q  Median      3Q     Max
-6.7104 -2.9217  0.4276  2.3973  7.8586
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  44.4545    18.7277   2.374  0.02894 *
Age           1.4310     0.3849   3.718  0.00157 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.195 on 18 degrees of freedom
Multiple R-squared: 0.4344, Adjusted R-squared: 0.403
F-statistic: 13.82 on 1 and 18 DF, p-value: 0.001574
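For a regression model, the summary() function returns the residual quantiles, the regression coefficients with their standard errors and t tests, the residual standard error, performance metrics (R-squared and adjusted R-squared), and the overall significance of the regression (F statistic and p value).
Because summary() is a generic function, calling it on an lm object runs summary.lm(), so calling summary.lm() directly gives the same results.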
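The values printed by summary() can also be extracted programmatically from the returned object. The sketch below uses the built-in mtcars dataset as a stand-in for the blood pressure data (an assumption made so the example runs without a download); the variable names fit, s, and p_value are illustrative.

```r
# Fit a simple linear regression on a built-in dataset (stand-in data)
fit <- lm(mpg ~ wt, data = mtcars)
s <- summary(fit)

s$coefficients     # matrix: Estimate, Std. Error, t value, Pr(>|t|)
s$r.squared        # multiple R-squared
s$adj.r.squared    # adjusted R-squared
s$sigma            # residual standard error
s$fstatistic       # F value with numerator and denominator df

# p value of the overall F test, computed from the stored F statistic
f <- s$fstatistic
p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
p_value

# Calling summary.lm() directly gives the same coefficient table
all.equal(s$coefficients, summary.lm(fit)$coefficients)
```

Extracting components this way is handy when the statistics need to go into a table or report rather than being read off the console.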
2. Summary statistics for the ANOVA model
When you run ANOVA in R, the summary() function is used for summarising the statistical results from the ANOVA model.
The following example shows how to use the aov() function to fit an ANOVA model and the summary() function to summarise the statistical results.
# load dataset
df <- read.csv("https://reneshbedre.github.io/assets/posts/anova/anova.csv")
# fit one-way ANOVA
model <- aov(response ~ treatment, data = df)
# get summary statistics
summary(model)
            Df Sum Sq Mean Sq F value   Pr(>F)
treatment    3   3011  1003.6   17.49 2.64e-05 ***
Residuals   16    918    57.4
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
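For ANOVA, the summary() function returns an ANOVA table that contains the degrees of freedom, sums of squares, and mean squares for the treatment and the residuals (experimental error), along with the statistical significance of the ANOVA (F statistic and p value).
In addition to summary(), you can also use summary.lm() on the ANOVA model, which returns a regression-style summary: a coefficient for each treatment group (expressed relative to the reference group under R's default contrasts), R-squared, and the overall F test.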
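The difference between the two summaries of an ANOVA model can be seen in a short sketch. It uses the built-in PlantGrowth dataset as a stand-in for the article's data (an assumption made so it runs offline); fit, aov_tab, and coef_tab are illustrative names.

```r
# One-way ANOVA on a built-in dataset (three groups: ctrl, trt1, trt2)
fit <- aov(weight ~ group, data = PlantGrowth)

# ANOVA table; the first list element holds it as a data frame
aov_tab <- summary(fit)[[1]]
aov_tab[["Df"]]          # degrees of freedom: treatment, residuals
aov_tab[["Pr(>F)"]][1]   # p value of the F test

# Regression-style coefficient table from summary.lm()
coef_tab <- coef(summary.lm(fit))
coef_tab

# Under the default treatment contrasts, the intercept is the mean of the
# reference group (ctrl); other rows are differences from that mean
ctrl_mean <- mean(PlantGrowth$weight[PlantGrowth$group == "ctrl"])
all.equal(unname(coef_tab["(Intercept)", "Estimate"]), ctrl_mean)
```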
3. Summary statistics for data frame
The summary() function can be used to get descriptive statistics such as the mean, median, and quartiles for all or specific columns of an R data frame.
If you want additional descriptive statistics such as the standard error (se), standard deviation (sd), sample count, trimmed mean, etc., you should use the describe() function (available, for example, in the psych package).
Get descriptive statistics for all columns:
# load dataset
df <- read.csv("https://reneshbedre.github.io/assets/posts/anova/anova.csv")
# get summary statistics
summary(df)
  treatment            response
 Length:20          Min.   :25.00
 Class :character   1st Qu.:29.00
 Mode  :character   Median :36.50
                    Mean   :41.45
                    3rd Qu.:54.25
                    Max.   :73.00
For a numeric variable, the summary() function returns the minimum, first quartile (25th percentile), median, mean, third quartile (75th percentile), and maximum value. For a character variable, it returns only the length, class, and mode.
Now let’s check how to get descriptive statistics for a specific column:
# load dataset
df <- read.csv("https://reneshbedre.github.io/assets/posts/anova/anova.csv")
# get summary statistics for response variable
summary(df$response)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  25.00   29.00   36.50   41.45   54.25   73.00
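The summary() output for a data frame does not include the standard deviation, standard error, or sample count. If installing an extra package is not an option, these can be computed with base R. The following is a rough sketch using the built-in iris dataset; describe_basic() is a hypothetical helper written for this example, not a function from base R or any package.

```r
# Hypothetical helper: a few extra descriptive statistics for one numeric vector
describe_basic <- function(x) {
  x <- x[!is.na(x)]                    # drop missing values first
  n <- length(x)
  c(n       = n,
    mean    = mean(x),
    sd      = sd(x),
    se      = sd(x) / sqrt(n),         # standard error of the mean
    trimmed = mean(x, trim = 0.1))     # 10% trimmed mean
}

# Apply the helper to every numeric column of a data frame
num_cols <- vapply(iris, is.numeric, logical(1))
desc <- t(sapply(iris[num_cols], describe_basic))
desc
```

Each row of desc corresponds to one numeric column and each column to one statistic.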
4. Summary statistics for factor
The summary() function can be used to get the frequency of each value of a character variable, provided the variable is first converted to a factor. For a plain character variable, summary() reports only the length, class, and mode.
Get a summary from a character variable:
# load dataset
df <- read.csv("https://reneshbedre.github.io/assets/posts/anova/anova.csv")
# get summary of character variable
summary(as.factor(df$treatment))
A B C D
5 5 5 5
For a factor, the summary() function returns the frequency of each factor level (group).
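The effect of the factor conversion is easiest to see side by side. This is a small sketch with a made-up character vector (trt is an illustrative name); table() is shown as an alternative that counts values without an explicit conversion.

```r
# Made-up treatment labels stored as plain character strings
trt <- c("A", "B", "A", "C")

chr_summary <- summary(trt)              # only Length, Class, and Mode
fac_summary <- summary(as.factor(trt))   # count of each level
tab_counts  <- table(trt)                # counts straight from the characters

print(chr_summary)
print(fac_summary)
print(tab_counts)
```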
5. Summary statistics for vector
For a numeric vector, the summary() function returns the descriptive statistical summary.
# create a numeric vector
x <- c(1, 0.5, 3, 4.5, 3, 2)
# summary
summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.500   1.250   2.500   2.333   3.000   4.500
Note: For a numeric vector, the summary() function excludes NA values when computing the statistics and reports how many were present in an additional NA's entry.
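The handling of missing values can be checked with a short sketch. The vector below is made up for illustration (x_na and s_na are illustrative names).

```r
# Numeric vector with two missing values
x_na <- c(1, 0.5, NA, 4.5, 3, NA)

s_na <- summary(x_na)
s_na                      # statistics use the four non-missing values

s_na[["NA's"]]            # number of missing values
s_na[["Mean"]]            # mean of 1, 0.5, 4.5, 3

# By contrast, mean() returns NA unless told to remove missing values
mean(x_na)
mean(x_na, na.rm = TRUE)
```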
For a character vector, the summary() function returns the frequency of each value, provided the vector is first converted to a factor.
# create a character vector
x <- c("A", "B", "A", "C", "A", "B")
# summary
summary(as.factor(x))
A B C
3 2 1
6. Summary statistics for matrix
Similar to a data frame, the summary() function returns descriptive summary statistics for each column of a matrix.
If you convert a data frame to a matrix with data.matrix(), character columns are converted to factors and then to their integer codes.
# load dataset
df <- read.csv("https://reneshbedre.github.io/assets/posts/anova/anova.csv")
# convert to matrix
df_mat <- data.matrix(df)
# get summary statistics
summary(df_mat)
   treatment       response
 Min.   :1.00   Min.   :25.00
 1st Qu.:1.75   1st Qu.:29.00
 Median :2.50   Median :36.50
 Mean   :2.50   Mean   :41.45
 3rd Qu.:3.25   3rd Qu.:54.25
 Max.   :4.00   Max.   :73.00
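The treatment statistics above are computed on integer codes (A = 1, B = 2, and so on), so they describe the coding rather than the groups themselves. The sketch below shows the conversion on a small made-up data frame (df_small and its values are hypothetical, not the article's dataset).

```r
# Small made-up data frame with one character and one numeric column
df_small <- data.frame(
  treatment = c("A", "B", "C", "D"),
  response  = c(25, 30, 41, 73)
)

# data.matrix() turns the character column into integer codes
m <- data.matrix(df_small)
m

m[, "treatment"]   # codes 1 to 4 in place of the labels A to D
is.numeric(m)      # the whole matrix is numeric

# Column-wise summary, as for a data frame
mat_summary <- summary(m)
mat_summary
```

If the group labels matter, summarising the original data frame column as a factor is usually more informative than summarising its codes.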
This work is licensed under a Creative Commons Attribution 4.0 International License