Packages

For this post, we need to load the following library:

# install.packages("gtsummary")
library(gtsummary)

Default output for summary table

The gtsummary package uses the tbl_summary() function to generate the summary table, and it works well with the %>% pipe operator.

It automatically detects the type of each column and uses it to decide which statistics to compute. By default, these are:
- median, 1st and 3rd quartile for numeric columns
- number of observations and proportion for categorical columns
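To make the default behaviour concrete, here is a rough base R sketch of the same idea. The helper describe_column() is hypothetical and not part of gtsummary; it only mimics the choice between the two kinds of statistics, and its quartiles may differ slightly from what tbl_summary() prints.

```r
# Hypothetical helper (NOT a gtsummary function): pick a summary by column type
describe_column <- function(x) {
  if (is.numeric(x)) {
    # numeric column: median (1st quartile, 3rd quartile)
    q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
    sprintf("%g (%g, %g)", q[2], q[1], q[3])
  } else {
    # categorical column: count (percentage) for each level
    n <- table(x)
    pct <- round(100 * as.numeric(prop.table(n)))
    setNames(sprintf("%d (%g%%)", as.integer(n), pct), names(n))
  }
}

describe_column(c(1, 2, 3, 4, 5))
describe_column(factor(c("a", "a", "b", "b")))
```

Calling lapply(df, describe_column) on a data frame applies the helper to every column, which is roughly what the summary table does for you.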

library(gtsummary)

# create dataset
data("Titanic")
df = as.data.frame(Titanic)

# create the table
df %>%
  tbl_summary()

Characteristic	N = 32¹
Class
1st	8 (25%)
2nd	8 (25%)
3rd	8 (25%)
Crew	8 (25%)
Sex
Male	16 (50%)
Female	16 (50%)
Age
Child	16 (50%)
Adult	16 (50%)
Survived	16 (50%)
Freq	14 (1, 77)
¹ n (%); Median (IQR)
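Note that as.data.frame(Titanic) returns aggregated counts: each of the 32 rows is one Class/Sex/Age/Survived combination, with the number of passengers stored in Freq. That is why every level has the same share in the table above. A minimal sketch, using base R only, of expanding those counts to one row per passenger before summarising (the names counts and passengers are only illustrative):

```r
# Titanic ships with base R as a 4-way contingency table
data("Titanic")
counts <- as.data.frame(Titanic)  # 32 rows, one per combination, plus Freq

# Repeat each row Freq times to get one row per passenger, then drop Freq
passengers <- counts[rep(seq_len(nrow(counts)), times = counts$Freq),
                     c("Class", "Sex", "Age", "Survived")]
rownames(passengers) <- NULL

nrow(passengers)            # total number of passengers
table(passengers$Survived)  # passengers by survival status
```

Passing passengers to tbl_summary() instead of df would then describe passengers rather than combinations.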

Add p-values and statistical details

To add p-values to the table, you have to set by = variable_name in the tbl_summary() function. A p-value comes from a test that compares groups, so the table needs a grouping variable to define them.

The variable in the by argument splits the dataset into sub-samples (2 if it’s dichotomous, 3 if it has 3 distinct labels, etc.). Those sub-samples are then compared for each remaining column in the dataset, and the test used depends on the type of data.

In this case, we add:
- add_p() to create a new column for p-values
- add_overall() to add a new column with descriptive statistics for the whole sample

library(gtsummary)

# create dataset
data("Titanic")
df = as.data.frame(Titanic)

# create the table
df %>%
  tbl_summary(by=Survived) %>%
  add_overall() %>%
  add_p()

Characteristic	Overall, N = 32¹	No, N = 16¹	Yes, N = 16¹	p-value²
Class				>0.9
1st	8 (25%)	4 (25%)	4 (25%)
2nd	8 (25%)	4 (25%)	4 (25%)
3rd	8 (25%)	4 (25%)	4 (25%)
Crew	8 (25%)	4 (25%)	4 (25%)
Sex				>0.9
Male	16 (50%)	8 (50%)	8 (50%)
Female	16 (50%)	8 (50%)	8 (50%)
Age				>0.9
Child	16 (50%)	8 (50%)	8 (50%)
Adult	16 (50%)	8 (50%)	8 (50%)
Freq	14 (1, 77)	9 (0, 96)	14 (10, 75)	0.6
¹ n (%); Median (IQR)
² Fisher’s exact test; Pearson’s Chi-squared test; Wilcoxon rank sum test
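The three tests named in the footnote are all available in base R. As a rough sketch of what sits behind the p-value column, they can be run by hand on the same data frame; add_p() picks among them for each variable, and this sketch does not try to reproduce that selection logic.

```r
data("Titanic")
df <- as.data.frame(Titanic)

# Categorical column vs. grouping variable: tests on the contingency table
class_by_survival <- table(df$Class, df$Survived)
fisher_res <- fisher.test(class_by_survival)
# small expected counts trigger a warning here, silenced for readability
chisq_res <- suppressWarnings(chisq.test(class_by_survival))

# Numeric column vs. two groups: Wilcoxon rank sum test
# (ties in Freq prevent an exact p-value, which also raises a warning)
wilcox_res <- suppressWarnings(wilcox.test(Freq ~ Survived, data = df))

c(fisher = fisher_res$p.value,
  chisq = chisq_res$p.value,
  wilcoxon = wilcox_res$p.value)
```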


Add a column based on a custom function

Thanks to the add_stat() function, we can create a new column based on our own functions.

Below, we define a function that runs an ANOVA and returns its p-value, then pass it to the add_stat() function.

library(gtsummary)

# create dataset
data("iris")
df = as.data.frame(iris)

my_anova = function(data, variable, by, ...) {
  result = aov(as.formula(paste(variable, "~", by)), data = data)
  summary(result)[[1]]$'Pr(>F)'[1] # Extracting the p-value for the group effect
}

# create the table
df %>%
  tbl_summary(by=Species) %>%
  add_overall() %>%
  add_p() %>%
  add_stat(fns = everything() ~ my_anova) %>%
  modify_header(
    list(
      add_stat_1 ~ "**p-value**",
      all_stat_cols() ~ "**{level}**"
    )
  ) %>%
  modify_footnote(
    add_stat_1 ~ "ANOVA")

Characteristic	Overall¹	setosa¹	versicolor¹	virginica¹	p-value²	p-value³
Sepal.Length	5.80 (5.10, 6.40)	5.00 (4.80, 5.20)	5.90 (5.60, 6.30)	6.50 (6.23, 6.90)	<0.001	0.000
Sepal.Width	3.00 (2.80, 3.30)	3.40 (3.20, 3.68)	2.80 (2.53, 3.00)	3.00 (2.80, 3.18)	<0.001	0.000
Petal.Length	4.35 (1.60, 5.10)	1.50 (1.40, 1.58)	4.35 (4.00, 4.60)	5.55 (5.10, 5.88)	<0.001	0.000
Petal.Width	1.30 (0.30, 1.80)	0.20 (0.20, 0.30)	1.30 (1.20, 1.50)	2.00 (1.80, 2.30)	<0.001	0.000
¹ Median (IQR)
² Kruskal-Wallis rank sum test
³ ANOVA
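The ANOVA column prints 0.000 because the returned p-values are tiny numbers that get rounded. One possible workaround, sketched below, is to have the custom function return an already formatted label using base R's format.pval(). The name my_anova_label is hypothetical, and the sketch assumes that add_stat() displays a returned character value unchanged, which was not verified in this post.

```r
# Same ANOVA as my_anova, but returning a formatted label instead of a number
my_anova_label <- function(data, variable, by, ...) {
  result <- aov(as.formula(paste(variable, "~", by)), data = data)
  p <- summary(result)[[1]]$"Pr(>F)"[1]  # p-value for the group effect
  format.pval(p, digits = 2, eps = 0.001)  # values below eps are shown as "<eps"
}

# Stand-alone check on one column of iris
label <- my_anova_label(iris, "Sepal.Length", "Species")
label
```

The function could then replace my_anova in the add_stat() call. gtsummary also provides style_pvalue() for the same purpose.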

Conclusion

This post explained how to create a summary table using the gtsummary library. For more on this package, see the dedicated section or the table section.

Contact

This document is a work by Yan Holtz. Any feedback is highly encouraged. You can file an issue on Github, drop me a message on Twitter, or send an email pasting yan.holtz.data with gmail.com.
