Practical Statistics 3: Powering up! Descriptive and Inferential Statistics

COVID-19 Vaccinations and Death in Malaysia

Task 1: Descriptive statistics using `tidyverse`

Question: Compute the summary statistics (count, mean, standard deviation, minimum, and maximum) of age using tidyverse functions.

Steps:

Install and load the tidyverse package.
Filter the dataset to remove missing values in the “age” column. (Note: there are no missing values in this dataset; the step simply shows the code you would need if there were.)
Use the summary functions from dplyr to compute the required summary statistics: count, mean, standard deviation, minimum, and maximum.

Solution:

# Step 1
#install.packages("tidyverse")
library(tidyverse)

# Step 2 & 3
summary_age <- c19_df %>%
  filter(!is.na(age)) %>%
  summarise(
    count = n(),
    mean = mean(age),
    sd = sd(age),
    min = min(age),
    max = max(age)
  )
summary_age

  count     mean       sd min max
1 37152 62.65464 16.58926   0 130
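
The filtering step matters because mean(), sd(), min(), and max() all return NA as soon as a single value is missing. The following is a minimal sketch of what the filter protects against; `toy_df` is a small made-up data frame used only for illustration, not the course dataset.

```r
library(dplyr)

# Hypothetical toy data with one missing age (not the course dataset)
toy_df <- data.frame(age = c(34, 71, NA, 58, 90))

# Without filtering, the missing value propagates into the mean
unfiltered <- toy_df %>%
  summarise(count = n(), mean = mean(age))

# With filtering, every statistic is computed on the observed ages only
filtered <- toy_df %>%
  filter(!is.na(age)) %>%
  summarise(
    count = n(),
    mean = mean(age),
    sd = sd(age),
    min = min(age),
    max = max(age)
  )

unfiltered
filtered
```

An alternative is to pass na.rm = TRUE to each summary function, but n() would then still count the rows with a missing age, so filtering first keeps the count consistent with the other statistics.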

Task 2: Descriptive statistics using `gtsummary`

Question: Create a descriptive statistics table for age, male, bid, and malaysian variables using gtsummary.

Steps:

Install and load the gtsummary package.
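Create a subset of the data with the selected variables. (Note: you may select any variables you like; the solution below uses the four named in the question.)
Use the tbl_summary() function to compute and display the descriptive statistics.
Stratify the table by one of the selected variables (the solution uses malaysian).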

Solution:

# Step 1
#install.packages("gtsummary")
library(gtsummary)

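# Step 2, 3 & 4
summary_table <- c19_df %>% 
  select(age, male, bid, malaysian) %>%   # keep variables of interest
  tbl_summary(by = malaysian)             # summarise, stratified by malaysian
summary_table                             # print the table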
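
To see what stratifying with the by argument amounts to, the same idea can be approximated by hand with group_by(). The sketch below uses a made-up data frame (`toy_df` is hypothetical) and computes the median and quartiles of age per group, which are the statistics tbl_summary() reports for continuous variables by default. It is an illustration of the idea, not a replacement for the formatted table.

```r
library(dplyr)

# Hypothetical toy data (not the course dataset)
toy_df <- data.frame(
  age = c(30, 40, 50, 60, 70, 80),
  malaysian = c(0, 0, 0, 1, 1, 1)
)

# One row of summary statistics per level of the stratifying variable
by_group <- toy_df %>%
  group_by(malaysian) %>%
  summarise(
    n = n(),
    median = median(age),
    q1 = quantile(age, 0.25, names = FALSE),
    q3 = quantile(age, 0.75, names = FALSE)
  )

by_group
```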

Task 3: Inferential statistics using `rstatix`

Question: Test if there is a significant difference in age between males and females using the t-test.

Steps:

Install and load the rstatix package.
Filter the dataset to remove missing values in the “age” and “male” columns.
Recode the “male” variable to a factor.
Conduct a t-test to compare the means.

Solution:

# Step 1
#install.packages("rstatix")
library(rstatix)

# Step 2
c19_df <- c19_df %>% filter(!is.na(age), !is.na(male))

# Step 3
c19_df$male <- factor(c19_df$male, levels = c(0, 1), labels = c("Female", "Male"))

# Step 4
c19_df %>% t_test(age ~ male)

# A tibble: 1 × 8
  .y.   group1 group2    n1    n2 statistic     df        p
* <chr> <chr>  <chr>  <int> <int>     <dbl>  <dbl>    <dbl>
1 age   Female Male   15783 21369      9.05 32644. 1.55e-19
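
The non-integer degrees of freedom in the output point to Welch's t-test, which does not assume equal variances in the two groups. The following is a minimal sketch with made-up numbers (`toy_df` is hypothetical, not the course dataset) that computes the Welch statistic and degrees of freedom by hand and compares them with base R's t.test().

```r
# Hypothetical toy data (not the course dataset)
toy_df <- data.frame(
  age = c(61, 70, 55, 68, 74, 59, 66, 80, 49, 63),
  male = factor(rep(c("Female", "Male"), each = 5))
)

female <- toy_df$age[toy_df$male == "Female"]
male <- toy_df$age[toy_df$male == "Male"]

# Welch statistic: difference in means over the unpooled standard error
se2_f <- var(female) / length(female)
se2_m <- var(male) / length(male)
t_manual <- (mean(female) - mean(male)) / sqrt(se2_f + se2_m)

# Welch-Satterthwaite degrees of freedom (usually not a whole number)
df_manual <- (se2_f + se2_m)^2 /
  (se2_f^2 / (length(female) - 1) + se2_m^2 / (length(male) - 1))

# Base R's t.test() runs the Welch test by default (var.equal = FALSE)
welch <- t.test(age ~ male, data = toy_df)

c(manual = t_manual, t.test = unname(welch$statistic))
c(manual = df_manual, t.test = unname(welch$parameter))
```

The sign of the statistic follows the order of the factor levels. With "Female" as the first level, a positive statistic such as the 9.05 reported above means the mean age is higher in the female group.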

Task 4: Inferential statistics using `gtsummary`

Question: Test if there is a significant difference in age between Malaysians and non-Malaysians using the t-test, and present the results in a table using gtsummary.

Steps:

Recode the “malaysian” variable to a factor (Tip: use the factor() function).
Use the tbl_summary() function together with add_p() to run the t-test and present the results.

Solution:

# Step 1
c19_df$malaysian <- factor(c19_df$malaysian, levels = c(0, 1), labels = c("Non-Malaysian", "Malaysian"))

# Step 2
t_test_result <- c19_df %>% 
  select(age, malaysian) %>%                 # keep variables of interest
  tbl_summary(                               # produce summary table
    statistic = age ~ "{mean} ({sd})",       # specify what statistics to show
    by = malaysian) %>%                      # specify the grouping variable
  add_p(test = age ~ "t.test")               # add the t-test p-value
t_test_result

Characteristic   Non-Malaysian, N = 4,034¹   Malaysian, N = 33,118¹   p-value²
age              49 (14)                     64 (16)                  <0.001
¹ Mean (SD)
² Welch Two Sample t-test

Task 5: Correlations using `corrr`

Question: Compute the correlation between the age, male, bid, and malaysian variables, and represent it in a correlation plot. (Note: the categorical variables are included by design, just to practice selecting and presenting them.)

Steps:

Install and load the corrr package.
Create a subset of the data with the selected variables.
Compute the correlation matrix.
Plot the correlation matrix (Note: try ?network_plot to see how it can be used).

Solution:

# Step 1
#install.packages("corrr")
library(corrr)

# Step 2
df_subset <- c19_df %>% select(age, male, bid, malaysian)

# Step 3
correlation_matrix <- df_subset %>% correlate()

Non-numeric variables removed from input: `male`, and `malaysian`
Correlation computed with
• Method: 'pearson'
• Missing treated using: 'pairwise.complete.obs'

# Step 4
correlation_matrix %>% network_plot()

This would be the outcome if any variables in our data were highly correlated.
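
The message printed in Step 3 shows that male and malaysian were dropped: they were recoded to factors in Tasks 3 and 4, and only numeric columns enter the correlation. One possible way to keep them is to convert the factors back to 0/1 indicators first. The sketch below does this on made-up data (`toy_df` and `toy_numeric` are hypothetical names), with base R's cor() standing in for correlate().

```r
library(dplyr)

# Hypothetical toy data mimicking the recoded factors (not the course dataset)
toy_df <- data.frame(
  age = c(25, 47, 63, 71, 38, 82),
  male = factor(c("Male", "Female", "Male", "Female", "Female", "Male")),
  malaysian = factor(c("Non-Malaysian", "Malaysian", "Malaysian",
                       "Malaysian", "Non-Malaysian", "Malaysian"))
)

# Turn each factor back into a 0/1 indicator so it counts as numeric
toy_numeric <- toy_df %>%
  mutate(
    male = as.integer(male == "Male"),
    malaysian = as.integer(malaysian == "Malaysian")
  )

# Pearson correlations with pairwise deletion, the same settings
# reported in the correlate() message
cor_matrix <- cor(toy_numeric, use = "pairwise.complete.obs", method = "pearson")
round(cor_matrix, 2)
```

Passing a data frame prepared like `toy_numeric` to correlate() instead of cor() should then keep all of its columns in the correlation matrix and the network plot.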
