COVID-19 Vaccinations and Death in Malaysia
Task 1: Univariate Linear Regression
Question: Perform a univariate linear regression to predict “age” using the “male” variable.
Steps:
Install and load the required packages: tidyverse and broom.
Filter the dataset to remove missing values in the “age” and “male” columns.
Fit a univariate linear regression model using the lm() function.
Summarise the model using tidy() from the broom package.
Solution:
# Step 1
#install.packages(c("tidyverse", "broom"))
library(tidyverse)
library(broom)
# Step 2
c19_df <- c19_df %>% filter(!is.na(age), !is.na(male))
# Step 3
model <- lm(age ~ male, data = c19_df)
# Step 4
tidy(model)
# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    63.6      0.132    482.   0
2 male           -1.59     0.174     -9.14 6.51e-20
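To make the coefficients easier to read: with a single 0/1 predictor, the intercept of a linear regression is the mean outcome in the group coded 0, and the slope is the difference in means between the two groups. The following is a minimal sketch on hypothetical simulated data (the `toy` data frame is an assumption for illustration, not the `c19_df` dataset).

```r
# Hypothetical toy data, NOT the c19_df dataset used above
set.seed(1)
toy <- data.frame(male = rep(c(0, 1), each = 50))
toy$age <- 60 - 2 * toy$male + rnorm(100, sd = 10)

fit <- lm(age ~ male, data = toy)

# Group means computed directly from the data
mean_group0 <- mean(toy$age[toy$male == 0])
mean_group1 <- mean(toy$age[toy$male == 1])

# Intercept = mean age where male == 0
intercept_matches <- isTRUE(all.equal(unname(coef(fit)["(Intercept)"]), mean_group0))
# Slope = mean age where male == 1 minus mean age where male == 0
slope_matches <- isTRUE(all.equal(unname(coef(fit)["male"]), mean_group1 - mean_group0))

print(c(intercept = intercept_matches, slope = slope_matches))
```

Read this way, the output above suggests a mean age of about 63.6 years where male is 0, and a mean about 1.59 years lower where male is 1.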
Task 2: Multivariate Linear Regression
Question: Perform a multivariate linear regression to predict “age” using the “male” and “malaysian” variables.
Steps:
Filter the dataset to remove missing values in the relevant columns.
Fit a multivariate linear regression model using the lm() function.
Summarise the model using tbl_regression() from the gtsummary package.
Save the output as a Word document (.docx).
Solution:
# Step 1
c19_df <- c19_df %>% filter(!is.na(age), !is.na(male), !is.na(malaysian))
# Step 2
model <- lm(age ~ male + malaysian, data = c19_df)
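# Steps 3 and 4
# tbl_regression() comes from gtsummary; the flextable package must also be installed
library(gtsummary)
model %>%
  tbl_regression() %>%
  as_flex_table() %>%
  flextable::save_as_docx(path = "regression.docx")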
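Because the regression table ends up in a Word file, it can help to have a quick console cross-check of the estimates and their 95% confidence intervals. The sketch below uses only base R on hypothetical simulated data (the `toy` data frame and its effect sizes are assumptions for illustration); it is not a replacement for the gtsummary table, only a way to see the same kind of numbers.

```r
# Hypothetical toy data, NOT the c19_df dataset used above
set.seed(123)
n <- 200
toy <- data.frame(
  male      = rbinom(n, 1, 0.5),
  malaysian = rbinom(n, 1, 0.8)
)
toy$age <- 60 - 2 * toy$male + 3 * toy$malaysian + rnorm(n, sd = 10)

fit <- lm(age ~ male + malaysian, data = toy)

# Coefficients next to their 95% confidence intervals
ci_table <- cbind(estimate = coef(fit), confint(fit, level = 0.95))
print(round(ci_table, 2))
```

Each coefficient here is adjusted for the other predictor: the `male` row is the expected difference in age between the two `male` groups at a fixed value of `malaysian`.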
Task 3: Univariate Logistic Regression
Question: Perform a univariate logistic regression to predict “male” (binarize to 0 and 1) using the “age” variable.
Steps:
Filter the dataset to remove missing values in the relevant columns.
Fit a univariate logistic regression model using the glm() function, specifying the family as “binomial”.
Summarise the model using tidy().
Solution:
# Step 1
c19_df <- c19_df %>% filter(!is.na(age), !is.na(male))
# Step 2
model <- glm(male ~ age, data = c19_df, family = "binomial")
# Step 3
tidy(model)
# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)  0.667    0.0414       16.1  1.74e-58
2 age         -0.00581  0.000637     -9.12 7.45e-20
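The estimates from a logistic regression are on the log-odds scale, which is hard to read directly. As a small worked example, the sketch below takes the two estimates reported above and converts them to an odds ratio and to a predicted probability. The age of 60 is an arbitrary value chosen for illustration.

```r
# Estimates copied from the tidy() output above (log-odds scale)
b0    <- 0.667      # intercept
b_age <- -0.00581   # change in log-odds of male == 1 per additional year of age

# Odds ratios: exponentiate the log-odds coefficient
or_age <- exp(b_age)        # per one-year increase in age
or_10y <- exp(10 * b_age)   # per ten-year increase in age

# Predicted probability of male == 1 at a hypothetical age of 60
p_60 <- plogis(b0 + b_age * 60)

print(round(c(or_age = or_age, or_10y = or_10y, p_60 = p_60), 4))
```

The odds ratio of about 0.994 per year means the odds of male being 1 fall by roughly 0.6% for each additional year of age: statistically detectable in this dataset, but a small effect.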
Task 4: Multivariate Logistic Regression
Question: Perform a multivariate logistic regression to predict “male” using the “age” and “malaysian” variables.
Steps:
Filter the dataset to remove missing values in the relevant columns.
Fit a multivariate logistic regression model using the glm() function, specifying the family as “binomial”.
Summarise the model using tbl_regression() from the gtsummary package.
Save the output as a Word document (.docx).
Solution:
# Step 1
c19_df <- c19_df %>% filter(!is.na(age), !is.na(male), !is.na(malaysian))
# Step 2
model <- glm(male ~ age + malaysian, data = c19_df, family = "binomial")
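# Steps 3 and 4
# exponentiate = TRUE reports odds ratios instead of log-odds
# Note: writing to "regression.docx" overwrites any existing file with that name
library(gtsummary)
model %>%
  tbl_regression(exponentiate = TRUE) %>%
  as_flex_table() %>%
  flextable::save_as_docx(path = "regression.docx")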
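To see what exponentiating does, the sketch below builds an odds-ratio table by hand in base R on hypothetical simulated data (the `toy` data frame and its coefficients are assumptions for illustration). It uses Wald intervals from `confint.default()`, so the interval bounds may differ slightly from those in the gtsummary table if that table is based on a different interval method.

```r
# Hypothetical toy data, NOT the c19_df dataset used above
set.seed(42)
n <- 500
toy <- data.frame(
  age       = round(runif(n, 20, 90)),
  malaysian = rbinom(n, 1, 0.8)
)
toy$male <- rbinom(n, 1, plogis(0.5 - 0.005 * toy$age + 0.2 * toy$malaysian))

fit <- glm(male ~ age + malaysian, data = toy, family = "binomial")

# Exponentiate the coefficients and their Wald 95% limits to get odds ratios
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
print(round(or_table, 3))
```

An odds ratio of 1 corresponds to no association, so a confidence interval that includes 1 is consistent with no effect of that predictor.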
Task 5: Model Evaluation
Question: Evaluate the logistic regression model from Task 4 using AUC-ROC.
Steps:
Install and load the pROC package. (Note: look up the documentation to figure out the nuts and bolts.)
Use the predict() function to get the predicted probabilities from the logistic regression model.
Use the roc() function to compute the AUC-ROC.
Solution:
# Step 1
#install.packages("pROC")
library(pROC)
Type 'citation("pROC")' for a citation.
Attaching package: 'pROC'
The following objects are masked from 'package:stats':
cov, smooth, var
# Step 2
probabilities <- predict(model, type = "response")
# Step 3
roc_obj <- roc(c19_df$male, probabilities)
Setting levels: control = 0, case = 1
Setting direction: controls < cases
# Display AUC
auc(roc_obj)
Area under the curve: 0.5288
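The AUC can be read as the probability that a randomly chosen case (outcome 1) receives a higher predicted score than a randomly chosen control (outcome 0), with ties counted as one half. The sketch below illustrates this in base R on hypothetical simulated scores (the vectors `y` and `score` are assumptions for illustration, not the Task 4 model): it computes the AUC from ranks and checks it against a brute-force comparison of every case-control pair.

```r
# Hypothetical outcome and scores, NOT the predictions from the Task 4 model
set.seed(7)
y     <- rbinom(200, 1, 0.5)
score <- rnorm(200, mean = 0.2 * y)

n1 <- sum(y == 1)  # number of cases
n0 <- sum(y == 0)  # number of controls

# Rank-based (Mann-Whitney) AUC; tied scores receive average ranks
auc_rank <- (sum(rank(score)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# Brute-force check: share of case-control pairs in which the case scores higher
case_scores    <- score[y == 1]
control_scores <- score[y == 0]
auc_pairs <- mean(
  outer(case_scores, control_scores, ">") +
    0.5 * outer(case_scores, control_scores, "==")
)

print(c(auc_rank = auc_rank, auc_pairs = auc_pairs))
```

An AUC of 0.5 corresponds to random guessing and 1 to perfect separation, so the value of 0.5288 reported above indicates that age and malaysian discriminate only marginally better than chance between the two values of male.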