Practical session 1: Untangling Data with Tidyverse: Data wrangling in R
COVID-19 Vaccinations and Death in Malaysia
Task 1: Calculate the average age for deaths by state and find the state with the highest average age.
Steps:
First, group the data by `state`.
Then, summarise the average age per state using `summarise`.
Use `arrange` to sort the average age in descending order to find the state with the highest average age.
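The steps for Task 1 can be sketched as follows. This is a minimal example on a small invented data frame, not the real dataset: the object name `deaths`, the column name `age`, and all values are assumptions made for illustration.

```r
library(dplyr)

# Toy stand-in for the deaths data (hypothetical values; `age` is an assumed column name)
deaths <- data.frame(
  state = c("Johor", "Johor", "Selangor", "Selangor", "Sabah"),
  age   = c(60, 70, 50, 58, 80)
)

avg_age_by_state <- deaths %>%
  group_by(state) %>%                             # one group per state
  summarise(avg_age = mean(age, na.rm = TRUE)) %>% # mean age, ignoring missing ages
  arrange(desc(avg_age))                          # highest average age first

avg_age_by_state
```

After sorting, the first row holds the state with the highest average age.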
Task 2: Determine the proportion of male to female deaths in each state.
Steps:
Using `mutate`, create a new column called `gender`, using `ifelse` to convert the `male` column to ‘Male’ and ‘Female’.
Group the data by `state` and `gender`.
Summarise the count of each gender in each state.
Create a new column with the proportion of each gender in each state.
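One possible way to write the Task 2 steps is shown below. The sketch assumes that `male` is coded as 1 for male and 0 for female. The data frame `deaths` and its values are invented for illustration.

```r
library(dplyr)

# Toy stand-in for the deaths data (hypothetical values; assumes male is coded 1/0)
deaths <- data.frame(
  state = c("Johor", "Johor", "Johor", "Selangor", "Selangor"),
  male  = c(1, 1, 0, 0, 1)
)

gender_prop <- deaths %>%
  mutate(gender = ifelse(male == 1, "Male", "Female")) %>%
  group_by(state, gender) %>%
  # drop_last keeps the result grouped by state only
  summarise(count = n(), .groups = "drop_last") %>%
  # proportions are therefore computed within each state
  mutate(proportion = count / sum(count)) %>%
  ungroup()

gender_prop
```

Keeping the `state` grouping after `summarise` is what makes `sum(count)` a per-state total, so the proportions in each state add up to 1.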
Task 3: Determine the total number of deaths by month and year.
Steps:
Convert the `date` column to Date type if it’s not already.
Use `mutate` to create new columns `year` and `month` using the `year` and `month` functions from the `lubridate` package.
Group the data by `year` and `month`.
Use `summarise` to count the number of deaths.
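The Task 3 steps might look like the sketch below. It assumes the `date` column arrives as text in `YYYY-MM-DD` format. The data frame `deaths` and the dates in it are invented for illustration.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for the deaths data (hypothetical dates, stored as text)
deaths <- data.frame(
  date = c("2021-07-03", "2021-07-19", "2021-08-01", "2022-01-15")
)

deaths_by_month <- deaths %>%
  mutate(
    date  = as.Date(date),  # convert text to Date
    year  = year(date),     # lubridate::year
    month = month(date)     # lubridate::month, returned as a number 1-12
  ) %>%
  group_by(year, month) %>%
  summarise(deaths = n(), .groups = "drop")  # one row per year-month

deaths_by_month
```

Grouping by both `year` and `month` keeps the same calendar month in different years separate.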
Task 4: Determine if comorbidities are more common in Malaysian or non-Malaysian deaths.
Steps:
Create a new column `nationality` that categorizes `malaysian` into ‘Malaysian’ and ‘Non-Malaysian’ using `mutate` and `ifelse`.
Group by `nationality`.
Summarise the average comorbidity rate (`comorb`).
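A minimal sketch of the Task 4 steps follows. It assumes that both `malaysian` and `comorb` are coded as 1/0, so the mean of `comorb` is the share of deaths with comorbidities. The data frame `deaths` and its values are invented for illustration.

```r
library(dplyr)

# Toy stand-in for the deaths data (hypothetical values; assumes 1/0 coding)
deaths <- data.frame(
  malaysian = c(1, 1, 1, 0, 0),
  comorb    = c(1, 1, 0, 1, 0)
)

comorb_by_nationality <- deaths %>%
  mutate(nationality = ifelse(malaysian == 1, "Malaysian", "Non-Malaysian")) %>%
  group_by(nationality) %>%
  # mean of a 1/0 column = proportion of rows equal to 1
  summarise(comorb_rate = mean(comorb, na.rm = TRUE))

comorb_by_nationality
```

Comparing the two `comorb_rate` values shows which group has the higher share of deaths with comorbidities.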
Task 5: Find out the most common vaccine brand combination that was administered.
Steps:
Use `mutate` to create a new column `brands_combo` that concatenates `brand1`, `brand2`, and `brand3`.
Use `filter` to keep only those rows where `brands_combo` is not empty.
Group by `brands_combo`.
Count the number of occurrences for each vaccine brand combination using `summarise`.
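The Task 5 steps could be sketched as below. Several choices here are assumptions rather than part of the exercise: missing doses are taken to be empty strings, the brands are joined with a `-` separator for readability (so a row with no vaccine at all becomes `"--"` and is treated as empty), and a final `arrange` is added to bring the most common combination to the top. The data frame `deaths` and its values are invented for illustration.

```r
library(dplyr)

# Toy stand-in for the deaths data (hypothetical values; "" means no dose recorded)
deaths <- data.frame(
  brand1 = c("Pfizer", "Pfizer", "Sinovac", "", "Pfizer"),
  brand2 = c("Pfizer", "Pfizer", "Sinovac", "", "Pfizer"),
  brand3 = c("", "Pfizer", "", "", "")
)

combo_counts <- deaths %>%
  mutate(brands_combo = paste(brand1, brand2, brand3, sep = "-")) %>%
  filter(brands_combo != "--") %>%   # "--" = all three brands empty
  group_by(brands_combo) %>%
  summarise(count = n()) %>%
  arrange(desc(count))               # most common combination first

combo_counts
```

If the real data stores missing doses as `NA` instead of empty strings, the concatenation and the filter condition would need to be adapted.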