Practical session 1: Untangling Data with Tidyverse: Data wrangling in R
COVID-19 Vaccinations and Death in Malaysia
Task 1: Calculate the average age for deaths by state and find the state with the highest average age.
Steps:
1. First, group the data by `state`.
2. Then, summarise the average age per state using `summarise`.
3. Use `arrange` to sort the average age in descending order to find the state with the highest average age.
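The steps above can be sketched as follows. This is a minimal illustration, not the session's official solution: the `deaths` tibble and its values are invented stand-ins, and only the column names `state` and `age` are taken from the task description.

```r
library(dplyr)

# Toy stand-in for the deaths data; all values are invented for illustration.
deaths <- tibble(
  state = c("Johor", "Johor", "Selangor", "Selangor", "Sabah"),
  age   = c(70, 80, 60, 64, 55)
)

avg_age_by_state <- deaths %>%
  group_by(state) %>%                             # one group per state
  summarise(avg_age = mean(age, na.rm = TRUE)) %>% # average age within each state
  arrange(desc(avg_age))                          # highest average first

# After sorting, the first row is the state with the highest average age.
top_state <- slice_head(avg_age_by_state, n = 1)
```

`na.rm = TRUE` is a defensive choice in case some ages are missing; drop it if the real data has no missing ages.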
Task 2: Determine the proportion of male to female deaths in each state.
Steps:
1. Using `mutate`, create a new column called `gender`, using `ifelse` to convert the `male` column to ‘Male’ and ‘Female’.
2. Group the data by `state` and `gender`.
3. Summarise the count of each gender in each state.
4. Create a new column with the proportion of each gender in each state.
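One possible way to write these steps is shown below. The toy `deaths` tibble is invented, and the sketch assumes that `male` is coded as 1 for male and 0 for female; check the coding in the real data before reusing it.

```r
library(dplyr)

# Toy stand-in for the deaths data; values are invented.
# Assumption: male == 1 means male, male == 0 means female.
deaths <- tibble(
  state = c("Johor", "Johor", "Johor", "Selangor", "Selangor"),
  male  = c(1, 1, 0, 0, 1)
)

gender_prop <- deaths %>%
  mutate(gender = ifelse(male == 1, "Male", "Female")) %>%
  group_by(state, gender) %>%
  # Count deaths per state and gender; keep the result grouped by state only.
  summarise(n = n(), .groups = "drop_last") %>%
  # Within each state, divide each gender's count by the state's total.
  mutate(prop = n / sum(n)) %>%
  ungroup()
```

Keeping the state grouping after `summarise` is what makes `sum(n)` a per-state total rather than a grand total, so the proportions add up to 1 within each state.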
Task 3: Determine the total number of deaths by month and year.
Steps:
1. Convert the `date` column to Date type if it’s not already.
2. Use `mutate` to create new columns `year` and `month` using the `year` and `month` functions from the `lubridate` package.
3. Group the data by `year` and `month`.
4. Use `summarise` to count the number of deaths.
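A short sketch of these steps, under the assumption that `date` arrives as ISO-formatted text (`YYYY-MM-DD`) and that each row is one death. The toy data and the output column name `n_deaths` are invented for illustration.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for the deaths data; dates are invented.
# Assumption: one row per death, date stored as "YYYY-MM-DD" text.
deaths <- tibble(
  date = c("2021-07-01", "2021-07-15", "2021-08-02", "2022-01-10")
)

deaths_by_month <- deaths %>%
  mutate(
    date  = as.Date(date),  # no-op if the column is already a Date
    year  = year(date),     # lubridate accessor: calendar year
    month = month(date)     # lubridate accessor: month number, 1 to 12
  ) %>%
  group_by(year, month) %>%
  summarise(n_deaths = n(), .groups = "drop")
```

If the real dates use a different text format, a parser such as `lubridate::dmy()` or `lubridate::ymd()` may be needed in place of `as.Date()`.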
Task 4: Determine if comorbidities are more common in Malaysian or non-Malaysian deaths.
Steps:
1. Create a new column `nationality` that categorizes `malaysian` into ‘Malaysian’ and ‘Non-Malaysian’ using `mutate` and `ifelse`.
2. Group by `nationality`.
3. Summarise the average comorbidity rate (`comorb`).
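These steps might look like the sketch below. It assumes both `malaysian` and `comorb` are 0/1 indicators, so that the mean of `comorb` can be read as the share of deaths with comorbidities; the toy values and the name `comorb_rate` are invented.

```r
library(dplyr)

# Toy stand-in for the deaths data; values are invented.
# Assumption: malaysian and comorb are 0/1 indicator columns.
deaths <- tibble(
  malaysian = c(1, 1, 1, 1, 0, 0),
  comorb    = c(1, 1, 1, 0, 1, 0)
)

comorb_by_nationality <- deaths %>%
  mutate(nationality = ifelse(malaysian == 1, "Malaysian", "Non-Malaysian")) %>%
  group_by(nationality) %>%
  # The mean of a 0/1 column is the proportion of rows equal to 1.
  summarise(comorb_rate = mean(comorb, na.rm = TRUE))
```

Comparing the two `comorb_rate` values then answers the question for whichever data set is used.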
Task 5: Find out the most common vaccine brand combination that was administered.
Steps:
1. Use `mutate` to create a new column `brands_combo` that concatenates `brand1`, `brand2`, and `brand3`.
2. Use `filter` to keep only those rows where `brands_combo` is not empty.
3. Group by `brands_combo`.
4. Count the number of occurrences for each vaccine brand combination using `summarise`.
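A rough sketch of these steps follows. It assumes that a dose that was not given is stored as an empty string (not `NA`), and it joins the brands with `"-"`, so a row with no doses at all becomes `"--"`. Both choices, the toy data, and the final `arrange` are illustrative assumptions rather than the session's prescribed solution.

```r
library(dplyr)

# Toy stand-in for the deaths data; values are invented.
# Assumption: a missing dose is an empty string "".
deaths <- tibble(
  brand1 = c("Pfizer", "Pfizer", "Sinovac", "", "Pfizer", "Sinovac"),
  brand2 = c("Pfizer", "Pfizer", "Sinovac", "", "Pfizer", "Sinovac"),
  brand3 = c("",       "Pfizer", "",        "", "",       "Pfizer")
)

combo_counts <- deaths %>%
  mutate(brands_combo = paste(brand1, brand2, brand3, sep = "-")) %>%
  # With the "-" separator, an entirely unvaccinated row reads "--".
  filter(brands_combo != "--") %>%
  group_by(brands_combo) %>%
  summarise(n = n()) %>%
  arrange(desc(n))  # extra step: most common combination first
```

If missing doses are stored as `NA` in the real data, `paste` would produce the literal text "NA", so the values would need replacing first (for example with `tidyr::replace_na`) and the filter condition adjusted to match.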