
PREPARING FOR INFLUENZA SEASON

CONTEXT

Every year, the United States goes through an influenza season in which more people than usual contract the flu. Some people, particularly those in vulnerable populations, develop serious complications and end up in the hospital. Hospitals and clinics need additional staff to adequately treat these extra patients. A medical staffing agency provides this temporary staff.

GOAL

“Preparing for Influenza Season” is a data analysis project whose goal is to provide medical staffing agencies with information to support a staffing plan, detailing what data can help inform the timing and spatial distribution of medical personnel throughout the United States.

Project Duration:  1 month

Production Date:  2022

 


PROJECT INFORMATION

TECHNIQUES

• Business Understanding

• Designing a Data Research Project

• Data Profiling & Integrity Checks

• Data Quality Measures (Cleaning)

• Data Integration & Transformation

• Statistical Analysis

• Statistical Hypothesis Testing

• Data Visualization & Storytelling (Tableau)

PROCESS

PROJECT MANAGEMENT PLAN

I started the project by clearly defining its objectives and requirements. For that, I reviewed the available data and the initial project information and formulated a list of clarifying and funneling questions. As previously outlined, the primary goal of the analysis project is to determine when and where the most staff is needed to counteract serious influenza complications. To accomplish this objective, I needed to identify the population groups most susceptible to experiencing these complications by analyzing the historical data on influenza deaths.

The collected questions helped to create the project management plan in which some questions were addressed and initial hypotheses formulated. The plan also outlined the project schedule and milestones, including a comprehensive description of each, and clearly defined the project deliverables. In addition, it provided information on the methods for effectively communicating with stakeholders.

DATA PROFILING & INTEGRITY

After formulating the objectives of the project, I examined the provided data sets with regard to their sourcing, collection method, contents, limitations, and overall relevance for the project. The goal was to identify early on which data mattered most for the project and to assess its reliability, including the potential for data bias.
Following this process, I used Excel to run a set of data consistency checks and took initial data cleaning steps, after which I wrote summary statistics and profiles for all data set variables.

DATA QUALITY MEASURES

My next step was to check the relevant data sets for completeness, uniqueness, and quality and to correct any issues that I identified during these tests (data cleaning). To detect missing data, I used Excel's pivot table function for each variable and calculated the percentage of values that were missing or not stated. This helped me decide how to handle each instance. Because I was working with medical data, there were multiple cases in which values were not stated or were suppressed for privacy reasons.
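The missing-value check described above can also be sketched in code. The example below is a minimal Python illustration, not the Excel workflow used in the project; the column names, the sample rows, and the set of markers treated as missing are assumptions made for illustration.

```python
# Hypothetical markers that count as "missing" (assumed, not taken from the project data).
MISSING_MARKERS = {"", "Not stated", "Suppressed"}


def missing_percentage(rows, column):
    """Return the percentage of rows whose value in `column` is missing."""
    if not rows:
        return 0.0
    missing = sum(
        1
        for row in rows
        if row.get(column) is None or str(row[column]).strip() in MISSING_MARKERS
    )
    return 100 * missing / len(rows)


# Illustrative rows only; the values are made up.
sample_rows = [
    {"state": "Alabama", "age_group": "65-74", "deaths": "120"},
    {"state": "Alabama", "age_group": "5-14", "deaths": "Suppressed"},
    {"state": "Alaska", "age_group": "65-74", "deaths": "35"},
    {"state": "Alaska", "age_group": "5-14", "deaths": ""},
]
deaths_missing = missing_percentage(sample_rows, "deaths")
```

Running the same function over every column gives a quick overview of which variables need a decision about imputation, exclusion, or flagging.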

To check the uniqueness of all relevant data sets, I again used Excel's pivot table function to identify duplicates and then removed them, using a unique key (the data grain) to tell records apart. Afterward, I used data mapping to transform the now-clean data sets into one combined data set.
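As a rough sketch of the deduplication and integration steps, the Python example below removes repeated records by key and then joins two data sets on that key. The grain (state, year, age group), the field names, and all values are hypothetical; the project itself performed these steps in Excel.

```python
def drop_duplicates(rows, key_columns):
    """Keep the first row for each unique key (the data grain)."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


def combine(deaths_rows, population_rows, key_columns):
    """Attach population figures to death records that share the same key."""
    population_by_key = {
        tuple(row[col] for col in key_columns): row["population"]
        for row in population_rows
    }
    combined_rows = []
    for row in deaths_rows:
        key = tuple(row[col] for col in key_columns)
        if key in population_by_key:
            combined_rows.append({**row, "population": population_by_key[key]})
    return combined_rows


# Assumed grain and made-up records, for illustration only.
GRAIN = ("state", "year", "age_group")
deaths = [
    {"state": "Ohio", "year": 2017, "age_group": "65+", "deaths": 900},
    {"state": "Ohio", "year": 2017, "age_group": "65+", "deaths": 900},  # duplicate
    {"state": "Utah", "year": 2017, "age_group": "65+", "deaths": 150},
]
population = [
    {"state": "Ohio", "year": 2017, "age_group": "65+", "population": 1_900_000},
    {"state": "Utah", "year": 2017, "age_group": "65+", "population": 330_000},
]
clean_deaths = drop_duplicates(deaths, GRAIN)
combined = combine(clean_deaths, population, GRAIN)
```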

STATISTICAL ANALYSIS

An initial investigation into the data revealed that almost 50,000 patients died of influenza each year and that 9 out of 10 fatalities were people over the retirement age (65). Therefore, the initial assumption was that senior citizens are the most vulnerable to influenza. To test this assumption, I calculated a series of normalizations that allowed me to compare the mortality rates of all age groups.
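One common way to normalize death counts is to divide them by the size of the population they come from. The Python sketch below assumes this is the kind of normalization meant here and expresses the result as deaths per 100,000 people; the age groups and all figures are made up for illustration.

```python
def mortality_rate_per_100k(deaths, population):
    """Deaths per 100,000 people in the given group."""
    return deaths * 100_000 / population


# Hypothetical counts per age group (not project data).
counts = {
    "45-54": {"deaths": 50, "population": 1_000_000},
    "65-74": {"deaths": 400, "population": 500_000},
    "85+": {"deaths": 900, "population": 100_000},
}
rates = {
    group: mortality_rate_per_100k(values["deaths"], values["population"])
    for group, values in counts.items()
}
```

Rates computed this way are comparable across groups of very different sizes, which raw death counts are not.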

While the mortality rates increase from age 45 onward, the rise is particularly pronounced in the age groups 65 years and above. I used these results to formulate a research hypothesis that I would be able to support or refute.

"If the patient is elderly (over 65 years old), then the risk of mortality is higher."

Research Hypothesis

The table below shows measures of central tendency for the sum of all ILI (influenza-like illness) deaths in the 65+ age group and the population sum of the same age group.

To measure the strength of their relationship, I compared the sum of ILI deaths of patients aged 65+ with the sum of the population aged 65 and above. This resulted in a correlation coefficient of 0.8, which indicates a strong relationship: as the population aged 65 or older increases, the number of ILI deaths in the same age group rises as well.
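For reference, a correlation coefficient of this kind can be computed as Pearson's r. The Python sketch below assumes Pearson's r is the coefficient used (the text does not name it) and runs on made-up numbers rather than the project data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical per-state values: population aged 65+ and ILI deaths aged 65+.
population_65_plus = [120_000, 300_000, 450_000, 800_000, 1_500_000]
ili_deaths_65_plus = [150, 310, 600, 700, 1_900]
r = pearson_r(population_65_plus, ili_deaths_65_plus)
```

Values of r close to 1 or -1 indicate a strong linear relationship, and values close to 0 a weak one.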

The table below shows measures of central tendency for the mortality rate of all vulnerable age groups (% all Age Groups) and the mortality rate of the total population (% of Total population).

Comparing the mortality rate of vulnerable age groups with the mortality rate of the total U.S. population resulted in a correlation coefficient of 0.2, which indicates a weak relationship. The two variables change largely independently of each other: a change in the age-group mortality rate does not imply a significant change in the mortality rate of the total population, although it is still reflected in the overall rate.

STATISTICAL HYPOTHESIS TESTING

On the basis of the statistical analysis, I created statistical hypotheses:

"The mortality rate of people that are 65 years or older is equal to or less
than the mortality rate of people under 65 years of age."

Null Hypothesis

"The mortality rate of people that are 65 years or older is higher than
the mortality rate of people under 65 years of age."

Alternative Hypothesis

To test the null hypothesis, I compared the two mortality rates in a one-tailed t-test. The resulting p-value (2.23E-11) is far below the significance level (alpha = 0.05), meaning that a difference this large would be extremely unlikely if the null hypothesis were true. The data therefore provided enough evidence to reject the null hypothesis in favor of the alternative hypothesis.
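The following Python sketch shows how a one-tailed two-sample t-test of this kind can be computed. It uses Welch's variant (unequal variances), which is an assumption since the text does not say which variant was used, and it runs on made-up mortality rates. The p-value comes from a simple numerical integration of the t density, so extremely small p-values are only approximate.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t), using Simpson's rule on [0, |t|]."""
    upper_limit = abs(t)
    h = upper_limit / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper_limit, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    tail = max(0.5 - total * h / 3, 0.0)
    return tail if t >= 0 else 1.0 - tail


def welch_one_tailed(sample_a, sample_b):
    """Test H1: mean(sample_a) > mean(sample_b). Returns (t, df, p).

    Each sample needs at least two values and non-zero variance.
    """
    var_a = variance(sample_a) / len(sample_a)
    var_b = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(sample_a) - 1) + var_b ** 2 / (len(sample_b) - 1)
    )
    return t, df, t_sf(t, df)


# Hypothetical mortality rates in percent (not project data).
rates_65_plus = [1.2, 1.4, 1.1, 1.5, 1.3]
rates_under_65 = [0.10, 0.20, 0.15, 0.10, 0.12]
t_stat, dof, p_value = welch_one_tailed(rates_65_plus, rates_under_65)
reject_null = p_value < 0.05  # alpha = 0.05, as in the text
```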

In conclusion, at the 5% significance level (95% confidence), the two groups differ significantly: the mortality rate of people 65 years of age and older is significantly higher than the mortality rate of people under 65.

I finished this part of the project with a written interim report.

DATA VISUALIZATION

With the null hypothesis rejected and the most vulnerable groups identified, I started visualizing the data and insights using Tableau. The goal was to make it easier to communicate results to the stakeholders and to analyze the data further in visual form.

Following the visualizations of influenza deaths by state, I displayed the data on a map to better analyze the spatial distribution and identify high-risk U.S. states. For that, I looked at the states with the highest population size of the most vulnerable age group (65+) and the historical number of influenza deaths.
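The idea of flagging states by both criteria can be sketched in a few lines of Python. This is only an illustration of the ranking logic, not the Tableau workflow; the placeholder state names, the field names, and all figures are hypothetical.

```python
def top_states(records, metric, n=3):
    """Return the names of the n states with the highest value of `metric`."""
    ranked = sorted(records, key=lambda record: record[metric], reverse=True)
    return [record["state"] for record in ranked[:n]]


# Made-up figures for placeholder states.
state_records = [
    {"state": "State A", "population_65_plus": 5_000_000, "flu_deaths": 6_000},
    {"state": "State B", "population_65_plus": 3_500_000, "flu_deaths": 4_200},
    {"state": "State C", "population_65_plus": 900_000, "flu_deaths": 4_500},
    {"state": "State D", "population_65_plus": 3_000_000, "flu_deaths": 1_000},
    {"state": "State E", "population_65_plus": 400_000, "flu_deaths": 300},
]
by_population = top_states(state_records, "population_65_plus")
by_deaths = top_states(state_records, "flu_deaths")

# States that rank high on both criteria are candidates for "high-risk".
high_risk = [state for state in by_population if state in by_deaths]
```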

Based on the visualizations I was able to easily identify high-risk states and communicate them to other stakeholders who were part of the project.

For the final deliverable, I gathered all relevant insights and visualizations and created a project presentation detailing the process and outcome and any further recommendations for the future. Feel free to look at the Tableau storyboard or the video presentation below.

DELIVERABLES

SUMMARY

I successfully reached the goal of providing the medical staffing agency with the necessary information they needed. The process involved analyzing and interpreting large amounts of data, including demographic information and various other factors. I used my expertise in data analysis and critical thinking skills to identify trends and patterns that were relevant to the agency's requirements. I also communicated effectively with stakeholders to ensure that the information I provided was accurate and met their expectations.

 

Looking ahead, collecting more up-to-date data will be crucial to ensure that the results remain relevant and accurate. This will help provide a clearer understanding of the needs and requirements of the medical staffing agency.
