biohacks2021.github.io


Predictive Factors for COVID19 Death in Long Term Care Facilities in Ontario

I used Ontario’s public data on COVID19 outbreaks in long term care facilities to test whether facility-level factors predicted resident deaths. My data comes from several public Ontario records, which can be found with the rest of my code.

I fit a linear model and used stepwise regression to select 5 predictors of mortality. The model was statistically significant (p-value < 0.005), with an R-squared of 0.13. While this may not seem like much predictive power, I would argue that explaining 13% of the variance is worth investigating given that the outcome is mortality in a pandemic. My significant variables are:

  1. Number of Beds/Residents
  2. Number of Non-compliance issues found in targeted inspections
  3. Number of Councils for Resident Advocacy (resident and family councils)
  4. Non-Profit Status
  5. For-Profit Status
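To illustrate the kind of procedure involved, here is a minimal sketch of forward stepwise selection. It is not the code used for this project: the selection direction, the BIC criterion, the function names, and the synthetic data (columns `x0` to `x4`) are all assumptions made for the example.

```python
import numpy as np

def bic(X, y):
    """BIC of an ordinary least-squares fit of y on X (X includes the intercept column)."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + X.shape[1] * np.log(n)

def forward_stepwise(X, y, names):
    """Greedily add the predictor that lowers BIC the most; stop when none does."""
    selected = []
    design = np.ones((len(y), 1))  # start from an intercept-only model
    best = bic(design, y)
    while len(selected) < len(names):
        scores = {
            j: bic(np.column_stack([design, X[:, j]]), y)
            for j in range(len(names)) if j not in selected
        }
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no remaining candidate improves the criterion
        best = scores[j_best]
        selected.append(j_best)
        design = np.column_stack([design, X[:, j_best]])
    return [names[j] for j in selected]

# Synthetic data only (not the Ontario records): x0, x1 and x3 drive y; x2 and x4 are noise.
rng = np.random.default_rng(0)
names = ["x0", "x1", "x2", "x3", "x4"]
X = rng.normal(size=(300, len(names)))
y = 1.0 * X[:, 0] + 0.8 * X[:, 1] - 0.9 * X[:, 3] + rng.normal(size=300)

chosen = forward_stepwise(X, y, names)
print(chosen)
```

Other stepwise variants (backward elimination, or selection by p-value or AIC) can retain a different set of variables on the same data, so the criterion should be reported alongside the results.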

Results

Plot of Resident Death as explained by number of beds, non-compliance in targeted inspections, and number of resident advocacy councils

Interestingly, one of the variables with the most explanatory power is the number of family councils and resident councils, which advocate for the residents of long term care facilities. Advocacy groups had a negative coefficient, meaning that facilities with more advocacy groups had fewer deaths. More non-compliance issues found in targeted inspections also predicted more deaths. This is unsurprising, but the interesting result is that non-compliance issues found in annual inspections had no predictive power. This suggests that targeted inspections are needed to ensure that facilities are safe. The number of residents was also a significant factor, which makes sense for a highly contagious disease.

Plot of Resident Death as explained by profit motive

The profit motive had significant explanatory power: the fewest deaths occurred in municipal, publicly funded facilities, followed by non-profit facilities and then for-profit facilities. While it is impossible to say from this analysis whether the relationship is causal, it certainly warrants further investigation.
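One way a three-level ownership category can enter a linear model as the two predictors "Non-Profit Status" and "For-Profit Status" is indicator (dummy) coding with municipal homes as the reference level. The sketch below assumes that coding; the function name and the category labels are hypothetical and not taken from the project code.

```python
def encode_ownership(kind):
    """Return (non_profit, for_profit) indicators; municipal is the reference level."""
    levels = {
        "municipal": (0, 0),   # baseline: both indicators are zero
        "non-profit": (1, 0),
        "for-profit": (0, 1),
    }
    return levels[kind]

rows = [encode_ownership(k) for k in ["municipal", "non-profit", "for-profit"]]
```

Under this coding, each ownership coefficient is read as the expected difference in deaths relative to a municipal home with the other predictors held fixed.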

Regression Model coefficients

This plot shows the predictive power of each of the significant independent variables.

Interpreting the results

I wanted to find out if COVID19 deaths could be predicted based on the type of long term care facility. Finding out if certain types of institutions coped better with the pandemic could inform future policy on long term care home regulation.

My results are very preliminary and require further investigation and data collection. For example, one large missing factor is the average age and the sex distribution of residents at each LTC facility. However, I think that given further information this model could be valuable for actionable plans to improve care in long term care facilities.

Methodology

My methods were mainly web-scraping and data cleaning to integrate different data sources. Please see my GitHub for the rest of my code. I did the following:

  1. Web-scrape government website to get data on compliance in inspections
  2. Web-scrape government website to get data on profit status
  3. Integrate and clean data (only 29 out of 514 long term care homes were excluded due to missing data)
  4. Create linear model
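Steps 3 and 4 can be sketched roughly as follows. This is an illustration under stated assumptions, not the project code: the table names, column names, join key (`home`) and every value are made up, and a plain least-squares fit stands in for whatever modelling tool was actually used.

```python
import numpy as np
import pandas as pd

# Hypothetical toy tables standing in for the scraped sources; all values are made up.
outbreaks = pd.DataFrame({
    "home": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "beds": [100, 150, 60, 200, 120, 80, 90, 170],
    "deaths": [4, 9, 1, 15, 6, 2, 5, 8],
})
inspections = pd.DataFrame({
    "home": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "targeted_noncompliance": [2, 5, 0, 7, None, 1, 3, 4],  # home E has no value
})
ownership = pd.DataFrame({
    "home": ["A", "B", "C", "D", "E", "F", "G"],  # home H is absent from this source
    "for_profit": [0, 1, 0, 1, 1, 0, 1],
})

# Step 3: integrate the sources on the facility key, then drop homes with missing data.
merged = (
    outbreaks
    .merge(inspections, on="home", how="left")
    .merge(ownership, on="home", how="left")
)
clean = merged.dropna()
n_excluded = len(merged) - len(clean)

# Step 4: ordinary least squares with an intercept column.
predictors = ["beds", "targeted_noncompliance", "for_profit"]
X = np.column_stack([np.ones(len(clean)), clean[predictors].to_numpy(dtype=float)])
y = clean["deaths"].to_numpy(dtype=float)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# R-squared: share of the variance in deaths explained by the fitted model.
r_squared = 1 - np.sum((y - X @ beta) ** 2) / np.sum((y - y.mean()) ** 2)
```

Using left joins followed by a single `dropna` makes it easy to count how many homes were excluded for missing data, which is the figure reported in step 3.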