In Phase 1 of the American Heart Association (AHA) COVID-19 Data Challenge, our team presented results in a report entitled “Effect of Public Health Policies on New Confirmed Cases for Coronavirus Disease 2019 in South Korea: Lessons for the World.”
In Phase 2, we introduce two separate analyses that build on this work by evaluating the impact of the COVID-19 pandemic on key population groups in the United States. In Part I of Phase 2, we use several publicly available data sources to examine how policies related to university and college reopening affected new confirmed COVID-19 cases in surrounding communities. In Part II of Phase 2, we use IBM MarketScan Data, made available to teams as part of the AHA COVID-19 Data Challenge, to estimate the number of individuals at high risk for infection within the general population. Addressing these questions together will allow policymakers and public health officials to consider both the immediate and long-term impact of the COVID-19 pandemic as the U.S. and world brace for additional waves of infection during the upcoming winter.
Main findings:
The number of university and college students returning to campus had a positive association with new confirmed COVID-19 cases within the local county region where the institution resided. In comparison to holding class in-person, reopening colleges online or in a hybrid mode was associated with a lower number of new confirmed COVID-19 cases.
The number of individuals considered at high-risk for COVID-19 infection due to age, medical conditions and an immunocompromised state (due to solid organ transplant or drug use) varies considerably across the United States. These findings may be valuable for future work that evaluates how policies for reopening businesses and other social activities may need to be tailored at the state-level if COVID-19 cases continue to grow over time.
The new academic year for universities and colleges in the U.S. began in the midst of the COVID-19 pandemic in late August and early September. During March and April, most schools moved their classes fully online and locked down their campuses. For the 2020 fall semester, many schools opted to reopen fully or partially given a desire to optimize learning environments. However, other schools chose to institute policies supporting an online or hybrid mode of learning given concerns for student safety. Though all schools have tried to take many public health actions to ensure reopening occurs safely, numerous reports in the lay media suggest universities and colleges experienced outbreaks of COVID-19 cases after students returned. As indirect evidence, the overall total number of daily new confirmed cases across the U.S. (Figure 1) shows an increasing trend starting in early September that coincides with the time of school reopenings (universities, colleges, and elementary/secondary schools).
Although anecdotal observations about COVID-19 outbreaks on campus after university and college reopening have been communicated extensively by newspapers and other outlets, the precise effects of students returning to campus and of university and college reopening policies on the spread of coronavirus in their surrounding communities are still unclear. In this study, we investigate these concerns more specifically, focusing on two Aims:
First, we look at all counties in the U.S. and study the association between the spread of coronavirus since September 1st with the number of students returning to campus in each county. In this aim of the study, we do not distinguish between different reopening policies of universities and colleges (except for excluding those with an online mode of reopening only) and primarily focus on the proportion of population enrollment in universities and colleges in the local county region to understand how the number of students returning to campus may be related to new confirmed COVID-19 cases more generally in their community.
In Aim 2, we narrow our scope to the U.S. counties with universities and colleges to specifically study the association of different reopening policies with new confirmed COVID-19 cases. Using publicly available information, we divide the policies into three categories: in-person, hybrid, and online. We explore the extent to which virtual classes through hybrid or online reopening, rather than in-person classes, were associated with new confirmed COVID-19 cases.
We aggregated multiple publicly available datasets for these analyses. First, we used county-level daily new confirmed cases during the period of August 1st to October 22nd, 2020 (the specific dates of our study time period) from the COVID-19 tracking project by 1Point3Acres which was last updated on October 23rd, 2020. Second, we also obtained state-level cumulative testing rates from the CDC COVID Data Tracker. Third, we collected various university and college reopening policies and data on their enrollment from 2958 institutions across 1253 counties utilizing resources from the Chronicle of Higher Education which was last updated on October 1st, 2020. (Note: when we use the term university in isolation below we refer to both universities and colleges of higher education after secondary school.) These data were originally provided by the College Crisis Initiative at Davidson College. Fourth, we obtained county-level demographics data and land area from the United States Census Bureau website and socioeconomic data from the Economic Research Service in United States Department of Agriculture. In particular, we collected the county-level resident population estimates by age and race in 2019, the county-level median household income in 2018, and the county land area in square miles in 2010.
For both Aim 1 and Aim 2, our main outcome variable was the mean number of daily new confirmed cases per 10,000 county population from September 1st to October 22nd at the county level. We chose to use data starting from September 1st to capture the trend of confirmed cases after reopening, because the majority of universities reopened in late August or early September. Moreover, we considered the mean number of daily new confirmed cases over a relatively long period (until the end date of analysis on October 22nd, 2020), rather than shorter intervals, to smooth out the weekly periodic pattern and the lag in impact of various reopening policies. In addition, we standardized the county-level average number of daily new confirmed cases by the county population, because total population varies substantially across counties and affects the absolute number of confirmed cases in each county.
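The outcome definition above can be sketched in a few lines. The report's analysis was run in R; this Python version is only an illustration. The function name and the example county are hypothetical, not taken from the study data.

```python
def mean_daily_cases_per_10k(daily_new_cases, population):
    """Mean of daily new confirmed cases, scaled per 10,000 residents.

    daily_new_cases: one count per day in the study window
    population: total county population
    """
    if population <= 0:
        raise ValueError("population must be positive")
    if not daily_new_cases:
        raise ValueError("need at least one day of data")
    mean_daily = sum(daily_new_cases) / len(daily_new_cases)
    return mean_daily / population * 10_000


# Hypothetical county: 50,000 residents and four days of counts.
example_rate = mean_daily_cases_per_10k([10, 20, 0, 10], 50_000)
```

Averaging over the whole window before scaling matches the description above: the weekly reporting cycle is smoothed out, and counties of different sizes become comparable.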
We use different exposure variables for Aim 1 and Aim 2 as their goals are different. For Aim 1, we focused on the number of students enrolled through in-person or hybrid classes trying to examine the effect of reopening between counties with and without universities. For Aim 2, we focused on specific reopening policies in only counties with universities. The original dataset for reopening policies mainly categorized five policies including “fully in person”, “primarily in person”, “hybrid”, “primarily online”, and “fully online”. For ease of interpretation, we further merge “fully in person” and “primarily in person” as “in person” policy, and “fully online” and “primarily online” as “online” policy, which resulted in three policy categories: “in person”, “online”, and “hybrid” in our analysis. Below we describe the exposure variables used for two aims in detail.
Aim 1: To study the impact of universities’ reopening, we used an exposure variable that captured the number of university students returning to their campus within each county. Because enrollments and policies may differ when more than one university is present in a county, we defined the exposure variable for each county as follows. We first aggregated all universities in a county and summed the enrollment of students under the in-person or hybrid policies (where students would likely return to campus). The exposure variable was then obtained as the ratio of this sum to the total county population (ENROLLP). Note that we used the total enrollment of universities under these two policies as a reasonable proxy for the number of university students returning to campus. Because enrollment varies widely among universities, the proposed exposure variable better characterizes the population flow than alternatives such as the number of universities with different policies. A higher value of the exposure variable means that more students stayed in physical buildings and were involved in onsite campus activities per county population. The exposure variable equals zero if there is no university in a county (or if all universities in the county adopted an online reopening policy).
Aim 2: To compare different reopening policies across universities, we focused on a subset of 1069 counties with universities. The main exposure variables were the proportions of enrollment under the three policies within each county (ONLINE%, HYBRID%, INPERSON%). For example, for each county, the “in person” variable was the ratio of the total enrollment of all universities adopting the in-person policy in this county to the total county enrollment. The three policy exposure variables sum to one. We were interested in whether the proportions of hybrid and online enrollment would be negatively associated with COVID-19 confirmed cases when taking in-person enrollment as the baseline. We show an example of this in Figure 5 for Washtenaw County, where the University of Michigan is one of several schools in the region.
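As a minimal sketch of how ENROLLP could be computed, assuming one record per institution with its county, merged policy label, and enrollment (the field names and example figures are hypothetical):

```python
from collections import defaultdict

# Policies under which students are assumed to return to campus.
ON_CAMPUS_POLICIES = {"in person", "hybrid"}


def enrollp_by_county(institutions, county_population):
    """Ratio of in-person plus hybrid enrollment to county population."""
    on_campus = defaultdict(int)
    for inst in institutions:
        if inst["policy"] in ON_CAMPUS_POLICIES:
            on_campus[inst["county"]] += inst["enrollment"]
    # Counties with no university, or only online ones, get exactly 0.
    return {
        county: on_campus[county] / pop
        for county, pop in county_population.items()
    }


# Hypothetical data: county A has three schools, B is online only,
# and C has no university at all.
institutions = [
    {"county": "A", "policy": "in person", "enrollment": 3000},
    {"county": "A", "policy": "hybrid", "enrollment": 1000},
    {"county": "A", "policy": "online", "enrollment": 6000},
    {"county": "B", "policy": "online", "enrollment": 2000},
]
population = {"A": 100_000, "B": 50_000, "C": 20_000}
enrollp = enrollp_by_county(institutions, population)
```

In this toy example county A has ENROLLP = 4,000 / 100,000 = 0.04, while B and C both have an exposure of zero, as in the definition above.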
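The Aim 2 exposure variables can be sketched as follows, assuming each institution is given as a (policy label, enrollment) pair using the five original labels. The function name and example enrollments are hypothetical.

```python
# Merge the five original labels into the three analysis categories.
MERGE = {
    "fully in person": "in person",
    "primarily in person": "in person",
    "hybrid": "hybrid",
    "primarily online": "online",
    "fully online": "online",
}


def policy_shares(institutions):
    """Share of a county's total enrollment under each merged policy."""
    totals = {"in person": 0, "hybrid": 0, "online": 0}
    for policy, enrollment in institutions:
        totals[MERGE[policy]] += enrollment
    total = sum(totals.values())
    if total == 0:
        raise ValueError("county has no university enrollment")
    return {policy: count / total for policy, count in totals.items()}


# Hypothetical county with three institutions.
shares = policy_shares([
    ("fully in person", 2000),
    ("primarily online", 5000),
    ("hybrid", 3000),
])
```

The three shares always sum to one by construction, which is why only two of them can enter a regression that also has an intercept.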
We considered the following control variables in models for both Aim 1 and Aim 2. We first included the mean number of daily new confirmed COVID-19 cases per 10,000 county population in August (AUGP). Examining new confirmed COVID-19 cases in August before reopening occurred would allow us to account for baseline differences in the severity of coronavirus infections across counties. We also controlled for five county-level demographic and economic characteristics that could potentially be associated with new confirmed COVID-19 cases including: 1) the proportion of population aged over 65 years (AGE65), 2) the proportion of population aged below 20 years (AGE20), 3) the proportion of population of racial minorities (non-White) (RACEMINO), 4) the median household income as of 2018 (MEDHHINC), and 5) COVID-19 testing rates at the state level (TR). Note that for the variable testing rates, we only had it available at the state level and used this as a proxy for county-level resources for testing.
The analysis was done at the county level. The dataset initially provided county-level daily confirmed cases for 3086 counties (out of 3141 counties in the U.S.). To focus on in-person, online, and hybrid policies of interest, we excluded 163 counties that contained any university with “other” or “undetermined” policy. We observed a handful of “negative” daily confirmed cases, which were used to correct historical mistakes in the county-level data (i.e., a case was re-categorized the following day from being positive to negative). Given that such data from counties with large numbers of corrections are likely to be unstable and may even indicate limited surveillance systems, we excluded 30 (~1%) counties that contained at least one day with more than 5 corrected cases, which resulted in 2893 counties in our analysis.
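The correction-based exclusion can be sketched as a simple filter. This assumes that a "corrected" day appears as a negative daily count and that "more than 5 corrected cases" means a daily value below -5. That reading of the rule is an interpretation, and the function names are hypothetical.

```python
def has_large_correction(daily_new_cases, max_corrected=5):
    """True if any single day corrects more than `max_corrected` cases."""
    return any(count < -max_corrected for count in daily_new_cases)


def filter_counties(series_by_county, max_corrected=5):
    """Keep only counties without a large single-day correction."""
    return {
        county: series
        for county, series in series_by_county.items()
        if not has_large_correction(series, max_corrected)
    }


# Hypothetical daily series: B has a day with 7 corrected cases.
kept = filter_counties({
    "A": [3, -1, 4],
    "B": [2, -7, 5],
    "C": [0, -5, 1],
})

# Cohort arithmetic reported above: 3086 initial counties, minus 163
# with "other"/"undetermined" policies, minus 30 with large corrections.
final_cohort = 3086 - 163 - 30
```

County C is kept here because a correction of exactly 5 cases does not exceed the threshold.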
We use linear models to estimate the effects of university reopening and different reopening policies. For Aim 1, we use the model:
\[ \begin{aligned} \text{Outcome } = & \text{ Intercept} + \text{ENROLLP} * \beta_1 + \text{AUGP} * \beta_2 + \text{AGE20} * \beta_3 + \text{AGE65} * \beta_4 + \text{RACEMINO} * \beta_5 \\ & + \text{ MEDHHINC} * \beta_6 + \text{TR} * \beta_7 + \epsilon. \end{aligned} \]

For Aim 2, we use the model:

\[ \begin{aligned} \text{Outcome } = & \text{ Intercept} +\text{ONLINE}\% * \beta_1 +\text{HYBRID}\% * \beta_2 + \text{AUGP} * \beta_3 + \text{AGE20} * \beta_4 + \text{AGE65} * \beta_5 \\ &+ \text{RACEMINO} * \beta_6 + \text{ MEDHHINC} * \beta_7 + \text{TR} * \beta_8 + \epsilon. \end{aligned} \]

Note that for Aim 2, the enrollment proportions of the three policies sum to 1, so we drop the in-person enrollment proportion from the covariates to avoid collinearity. We used a p-value of less than 0.05 to indicate statistically significant evidence of an association between a covariate and the outcome variable.
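To see why dropping the in-person share makes in-person enrollment the reference category, consider a hypothetical parameterization in which each policy share has its own coefficient and all control terms are collected in \(C\). The \(\gamma\) symbols and \(C\) are introduced here for illustration only and do not appear in the original analysis.

\[ \text{Outcome } = \text{ONLINE}\% * \gamma_{\text{on}} + \text{HYBRID}\% * \gamma_{\text{hy}} + \text{INPERSON}\% * \gamma_{\text{in}} + C + \epsilon. \]

Substituting \(\text{INPERSON}\% = 1 - \text{ONLINE}\% - \text{HYBRID}\%\) gives

\[ \text{Outcome } = \gamma_{\text{in}} + \text{ONLINE}\% * (\gamma_{\text{on}} - \gamma_{\text{in}}) + \text{HYBRID}\% * (\gamma_{\text{hy}} - \gamma_{\text{in}}) + C + \epsilon. \]

Under this reading, the intercept of the Aim 2 model absorbs \(\gamma_{\text{in}}\), while \(\beta_1 = \gamma_{\text{on}} - \gamma_{\text{in}}\) and \(\beta_2 = \gamma_{\text{hy}} - \gamma_{\text{in}}\). Each coefficient therefore measures the change in the outcome when enrollment is shifted from in-person to online (or hybrid) with the other share held fixed.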
Further, we also conducted five sensitivity analyses.
First, for Aim 1, we conducted another analysis to compare the early-stage (within a month) and late-stage (after a month) impact of university reopening. Specifically, for each county, we considered two periods of time, one in September and the other in October. Thus, each county has two outcome measurements, corresponding to the average number of daily new confirmed cases per 10,000 county population in September and in October respectively. We also constructed an indicator variable for whether the period under consideration is October (I(Oct)), and added it, together with its interactions with the mean daily new confirmed case rate in August and the enrollment proportion, to the linear regression model to account for variation over time. The model is given by \[ \begin{aligned} \text{Outcome } = & \text{ Intercept} + I(\text{Oct}) * \beta_1 + I(\text{Sep}) * \text{ENROLLP} * \beta_2 \\ & + I(\text{Oct}) * \text{ENROLLP} * \beta_3 + \text{AUGP} * \beta_4 + I(\text{Oct}) * \text{AUGP} * \beta_5 \\ &+\text{AGE20} * \beta_6 + \text{AGE65} * \beta_7 + \text{RACEMINO} * \beta_8 + \text{ MEDHHINC} * \beta_9 + \text{TR} * \beta_{10} + \epsilon, \end{aligned} \] where \(I(\cdot)\) is the indicator variable for a given month.
Second, given that students may still be on campus even if instruction was online only, we replaced the in-person and hybrid enrollment proportion with the enrollment proportion across all policy types (including online) and re-estimated the model for Aim 1.
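The stacked September/October layout in this model can be sketched as below. The dictionary keys and example values are hypothetical; only the construction of the indicator and interaction terms follows the equation above.

```python
CONTROLS = ("AGE20", "AGE65", "RACEMINO", "MEDHHINC", "TR")


def stacked_rows(county):
    """Return the September and October observations for one county."""
    rows = []
    for month, is_oct in (("sep", 0), ("oct", 1)):
        row = {
            "outcome": county[f"rate_{month}"],
            "I_oct": is_oct,
            # Separate enrollment slopes for each month.
            "I_sep_x_ENROLLP": (1 - is_oct) * county["ENROLLP"],
            "I_oct_x_ENROLLP": is_oct * county["ENROLLP"],
            "AUGP": county["AUGP"],
            "I_oct_x_AUGP": is_oct * county["AUGP"],
        }
        # Remaining controls enter unchanged in both months.
        for name in CONTROLS:
            row[name] = county[name]
        rows.append(row)
    return rows


# Hypothetical county record.
rows = stacked_rows({
    "rate_sep": 1.5, "rate_oct": 2.5, "ENROLLP": 0.04, "AUGP": 1.0,
    "AGE20": 0.25, "AGE65": 0.15, "RACEMINO": 0.2,
    "MEDHHINC": 60000, "TR": 0.3,
})
```

Each county contributes two rows, so \(\beta_2\) and \(\beta_3\) are estimated from the September and October rows respectively.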
Third, for Aim 1, we changed the exposure variable - the in-person and hybrid enrollment proportion - into a binary variable. This variable is 1 if the in-person and hybrid enrollment proportion is greater than zero, and is 0 otherwise. The goal of this sensitivity analysis was to determine if a “dose-response” effect was noted based on the proportion of in-person and hybrid enrollment (rather than its mere presence in a county).
Fourth, we added county-level population density as another control variable and re-estimated the model. We added population density as it could affect the frequency of face-to-face interactions among people within the county, and therefore, affect the spread of COVID-19.
Lastly, because the impact of universities’ reopening on counties with large populations can be relatively minor, we further removed 87 counties with the top 5% total county population and re-estimated the model based on the remaining cohort.
Our analysis was conducted using R statistical software versions 3.6.0 and 3.6.1.
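For readers who want a language-neutral picture of the fitting step, the sketch below solves an ordinary least-squares problem in plain Python via the normal equations. It assumes the linear models above were fit by ordinary least squares, which the report does not state explicitly. The function name and the toy data are hypothetical and unrelated to the study data.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations.

    X: list of rows, each starting with 1.0 for the intercept
    y: list of outcomes
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    M = []
    for i in range(k):
        row = [sum(r[i] * r[j] for r in X) for j in range(k)]
        row.append(sum(r[i] * yi for r, yi in zip(X, y)))
        M.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("design matrix is collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]


# Toy data generated exactly from y = 1 + 2*x1 - 3*x2.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, -2, 0, 2]
coefficients = ols(X, y)
```

If an intercept is combined with three policy shares that sum to one, this routine raises the collinearity error. That is the situation the Aim 2 model avoids by dropping the in-person share.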
The final cohort includes 2893 U.S. counties. The result of the fitted model is shown in Table 1. Counties with higher proportions of in-person or hybrid university enrollment are associated with higher confirmed case rates since September (\(\beta\) = 2.01, p-value < 0.001). That means, for every 10% rise in the ratio of in-person or hybrid university enrollment to the county population, the mean daily new confirmed cases would increase by about 0.2 persons per 10,000 county population (e.g., 7.4 cases per day in a county the size of Washtenaw County where the University of Michigan resides).
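The conversion from the coefficient to cases per day can be reproduced as follows. The Washtenaw County population of roughly 368,000 is an assumption made here for illustration; the report does not state the figure it used. The function name is hypothetical.

```python
def extra_cases_per_day(beta, delta_share, population):
    """Change in mean daily new cases implied by a change in exposure.

    beta: coefficient on the exposure (cases per 10,000 per unit share)
    delta_share: change in the exposure, e.g. 0.10 for ten points
    population: county population
    """
    change_per_10k = beta * delta_share
    return change_per_10k * population / 10_000


# Assumed population, not taken from the report.
WASHTENAW_POPULATION = 368_000

# ENROLLP estimate from Table 1 and a ten-point rise in the ratio.
aim1_per_10k = 2.006 * 0.10
aim1_cases = extra_cases_per_day(2.006, 0.10, WASHTENAW_POPULATION)
```

Under this assumed population, the result rounds to about 0.2 cases per 10,000 and about 7.4 cases per day, in line with the figures quoted above.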
Regarding control variables, the average confirmed case rate in August is positively correlated with the confirmed case rate after September (\(\beta\) = 0.20, p-value < 0.001). This was not unexpected as the confirmed case rate is not likely to change drastically within a short period of time. The proportion of young people under the age of 20 and the proportion of older adults 65 years or older are both positively associated with the outcome (p-values < 0.001). This could be due to the fact that older adults are at higher risk of getting COVID-19, and young people would likely be going back to school after September, bringing higher chances of COVID-19 spread. The median household income, which can in general represent the local economic status, is negatively correlated with the outcome suggesting lower numbers of new confirmed COVID-19 cases in wealthier regions. The regional testing rate is positively associated with the outcome (p-value < 0.0001). This is expected, since the more cases get tested, the more confirmed cases are likely to be reported. Finally, we have found that the proportion of population of racial minorities in a county is negatively correlated with the outcome (\(\beta\) = -0.96, p-value <0.001). This result was robust using other control variables that represent racial minorities (e.g., African American proportion in a population). We suspect this is due to our evaluation of new confirmed cases late in the pandemic (i.e., time after September), instead of the whole pandemic period where rates of infection have been higher in U.S. counties with higher proportions of minority populations.
Variable | Estimate | Std Error | P-value |
---|---|---|---|
Intercept | -2.507 | 0.495 | <0.001 |
AUGP | 0.204 | 0.017 | <0.001 |
AGE20 | 14.148 | 1.134 | <0.001 |
AGE65 | 4.373 | 0.954 | <0.001 |
ENROLLP | 2.006 | 0.508 | <0.001 |
RACEMINO | -0.955 | 0.199 | <0.001 |
MEDHHINC | 0.000 | 0.000 | <0.001 |
TR | 1.810 | 0.201 | <0.001 |
We take a subset of the cohort analyzed in Aim 1, keeping only U.S. counties with universities and colleges. This sample included 1069 counties. The result is summarized in Table 2. The online enrollment proportion is negatively associated with the outcome (\(\beta\) = -0.33, p-value < 0.001). The result can be interpreted as follows: holding the hybrid enrollment proportion fixed, if we move 10% of the enrolled university students in a county from in-person to online mode, the mean number of daily new confirmed cases per 10,000 county population would decrease by about 0.033 during this study period (i.e., a reduction of approximately 1.2 new confirmed COVID-19 cases per day in a county the size of Washtenaw County). We also found that the hybrid enrollment proportion is negatively associated with the outcome (\(\beta\) = -0.27, p-value = 0.012). This suggests that, holding the online enrollment proportion fixed, moving 10% of the enrolled students from in-person mode to hybrid mode would reduce the mean number of daily new confirmed cases per 10,000 county population by about 0.027 (i.e., a reduction of approximately 1.0 new confirmed COVID-19 cases per day in a county the size of Washtenaw County). The interpretation of the control variables is similar to that in Aim 1, and we omit the discussion here.
Variable | Estimate | Std Error | P-value |
---|---|---|---|
Intercept | -0.729 | 0.743 | 0.327 |
AUGP | 0.425 | 0.036 | <0.001 |
AGE20 | 12.109 | 1.824 | <0.001 |
AGE65 | -1.511 | 1.499 | 0.314 |
ONLINE% | -0.329 | 0.095 | <0.001 |
HYBRID% | -0.272 | 0.108 | 0.012 |
RACEMINO | -1.454 | 0.264 | <0.001 |
MEDHHINC | 0.000 | 0.000 | <0.001 |
TR | 1.581 | 0.267 | <0.001 |
For Aim 1, we found evidence of an early-stage effect: the enrollment proportion was significantly associated with the number of new confirmed COVID-19 cases in September, but the association was no longer significant in October. This suggests that the effect of school reopening may depend on time and is strongest in the first few weeks after reopening. We also observed such a time-varying effect in Figure 2, where there was an initial case surge in September with a later “catch-up” phenomenon in October. Second, we found that including online enrollment in the exposure variable in Aim 1 makes the enrollment proportion a more significant variable. It is likely that students who enroll online might still be around campus or local areas and bring risks of coronavirus spread. Third, converting the continuous enrollment proportion into a binary variable leads to a weaker association between the outcome and university enrollment for Aim 1. This implies that in-person and hybrid enrollment should be measured as a continuous variable, rather than a simple indicator, to reflect the extent of university students returning to campus. Fourth, the additional control variable, population density, was not found to be significantly associated with the outcome. Finally, our results were not sensitive to the inclusion of U.S. counties with large populations. The details of these model fitting results are summarized in Appendix A1.
In this study, we examined the effect of reopening in universities and colleges on the spread of COVID-19 in their surrounding communities during the fall. Our findings can be summarized in two parts. First, we found that students returning to campus was associated with an increase in new confirmed COVID-19 cases in the counties in which the universities and colleges reside. Second, we assessed the effects of three different reopening policies and found that reopening online or partially online was associated with slower spread of the virus, in comparison to in-person reopening.
Our findings could provide principled guidance for the upcoming winter/spring 2021 semesters. For example, given the association between enrollment and cases in the surrounding area, regions with universities and colleges should expect a spike in confirmed cases in the first few weeks after students return to campus. Medical supplies such as masks and medication should be prepared in advance, and strict social distancing should be enforced. However, our findings also suggest that these effects wane over time when compared with other U.S. counties without colleges or universities. In addition, for those areas that do choose to reopen, we found evidence that an online or hybrid mode with virtual classes may be the preferable option for mitigating the number of new confirmed COVID-19 cases.
The COVID-19 pandemic has been found to affect key populations more than others. These “high-risk” populations include older adults, those with key medical co-morbidities, and those who take immunomodulatory drugs. Despite descriptions of individuals based on demographics (like age), fewer data are available to explore how high-risk populations for COVID-19 are distributed across the United States. We sought to assess this using recent historical data from the IBM MarketScan Data, where valuable information is available on medical co-morbidities and even immunomodulatory drug use. This analysis focused on determining at-risk populations that may be used in future analyses to understand how policies for reopening businesses and other social activities may be tailored based on baseline risks within different states.
We focused on advanced age as well as 10 medical conditions listed on the CDC website as carrying an increased risk of severe illness from the virus that causes COVID-19. Advanced age was defined as the proportion of the population 65 years or older. Medical conditions include cancer, chronic kidney disease, chronic obstructive pulmonary disease (COPD), heart conditions (including heart failure, coronary artery disease, cardiomyopathies, and pulmonary hypertension), immunocompromised state from solid organ transplant, obesity, severe obesity, sickle cell disease, smoking, and type 2 diabetes mellitus. In addition, we included immunocompromised state from use of immunomodulatory medicines like corticosteroids and biologics that are growing in use but that might place patients at an increased risk. To explore the distribution of high-risk populations, we used historical patient data, including diagnosis and medicine information, from the 2018 IBM MarketScan Data. We identified the ICD-10 codes related to the above 10 medical conditions by using the Clinical Classification Software Refined (CCSR) database made available from the Agency for Healthcare Research and Quality (AHRQ).
We calculated the proportion of individuals aged 65 years or older in each state. For each medical condition, we counted the number of patients that have at least one corresponding ICD-10 code for each state using the MarketScan database. We calculated the proportion of patients with the medical condition in the MarketScan database to estimate the proportion of the population with the medical condition in each state. We then assessed the presence of immunomodulatory medicines among individuals enrolled in the MarketScan database to estimate the proportion of those taking immunomodulatory medicines. We used categories defined by prior work in our group that included classes of medications (e.g., corticosteroids, biologics, etc.). We also estimated the proportion of the population that is at increased risk in each state by combining the result of all medical conditions, and immunomodulatory medicines.
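The per-condition and combined proportions can be sketched as below. This assumes that "combining" means counting each patient once if they have at least one qualifying condition or medicine; the report does not spell out the rule. The function name, condition labels, and toy patients are hypothetical.

```python
def high_risk_proportions(patients, conditions):
    """Per-condition proportions and the proportion with any condition.

    patients: list of sets, each holding one patient's condition labels
    conditions: iterable of labels that count as high risk
    """
    n = len(patients)
    if n == 0:
        raise ValueError("need at least one patient")
    wanted = set(conditions)
    per_condition = {
        c: sum(1 for flags in patients if c in flags) / n for c in wanted
    }
    # A patient with several conditions is counted only once here.
    any_risk = sum(1 for flags in patients if flags & wanted) / n
    return per_condition, any_risk


# Toy state with four enrollees.
patients = [{"obesity", "type2_diabetes"}, {"cancer"}, set(), set()]
per_condition, any_risk = high_risk_proportions(
    patients, ["obesity", "type2_diabetes", "cancer"]
)
```

Because one patient can carry several conditions, the combined proportion (0.5 here) is smaller than the sum of the per-condition proportions (0.75). The state table follows the same pattern: each "All High-risk" value is below the sum of the individual condition columns.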
State | All High-risk | Heart Conditions | Immunocompromised From Transplant | Immunocompromised From Medicines | Age 65+ | Cancer | Chronic Kidney Disease | COPD | Obesity | Severe Obesity | Sickle Cell Disease | Smoking | Type 2 Diabetes Mellitus |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Alabama | 0.3789 | 0.0424 | 0.0014 | 0.0171 | 0.1733 | 0.0359 | 0.0236 | 0.0200 | 0.2438 | 0.0549 | 0.0016 | 0.0003 | 0.1171 |
Alaska | 0.2445 | 0.0327 | 0.0014 | 0.0181 | 0.1252 | 0.0330 | 0.0145 | 0.0155 | 0.1112 | 0.0193 | 0.0001 | 0.0009 | 0.0982 |
Arizona | 0.2745 | 0.0369 | 0.0018 | 0.0171 | 0.1798 | 0.0436 | 0.0229 | 0.0165 | 0.1405 | 0.0211 | 0.0005 | 0.0003 | 0.0978 |
Arkansas | 0.2987 | 0.0544 | 0.0016 | 0.0165 | 0.1736 | 0.0337 | 0.0169 | 0.0248 | 0.1444 | 0.0315 | 0.0009 | 0.0005 | 0.1113 |
California | 0.2509 | 0.0365 | 0.0017 | 0.0178 | 0.1478 | 0.0463 | 0.0238 | 0.0142 | 0.1057 | 0.0113 | 0.0005 | 0.0001 | 0.1013 |
Colorado | 0.2184 | 0.0343 | 0.0016 | 0.0186 | 0.1463 | 0.0431 | 0.0182 | 0.0162 | 0.0964 | 0.0137 | 0.0003 | 0.0003 | 0.0713 |
Connecticut | 0.2369 | 0.0360 | 0.0016 | 0.0184 | 0.1768 | 0.0448 | 0.0130 | 0.0145 | 0.1116 | 0.0135 | 0.0006 | 0.0001 | 0.0788 |
Delaware | 0.3332 | 0.0388 | 0.0017 | 0.0184 | 0.1940 | 0.0435 | 0.0178 | 0.0180 | 0.1934 | 0.0315 | 0.0015 | 0.0002 | 0.1157 |
Florida | 0.3547 | 0.0570 | 0.0017 | 0.0189 | 0.2094 | 0.0600 | 0.0347 | 0.0258 | 0.1914 | 0.0265 | 0.0017 | 0.0002 | 0.1207 |
Georgia | 0.3121 | 0.0388 | 0.0018 | 0.0188 | 0.1429 | 0.0388 | 0.0211 | 0.0158 | 0.1743 | 0.0323 | 0.0017 | 0.0003 | 0.1107 |
Hawaii | 0.3683 | 0.0836 | 0.0052 | 0.0279 | 0.1896 | 0.0911 | 0.0661 | 0.0250 | 0.1058 | 0.0118 | 0.0005 | 0.0005 | 0.1487 |
Idaho | 0.2774 | 0.0407 | 0.0032 | 0.0258 | 0.1627 | 0.0412 | 0.0268 | 0.0236 | 0.1378 | 0.0264 | 0.0001 | 0.0006 | 0.1095 |
Illinois | 0.2629 | 0.0453 | 0.0018 | 0.0202 | 0.1612 | 0.0428 | 0.0211 | 0.0207 | 0.1152 | 0.0185 | 0.0007 | 0.0003 | 0.1025 |
Indiana | 0.2961 | 0.0422 | 0.0016 | 0.0189 | 0.1613 | 0.0375 | 0.0172 | 0.0260 | 0.1559 | 0.0307 | 0.0006 | 0.0008 | 0.1048 |
Iowa | 0.2683 | 0.0382 | 0.0017 | 0.0183 | 0.1753 | 0.0372 | 0.0168 | 0.0186 | 0.1498 | 0.0243 | 0.0005 | 0.0006 | 0.0913 |
Kansas | 0.2679 | 0.0419 | 0.0019 | 0.0179 | 0.1632 | 0.0354 | 0.0161 | 0.0190 | 0.1344 | 0.0240 | 0.0005 | 0.0005 | 0.1001 |
Kentucky | 0.3425 | 0.0403 | 0.0014 | 0.0198 | 0.1680 | 0.0411 | 0.0179 | 0.0242 | 0.2078 | 0.0439 | 0.0004 | 0.0006 | 0.1126 |
Louisiana | 0.2945 | 0.0440 | 0.0016 | 0.0141 | 0.1594 | 0.0332 | 0.0192 | 0.0158 | 0.1588 | 0.0282 | 0.0014 | 0.0004 | 0.1097 |
Maine | 0.2689 | 0.0431 | 0.0012 | 0.0193 | 0.2122 | 0.0413 | 0.0146 | 0.0272 | 0.1176 | 0.0183 | 0.0000 | 0.0005 | 0.0997 |
Maryland | 0.2972 | 0.0411 | 0.0019 | 0.0188 | 0.1587 | 0.0409 | 0.0195 | 0.0172 | 0.1531 | 0.0205 | 0.0024 | 0.0002 | 0.1146 |
Massachusetts | 0.2256 | 0.0306 | 0.0013 | 0.0175 | 0.1697 | 0.0468 | 0.0142 | 0.0134 | 0.1118 | 0.0126 | 0.0009 | 0.0001 | 0.0662 |
Michigan | 0.3767 | 0.0541 | 0.0017 | 0.0214 | 0.1768 | 0.0527 | 0.0269 | 0.0292 | 0.2327 | 0.0418 | 0.0010 | 0.0004 | 0.1127 |
Minnesota | 0.2030 | 0.0292 | 0.0021 | 0.0200 | 0.1632 | 0.0334 | 0.0134 | 0.0096 | 0.0980 | 0.0172 | 0.0004 | 0.0003 | 0.0695 |
Mississippi | 0.2806 | 0.0388 | 0.0010 | 0.0158 | 0.1635 | 0.0335 | 0.0152 | 0.0158 | 0.1368 | 0.0296 | 0.0013 | 0.0003 | 0.1163 |
Missouri | 0.3104 | 0.0440 | 0.0018 | 0.0192 | 0.1730 | 0.0387 | 0.0198 | 0.0241 | 0.1757 | 0.0341 | 0.0012 | 0.0009 | 0.1081 |
Montana | 0.2107 | 0.0342 | 0.0019 | 0.0191 | 0.1932 | 0.0380 | 0.0142 | 0.0184 | 0.0846 | 0.0171 | 0.0000 | 0.0009 | 0.0785 |
Nebraska | 0.2732 | 0.0391 | 0.0021 | 0.0188 | 0.1615 | 0.0344 | 0.0190 | 0.0228 | 0.1373 | 0.0250 | 0.0005 | 0.0005 | 0.1033 |
Nevada | 0.2745 | 0.0376 | 0.0015 | 0.0146 | 0.1610 | 0.0370 | 0.0237 | 0.0190 | 0.1313 | 0.0176 | 0.0007 | 0.0003 | 0.1090 |
New Hampshire | 0.2490 | 0.0374 | 0.0015 | 0.0214 | 0.1867 | 0.0466 | 0.0147 | 0.0207 | 0.1033 | 0.0166 | 0.0001 | 0.0001 | 0.0894 |
New Jersey | 0.2836 | 0.0439 | 0.0015 | 0.0155 | 0.1661 | 0.0429 | 0.0162 | 0.0161 | 0.1457 | 0.0178 | 0.0011 | 0.0001 | 0.1068 |
New Mexico | 0.2669 | 0.0347 | 0.0017 | 0.0167 | 0.1801 | 0.0358 | 0.0189 | 0.0199 | 0.1206 | 0.0171 | 0.0002 | 0.0003 | 0.1144 |
New York | 0.2868 | 0.0411 | 0.0015 | 0.0163 | 0.1694 | 0.0415 | 0.0153 | 0.0193 | 0.1462 | 0.0182 | 0.0016 | 0.0002 | 0.1105 |
North Carolina | 0.3036 | 0.0348 | 0.0016 | 0.0183 | 0.1670 | 0.0393 | 0.0172 | 0.0164 | 0.1713 | 0.0310 | 0.0016 | 0.0003 | 0.1043 |
North Dakota | 0.2570 | 0.0364 | 0.0016 | 0.0184 | 0.1573 | 0.0336 | 0.0209 | 0.0143 | 0.1381 | 0.0230 | 0.0000 | 0.0010 | 0.0878 |
Ohio | 0.3127 | 0.0539 | 0.0017 | 0.0197 | 0.1751 | 0.0442 | 0.0247 | 0.0305 | 0.1594 | 0.0299 | 0.0007 | 0.0006 | 0.1129 |
Oklahoma | 0.2823 | 0.0452 | 0.0016 | 0.0213 | 0.1605 | 0.0363 | 0.0190 | 0.0198 | 0.1335 | 0.0254 | 0.0004 | 0.0004 | 0.1172 |
Oregon | 0.2167 | 0.0234 | 0.0013 | 0.0167 | 0.1816 | 0.0359 | 0.0134 | 0.0097 | 0.1058 | 0.0216 | 0.0002 | 0.0002 | 0.0779 |
Pennsylvania | 0.3147 | 0.0437 | 0.0016 | 0.0192 | 0.1870 | 0.0464 | 0.0185 | 0.0200 | 0.1815 | 0.0327 | 0.0009 | 0.0004 | 0.0948 |
Rhode Island | 0.3063 | 0.0310 | 0.0011 | 0.0117 | 0.1766 | 0.0449 | 0.0108 | 0.0191 | 0.1898 | 0.0301 | 0.0008 | 0.0002 | 0.0853 |
South Carolina | 0.3161 | 0.0403 | 0.0015 | 0.0186 | 0.1820 | 0.0410 | 0.0170 | 0.0186 | 0.1723 | 0.0353 | 0.0012 | 0.0003 | 0.1095 |
South Dakota | 0.2298 | 0.0285 | 0.0020 | 0.0167 | 0.1717 | 0.0347 | 0.0159 | 0.0140 | 0.1181 | 0.0266 | 0.0003 | 0.0009 | 0.0776 |
Tennessee | 0.2993 | 0.0459 | 0.0015 | 0.0185 | 0.1674 | 0.0376 | 0.0213 | 0.0216 | 0.1494 | 0.0307 | 0.0007 | 0.0005 | 0.1104 |
Texas | 0.2736 | 0.0390 | 0.0017 | 0.0174 | 0.1288 | 0.0346 | 0.0206 | 0.0148 | 0.1385 | 0.0226 | 0.0009 | 0.0002 | 0.1105 |
Utah | 0.2571 | 0.0274 | 0.0020 | 0.0165 | 0.1141 | 0.0340 | 0.0159 | 0.0106 | 0.1402 | 0.0285 | 0.0002 | 0.0004 | 0.0922 |
Vermont | 0.2562 | 0.0420 | 0.0019 | 0.0206 | 0.2004 | 0.0499 | 0.0132 | 0.0262 | 0.1046 | 0.0157 | 0.0001 | 0.0001 | 0.0984 |
Virginia | 0.2745 | 0.0330 | 0.0017 | 0.0179 | 0.1592 | 0.0373 | 0.0156 | 0.0148 | 0.1449 | 0.0270 | 0.0014 | 0.0005 | 0.1021 |
Washington | 0.2493 | 0.0316 | 0.0017 | 0.0199 | 0.1589 | 0.0368 | 0.0167 | 0.0130 | 0.1193 | 0.0243 | 0.0004 | 0.0003 | 0.0944 |
West Virginia | 0.3856 | 0.0590 | 0.0014 | 0.0181 | 0.2048 | 0.0426 | 0.0259 | 0.0362 | 0.2230 | 0.0492 | 0.0002 | 0.0005 | 0.1290 |
Wisconsin | 0.2503 | 0.0386 | 0.0022 | 0.0220 | 0.1747 | 0.0393 | 0.0199 | 0.0150 | 0.1227 | 0.0228 | 0.0004 | 0.0004 | 0.0891 |
Wyoming | 0.2069 | 0.0319 | 0.0022 | 0.0188 | 0.1714 | 0.0329 | 0.0112 | 0.0230 | 0.0792 | 0.0158 | 0.0000 | 0.0009 | 0.0816 |
The results for advanced age, total risk, and a subset of the medical conditions are plotted in Figures 7-9, and certain geographic patterns emerge across the states. For example, Figure 7 shows that several states, such as FL, ME, VT, WV, and MT, have a greater proportion of individuals at an advanced age. Figure 8 shows that people at high overall risk due to various medical conditions are concentrated in states on the East Coast and in the South, especially KY, WV, MI, AL, and FL. The left panel of Figure 9 indicates that people with heart conditions follow a similar pattern: proportions are generally higher in the Northeast and Southeast, with FL, MI, AR, OH, and WV having the highest proportions. Lastly, the right panel of Figure 9 shows that states such as ID, OK, WI, MI, and NH have relatively high proportions of people who take immunomodulatory medicines.
Results from these analyses suggest that the size of the population at risk for COVID-19 varies across the United States. A next step would be to correlate these baseline values with new and emerging data on the rising number of COVID-19 cases in recent months. A key clinical feature of the pandemic is the susceptibility of certain groups to COVID-19 infection, especially severe infection that leads to hospitalization and complications including death. We used the IBM MarketScan data, which have a number of strengths, including a broadly representative insured population of the U.S. These data also allowed us to examine features, such as the use of immunomodulatory medicines, that would otherwise be difficult to measure. There are limitations to these data, however, and the results should be interpreted carefully when making broader population-based estimates, given the lack of data on uninsured individuals and the non-random selection of participants within the database.
In Phase 2 of the AHA COVID-19 Data Challenge, we introduce two separate analyses that build on our earlier work by evaluating the impact of the COVID-19 pandemic on key population groups in the United States. In Part I of Phase 2, we use several publicly available data sources to report how policies related to university and college reopening affected new confirmed COVID-19 cases in surrounding communities. In Part II of Phase 2, we use IBM MarketScan data to describe how the number of individuals at high risk for COVID-19 infection within the general population varies across U.S. states. We hope that addressing these questions together will better inform policymakers and public health officials on strategies to address the COVID-19 pandemic as winter arrives and new cases of the infection grow over time.
Comparison of the early-stage and late-stage impact of universities’ reopening (for Aim 1 only)
We studied the effect of enrollment proportion on the outcome in September and October separately. The enrollment proportion had a significant positive association with COVID-19 spread in September, right after reopening (estimate 2.431, p < 0.001), but the association was no longer significant in October (estimate -0.076, p = 0.81). Our main study covers both September and October and captures the significant effect of this variable.
Variable | Estimate | Std Error | P-value |
---|---|---|---|
Intercept | 0.182 | 0.227 | 0.423 |
I(Oct) | -1.017 | 0.035 | <0.001 |
AUGP | 0.404 | 0.011 | <0.001 |
AGE20 | 3.544 | 0.517 | <0.001 |
AGE65 | 0.651 | 0.436 | 0.135 |
RACEMINO | -0.350 | 0.090 | <0.001 |
MEDHHINC | 0.000 | 0.000 | <0.001 |
TR | 0.378 | 0.092 | <0.001 |
AUGP * I(Oct) | -0.406 | 0.015 | <0.001 |
ENROLLP * I(Sep) | 2.431 | 0.315 | <0.001 |
ENROLLP * I(Oct) | -0.076 | 0.315 | 0.81 |
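
The month-specific p-values above can be checked directly from the reported estimates and standard errors. The sketch below assumes the p-values come from two-sided Wald tests with a normal reference distribution; that assumption is not stated in the report, but it is consistent with the tabulated values. The helper names `wald_p_value` and `wald_interval` are illustrative only.

```python
import math


def wald_p_value(estimate: float, std_error: float) -> float:
    """Two-sided p-value for H0: coefficient = 0 (normal approximation)."""
    z = estimate / std_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def wald_interval(estimate: float, std_error: float, z_crit: float = 1.959964):
    """Approximate 95% confidence interval (assumes a normal reference)."""
    half_width = z_crit * std_error
    return estimate - half_width, estimate + half_width


# (estimate, std error) pairs copied from the table above.
rows = {
    "ENROLLP * I(Sep)": (2.431, 0.315),
    "ENROLLP * I(Oct)": (-0.076, 0.315),
    "AGE65": (0.651, 0.436),
}

for name, (est, se) in rows.items():
    lo, hi = wald_interval(est, se)
    p = wald_p_value(est, se)
    print(f"{name}: p = {p:.3g}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Under this assumption, the September interval for ENROLLP lies entirely above zero, while the October interval spans zero, which matches the contrast described in the text.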
Replacing the exposure variable with the proportion of all types of enrollment (for Aim 1 only)
The estimated coefficients are quite close to those in our main model. Adding online enrollment to the exposure variable makes the new exposure variable even more significant, although its coefficient is smaller (\(\beta\): 2.006 \(\rightarrow\) 1.671; p-value: 8.18e-5 \(\rightarrow\) 1.45e-5). A potential explanation is that students who enroll online may still live on or near campus and contribute to the risk of coronavirus spread.
Variable | Estimate | Std Error | P-value |
---|---|---|---|
Intercept | -2.609 | 0.498 | <0.001 |
AUGP | 0.206 | 0.017 | <0.001 |
AGE20 | 14.301 | 1.136 | <0.001 |
AGE65 | 4.619 | 0.962 | <0.001 |
ENROLLP_ALL | 1.671 | 0.385 | <0.001 |
RACEMINO | -0.969 | 0.199 | <0.001 |
MEDHHINC | 0.000 | 0.000 | <0.001 |
TR | 1.801 | 0.201 | <0.001 |
Binarizing the in-person and hybrid enrollment proportion (for Aim 1 only)
The binarized enrollment indicator (i.e., whether a county has any students enrolled in in-person or hybrid mode) was not significant (p-value: 0.93). This suggests that in-person and hybrid enrollment should be measured as a continuous variable, rather than as a simple indicator, to reflect the extent to which university students returned to campus within each county.
Variable | Estimate | Std Error | P-value |
---|---|---|---|
Intercept | -2.146 | 0.495 | <0.001 |
AUGP | 0.202 | 0.017 | <0.001 |
AGE20 | 13.793 | 1.139 | <0.001 |
AGE65 | 3.553 | 0.957 | <0.001 |
ENROLL_BINARY | -0.006 | 0.067 | 0.93 |
RACEMINO | -0.993 | 0.200 | <0.001 |
MEDHHINC | 0.000 | 0.000 | <0.001 |
TR | 1.811 | 0.202 | <0.001 |
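
The three exposure codings compared in these sensitivity analyses can be sketched as follows. This is a minimal illustration, not the authors' code: the record type `CountyEnrollment`, its field names, the example counties, and the use of county population as the denominator are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class CountyEnrollment:
    """Hypothetical per-county record; field names are illustrative only."""
    population: int
    in_person: int
    hybrid: int
    online: int


def enrollp(c: CountyEnrollment) -> float:
    # Continuous exposure: in-person plus hybrid enrollment as a share of
    # county population (the denominator is an assumption of this sketch).
    return (c.in_person + c.hybrid) / c.population


def enrollp_all(c: CountyEnrollment) -> float:
    # Same as enrollp, but with online enrollment added to the numerator.
    return (c.in_person + c.hybrid + c.online) / c.population


def enroll_binary(c: CountyEnrollment) -> int:
    # Indicator: 1 if any student is enrolled in in-person or hybrid mode.
    return int(c.in_person + c.hybrid > 0)


# Made-up counties, used only to show how the codings differ.
small = CountyEnrollment(population=100_000, in_person=50, hybrid=50, online=0)
large = CountyEnrollment(population=100_000, in_person=15_000, hybrid=5_000, online=0)
online_only = CountyEnrollment(population=100_000, in_person=0, hybrid=0, online=8_000)

for name, county in [("small", small), ("large", large), ("online_only", online_only)]:
    print(name, enrollp(county), enrollp_all(county), enroll_binary(county))
```

In this made-up example, the `small` and `large` counties receive the same binary indicator even though their continuous exposures differ by a factor of 200, which illustrates the information the indicator discards.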
Including population density as a control variable
We included population density as an additional control variable for both Aim 1 and Aim 2. In neither model was population density a significant covariate (p-value > 0.05); that is, we did not find a significant association between population density and the rate of confirmed cases. The two tables below report the Aim 1 and Aim 2 models, respectively.
Variable | Estimate | Std Error | P-value |
---|---|---|---|
Intercept | -2.524 | 0.496 | <0.001 |
AUGP | 0.204 | 0.017 | <0.001 |
AGE20 | 14.218 | 1.141 | <0.001 |
AGE65 | 4.417 | 0.957 | <0.001 |
ENROLLP | 2.002 | 0.509 | <0.001 |
RACEMINO | -0.966 | 0.200 | <0.001 |
MEDHHINC | 0.000 | 0.000 | <0.001 |
POP_DEN | 0.000 | 0.000 | 0.573 |
TR | 1.810 | 0.201 | <0.001 |
Variable | Estimate | Std Error | P-value |
---|---|---|---|
Intercept | -0.846 | 0.749 | 0.259 |
AUGP | 0.424 | 0.036 | <0.001 |
AGE20 | 12.513 | 1.856 | <0.001 |
AGE65 | -1.279 | 1.512 | 0.398 |
ONLINE% | -0.330 | 0.095 | <0.001 |
HYBRID% | -0.272 | 0.108 | 0.012 |
RACEMINO | -1.491 | 0.265 | <0.001 |
MEDHHINC | 0.000 | 0.000 | <0.001 |
POP_DEN | 0.000 | 0.000 | 0.244 |
TR | 1.571 | 0.267 | <0.001 |
Excluding large population counties
We repeated our analysis for Aim 1 and Aim 2 after excluding the largest 5% of counties by population. The re-estimated coefficients are very similar to those of the original models, which suggests that our results are not sensitive to the inclusion of counties with large populations. The two tables below report the Aim 1 and Aim 2 models, respectively.
Variable | Estimate | Std Error | P-value |
---|---|---|---|
Intercept | -2.542 | 0.505 | <0.001 |
AUGP | 0.203 | 0.017 | <0.001 |
AGE20 | 14.259 | 1.156 | <0.001 |
AGE65 | 4.271 | 0.974 | <0.001 |
ENROLLP | 1.971 | 0.515 | <0.001 |
RACEMINO | -0.926 | 0.206 | <0.001 |
MEDHHINC | 0.000 | 0.000 | <0.001 |
TR | 1.852 | 0.206 | <0.001 |
Variable | Estimate | Std Error | P-value |
---|---|---|---|
Intercept | -0.944 | 0.783 | 0.228 |
AUGP | 0.429 | 0.038 | <0.001 |
AGE20 | 13.176 | 1.929 | <0.001 |
AGE65 | -1.787 | 1.585 | 0.26 |
ONLINE% | -0.285 | 0.099 | 0.004 |
HYBRID% | -0.253 | 0.112 | 0.024 |
RACEMINO | -1.529 | 0.284 | <0.001 |
MEDHHINC | 0.000 | 0.000 | <0.001 |
TR | 1.658 | 0.284 | <0.001 |
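
The filtering step used in this sensitivity analysis can be sketched as below. The report does not specify how the 5% cutoff was computed, so the rounding rule (round the number of dropped counties down), the function name `exclude_largest`, and the record layout with `fips` and `population` keys are all assumptions of this sketch.

```python
def exclude_largest(counties, top_percent=5):
    """Return counties with the largest top_percent by population removed.

    The number of counties dropped is rounded down (an assumption), and the
    original order of the remaining counties is preserved.
    """
    n_drop = len(counties) * top_percent // 100
    # Indices of counties ordered from largest to smallest population.
    by_size = sorted(
        range(len(counties)),
        key=lambda i: counties[i]["population"],
        reverse=True,
    )
    dropped = set(by_size[:n_drop])
    return [c for i, c in enumerate(counties) if i not in dropped]


# Made-up data: 40 counties with populations 1,000, 2,000, ..., 40,000.
counties = [{"fips": f"{i:05d}", "population": 1000 * (i + 1)} for i in range(40)]

kept = exclude_largest(counties)
print(len(counties), "counties before filtering,", len(kept), "after")
```

The models would then be refit on the retained counties and the coefficients compared with those of the original fit.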
Department of Statistics, University of Michigan, yangly@umich.edu
Department of Statistics, University of Michigan, chengmc@umich.edu
Department of Statistics, University of Michigan, weijtang@umich.edu
Department of Statistics, University of Michigan, xfzhang@umich.edu
Department of Statistics, University of Michigan, jizhu@umich.edu
School of Medicine, University of Michigan, bnallamo@med.umich.edu, 734-647-1624