Artificial Neural Network Modeling of Novel Coronavirus (COVID-19) Incidence Rates across the Continental United States

Abolfazl Mollalo 1,*, Kiara M. Rivera 1 and Behzad Vahedi 2

1 Department of Public Health and Prevention Sciences, School of Health Sciences, Baldwin Wallace University, Berea, OH 44017, USA; krivera19@bw.edu

2 Department of Geography, University of California Santa Barbara (UCSB), Santa Barbara, CA 93106, USA; behzad@ucsb.edu

* Correspondence: amollalo@bw.edu

Received: 21 May 2020; Accepted: 10 June 2020; Published: 12 June 2020

Abstract: Prediction of the COVID-19 incidence rate is a matter of global importance, particularly in the United States. As of 4 June 2020, more than 1.8 million confirmed cases and over 108 thousand deaths have been reported in this country. Few studies have modeled COVID-19 incidence nationwide in the United States, particularly using machine-learning algorithms. Thus, we collected and prepared a database of 57 candidate explanatory variables to examine the performance of a multilayer perceptron (MLP) neural network in predicting the cumulative COVID-19 incidence rates across the continental United States. Our results indicated that a single-hidden-layer MLP could explain almost 65% of the correlation with ground truth for the holdout samples. Sensitivity analysis conducted on this model showed that the age-adjusted mortality rates of ischemic heart disease, pancreatic cancer, and leukemia, together with two socioeconomic and environmental factors (median household income and total precipitation), are among the most substantial factors for predicting COVID-19 incidence rates. Moreover, results of the logistic regression model indicated that these variables could explain the presence/absence of the hotspots of disease incidence that were identified by Getis-Ord Gi* (p < 0.05) in a geographic information system environment. The findings may provide useful insights for public health decision makers regarding the influence of potential risk factors associated with COVID-19 incidence at the county level.
Keywords: artificial neural networks; COVID-19 (Coronavirus); GIS; multilayer perceptron; United States

1. Introduction
Novel coronavirus disease (COVID-19) has rapidly spread worldwide, becoming a global health threat [1]. The disease was first identified in Wuhan, China, and has since spread across the world [2]. According to the World Health Organization [3], as of 4 June 2020, there have been more than 6.4 million confirmed cases and over 380 thousand deaths worldwide. These figures exceed the cumulative numbers of cases and deaths for Middle East respiratory syndrome (MERS) and severe acute respiratory syndrome (SARS) since their outbreaks [4]. The pandemic has directly affected the economy, society, and healthcare systems. According to the International Monetary Fund [5], global economic growth in 2020 is estimated at -3.0%, compared to +2.9% in 2019. The United Nations predicts that the pandemic could continue to adversely affect societies, with persistent disease spread resulting from inappropriate policy interventions [6].
Although the United States ranks first in the Global Health Security Index [7], it leads the world in the number of confirmed cases and deaths [8]. As of 4 June 2020, there have been over 1.8 million confirmed cases and more than 108,000 deaths in this country [9]. Moreover, the case fatality ratio (CFR) in the United States continues to fluctuate; as of 4 June 2020, the country ranks ninth worldwide, with a CFR of 5.8% [10].

Int. J. Environ. Res. Public Health 2020, 17, 4204; doi:10.3390/ijerph17124204 www.mdpi.com/journal/ijerph

Recent studies have demonstrated that preexisting conditions, such as cardiovascular diseases [11], respiratory diseases [12], cancer [13], infectious diseases [14], and substance abuse [15], can contribute to elevated COVID-19 morbidity and mortality. In China, Zheng et al. [11] used the MERS virus as a reference and suggested that SARS-CoV-2 can cause cardiac failure and acute myocarditis. Although the findings were preliminary, they indicated that patients could experience chronic cardiovascular effects secondary to contracting the disease. Lippi and Henry [12] conducted a meta-analysis demonstrating that patients with chronic obstructive pulmonary disease (COPD) have roughly a fivefold higher risk of severe COVID-19. You et al. [13] described the guidelines proposed by French medical oncologists for cancer patient care during the pandemic. In South Africa, Cox et al. [14] highlighted changes in the treatment of tuberculosis (TB) patients during the pandemic. In the United Kingdom, Marsden et al. [15] indicated that individuals with substance abuse disorders might experience worsening addiction during the pandemic, consequently increasing their risk of contracting COVID-19. They suggested that substance abuse disorders should not be overlooked when addressing preexisting conditions in COVID-19 patients.
In addition to preexisting conditions, environmental [16], demographic, and socioeconomic [17] factors can potentially influence COVID-19 incidence. For instance, Wang et al. [16] indicated that COVID-19 transmission is influenced by temperature variability; their results suggest that higher humidity and temperature are associated with reduced transmission. In the United States, Mollalo et al. [17] suggested that higher percentages of nurse practitioners and black females and higher income inequality at the county level could explain 68.1% of the geographic variation in COVID-19 incidence.
Artificial neural networks (ANNs) are relatively novel techniques for modeling complex non-linear relationships in spatial epidemiology [18]. They have been applied in a variety of fields, including but not limited to environmental science [19,20], agriculture [21], finance [22,23], artificial intelligence [24], and epidemiology and public health [25–27]. Reddy and Imler [26] demonstrated that ANNs could provide reliable predictions for chronic diseases, such as hepatocellular carcinoma in patients with cirrhosis, reporting high sensitivity (80.61–86.67%) and specificity (99.88–99.95%) using demographic and physiological inputs. Badnjević et al. [28] used ANNs to classify asthma and found high sensitivity (97.11%) in asthmatic individuals and high specificity (98.85%) in healthy individuals, suggesting that ANNs can be appropriate techniques for asthma detection. Because research on the spatial complexities of COVID-19 at the national level is lacking, in this study we leveraged the ability of ANNs to identify complex spatial patterns and the power of geographic information systems (GIS) in spatial analysis [29,30] to predict county-level COVID-19 incidence rates in the continental United States. We employed one of the most widely used ANN topologies, which is described in Section 2.4.
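To make the idea of a non-linear ANN model concrete, the following is a minimal sketch of the forward pass of a single-hidden-layer MLP, the topology named in the abstract. It is plain Python and assumes a logistic (sigmoid) hidden activation and a linear output neuron; these choices, the function names, and the weights in the example are illustrative assumptions, not the configuration or trained parameters of the model used in this study.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (hidden-layer activation assumed here)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # avoids overflow for large negative z
    return ez / (1.0 + ez)


def mlp_forward(
    x: Sequence[float],
    w_hidden: List[List[float]],  # one weight row per hidden neuron
    b_hidden: List[float],        # one bias per hidden neuron
    w_out: List[float],           # one output weight per hidden neuron
    b_out: float,
) -> float:
    """Forward pass of a single-hidden-layer MLP with a linear output neuron.

    Each hidden neuron applies sigmoid(w . x + b) to the explanatory variables;
    the output is a weighted sum of the hidden activations, which suits a
    continuous target such as an incidence rate.
    """
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Hypothetical weights: 2 inputs, 2 hidden neurons (not fitted to any data).
example = mlp_forward([1.0, 2.0], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [2.0, 4.0], 1.0)
print(example)  # sigmoid(0) = 0.5, so 0.5*2 + 0.5*4 + 1 = 4.0
```

Fitting the weights to county data (for example, by backpropagation) and choosing the hidden-layer size are separate steps that this sketch does not cover.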
2. Materials and Methods
2.1. Data Collection and Preparation
COVID-19 is continually monitored by governmental health agencies and institutions of higher learning, such as the US Centers for Disease Control and Prevention and Johns Hopkins University [31]. In this study, we compiled a database of 57 candidate variables that may predict county-level cumulative disease incidence (the dependent variable). Cumulative numbers of confirmed COVID-19 cases from 22 January to 25 April 2020 were collected at the county level across the continental United States from USAFacts (usafacts.org) and normalized by county population. The counties (n = 3109) were treated as samples representing the status of the disease in the US. Socioeconomic (such as household income, income inequality, and unemployment rate), behavioral (such as smoking), environmental (such as temperature, precipitation, and air pollution), topographic (such as altitude and terrain slope), and demographic (such as proportions of age groups, race, gender, and access to primary care) factors were prepared at the county level and used as explanatory variables. To avoid repetition, a complete description of these variables is provided in Mollalo et al. [17].
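Normalizing case counts by population turns raw counts into rates that can be compared across counties of very different sizes. The sketch below assumes a rate per 100,000 residents; the text states only that counts were normalized by population, so the scaling factor, the function name, and the county records are hypothetical.

```python
from typing import Dict, Tuple


def incidence_rate(cases: int, population: int, per: int = 100_000) -> float:
    """Cumulative incidence per `per` residents (the 100,000 default is an assumption)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * per / population


# Hypothetical county records: county ID -> (cumulative confirmed cases, population).
counties: Dict[str, Tuple[int, int]] = {
    "county_a": (250, 50_000),
    "county_b": (12, 8_000),
}

rates = {cid: incidence_rate(c, p) for cid, (c, p) in counties.items()}
print(rates)  # {'county_a': 500.0, 'county_b': 150.0}
```

Multiplying before dividing keeps these small examples exact in floating point; any constant scaling factor leaves the relative ranking of counties unchanged.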

In addition to the above explanatory variables, which were also used in the study of Mollalo et al. [17], age-adjusted mortality rates of several diseases were incorporated, including infectious diseases (i.e., TB, HIV/AIDS, hepatitis, and lower respiratory infection), cardiovascular diseases (i.e., cerebrovascular disease, hypertensive heart disease, ischemic heart disease, cardiomyopathy and myocarditis, atrial fibrillation and peripheral vascular disease), chronic respiratory diseases (i.e., COPD, asthma, interstitial lung disease, and pulmonary sarcoidosis), cancer (i.e., pancreatic, gallbladder and biliary tract, mesothelioma, Hodgkin lymphoma, leukemia, tracheal, bronchus, and lung cancer), and substance use disorders (i.e., drug and alcohol use). The data were retrieved from the University of Washington Global Health Data Exchange (http://ghdx.healthdata.org/us-data) and joined to the preexisting database. All data were collected and prepared at the county level and are publicly available. A list of all variables can be found in the Supplementary Materials.
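Joining the mortality-rate table to the existing database amounts to matching records on a shared county identifier. The plain-Python sketch below assumes such a key exists (a hypothetical `county_id` field; a county FIPS code is a common choice for US county data) and uses a left join so that every county in the base database is kept. The field names and values are placeholders, not data from the study.

```python
from typing import Any, Dict, List


def left_join(
    rows: List[Dict[str, Any]],
    extra: Dict[str, Dict[str, float]],
    key: str = "county_id",
) -> List[Dict[str, Any]]:
    """Attach the columns in `extra` to each row by county key (a left join).

    Counties with no match in `extra` are kept and receive None for the new
    columns, so no county is silently dropped.
    """
    new_cols = sorted({col for rec in extra.values() for col in rec})
    joined = []
    for row in rows:
        match = extra.get(row[key], {})
        merged = dict(row)  # copy so the input rows are not mutated
        for col in new_cols:
            merged[col] = match.get(col)
        joined.append(merged)
    return joined


# Hypothetical inputs: the existing county database and a mortality-rate table.
database = [
    {"county_id": "county_a", "incidence": 500.0},
    {"county_id": "county_b", "incidence": 150.0},
]
mortality = {"county_a": {"ihd_mortality": 1.0, "leukemia_mortality": 2.0}}

result = left_join(database, mortality)
print(result)
```

Counties that end up with None values after the join flag gaps in the added data that would need to be handled before modeling.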
2.2. Spatial Analysis
We examined the geographic distribution of the COVID-19 incidence rate using global and local indices. The global Moran's index [32,33] was used to identify the overall pattern (random, clustered, or dispersed) of the disease incidence rate, using the following formula:









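$$ I = \frac{n \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} w_{ij} C_i C_j}{\left( \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} w_{ij} \right) \sum_{i=1}^{n} C_i^{2}} $$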
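As a check on the formula, the plain-Python sketch below computes global Moran's I directly from its definition. It assumes the standard reading of the symbols (n counties, spatial weights w_ij, and C_i as the deviation of county i's incidence rate from the mean); the binary contiguity weights and the four-county example are hypothetical and do not reflect the weighting scheme used in the study.

```python
from typing import List, Sequence


def morans_i(x: Sequence[float], w: List[List[float]]) -> float:
    """Global Moran's I for values x and an n-by-n spatial weight matrix w.

    C_i = x_i - mean(x) (assumed definition); pairs with i == j are skipped,
    matching j != i in both double sums of the formula.
    """
    n = len(x)
    mean = sum(x) / n
    c = [xi - mean for xi in x]
    numerator = 0.0  # sum_i sum_{j != i} w_ij * C_i * C_j
    s0 = 0.0         # sum_i sum_{j != i} w_ij
    for i in range(n):
        for j in range(n):
            if i != j:
                numerator += w[i][j] * c[i] * c[j]
                s0 += w[i][j]
    sum_sq = sum(ci * ci for ci in c)  # sum_i C_i^2
    if s0 == 0 or sum_sq == 0:
        raise ValueError("weights sum to zero or values are constant")
    return n * numerator / (s0 * sum_sq)


# Hypothetical example: four counties in a row; neighbors share an edge (binary weights).
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1.0, 1.0, 0.0, 0.0], w)  # similar values are adjacent
dispersed = morans_i([1.0, 0.0, 1.0, 0.0], w)  # values alternate
print(clustered, dispersed)
```

Under the null hypothesis of no spatial autocorrelation, the expected value of I is -1/(n - 1), so values above it point toward clustering and values below it toward dispersion; here the adjacent-similar layout gives 1/3 and the alternating layout gives -1. A full analysis would also assess significance, for example with a z-score or a permutation test, which this sketch omits.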








