Corrected: Modeling the Epidemiological Trend and Behavior of COVID-19 in Italy

As of May 14, 2020, Italy has been one of the red hotspots for the COVID-19 pandemic. In particular, the regions of Emilia Romagna, Piedmont, and especially Lombardy were the most affected and had to face very serious health emergencies, which brought them to the brink of collapse. Since the virus has demonstrated local properties, i.e., greater severity and contagiousness in specific regions, the aim of this study is to model the complex behavior of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in Italy. In particular, we further investigated the results of other articles on the correlation with particulate matter pollution 10 (PM 10) and 2.5 (PM 2.5) by extending the research at the intra-regional level, as well as calculated a more plausible number of those infected compared to those officially declared by Civil Protection. Through a computational simulation of the Susceptible-Exposed-Infectious-Recovered (S.E.I.R.) model, we also estimated the most representative basic reproduction number for these three regions from February 22 to March 14, 2020. In doing so, we have been able to evaluate the consistency of the first containment measures until the end of April, as well as identify possible SARS-CoV-2 local behavior mutations and specificities.


Conclusions section:
Original sentence: "In particular, in Lombardy, we found an extremely significant correlation between virus spread and the number of inhabitants per province while in Piedmont, this happened only with the population density." New sentence: "In particular, in Lombardy, we found a significant correlation between virus spread and the number of inhabitants per province while in Piedmont, this happened also with the population density."

Table 8 (in Appendix 2)
Format fixes: -all commas (indicating the beginning of the decimal part of a number in Italian notation) have been replaced with periods (indicating the beginning of the decimal part of a number in English notation) -all the reported values now contain, when necessary, a maximum of 3 significant digits in the decimal part Content corrections: -the value "Pearson II = -0.34" of the second sub-table has been replaced with the value "Pearson II = 0.00" -the value "p-value II = .41" of the second sub-table has been replaced with the value "p-value II = 1" -the value "Pearson II = -0.42" of the third sub-table has been replaced with the value "Pearson II = 0.81" -the value "p-value II = .30" of the third sub-table has been replaced with the value "p-value II = .015" -the value "Pearson II = -0.31" of the fourth sub-table has been replaced with the value "Pearson II = 0.99" -the value "p-value II = .46" of the fourth sub-table has been replaced with the value "p-value II <.0001" -the value "Pearson II = -0.34" of the fifth, sixth, and seventh sub-table has been replaced with the value "Pearson II = 0.99"

Introduction
The current surge of COVID-19 pandemic is devastating globally, with over 4,200,000 cases and more than 290,000 deaths reported [1]. In Europe, COVID-19 cases have started to dramatically increase from the first week of March 2020. Of these, Italy was grappling with the worst outbreak, with over 35,713 confirmed cases, and around 3000 confirmed deaths by March 18, 2020 [2]. This exponential increase in COVID-19 positive cases in Italy raised turmoil, and the government decreed a lockdown for the entire country [3]. This is the first study that examines the behavior of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) at the provincial level in the most affected regions. Considering that researches on a national scale are subject to high variances, as they use regional data, i.e., consider the regions as uniform epidemic nuclei, such a survey is fundamental to understand, which is the most appropriate investigation scale between the national, regional, and provincial ones. Making use of the epidemic parameters provided by the World Health Organization (WHO), we utilized the Susceptible-Exposed-Infectious-Recovered (S.E.I.R.) mathematical model to predict the trend of infections during the first half of March 2020, when the effects of the lockdown were not yet measurable. This allowed us to signal the presence of SARS-CoV-2 local characteristics changes due to possible evolutionary genetic mutations, correlations with pollution like Particulate Matter 10 (PM 10) and Particulate Matter 2.5 (PM 2.5), mismanagement of the crisis by national government agencies, and non-compliance with the lockdown rules by citizens or other unknown factors. To do this, we compared the general trend foreseen by the S.E.I.R. with the estimated, highlighting the discrepancies between the individual Italian regions.

Materials And Methods
To carry out this study, the most recent data found in the scientific literature relating to the total and active cases, deaths, recoveries, and all epidemic parameters of COVID-19 have been used. We focused especially on Lombardy since it was by far the most afflicted region in Italy. Considering the novel coronavirus incubation period is around three to six days, with a range from a minimum of two days to a maximum of 14, we first examined the population of COVID-19 cases reported between February 22 and March 14, 2020 [4]. Following this, we analyzed the Pearson linear correlation between the number of COVID-19 total cases and the concentrations of PM 10 and PM 2.5. All daily PM data were collected from the Regional Agency for Environmental Protection (ARPA) regional websites in the interval of January 1 -May 14, 2020. Several monitoring stations were used, and all results were provided with a Gaussian 95% confidence interval. All data have been organized into two time periods such as "pre-lockdown," from January 1 until February 29, 2020, and "post-lockdown," from March 1 until May 14, 2020; this helped us estimate the link between the virus spread and the particulate matter (as emissions dropped dramatically during the lockdown). Other types of correlation with population density and number of inhabitants were also investigated.

S.E.I.R. modeling
Assuming as true the probable "non-relapse patients" hypothesis, we applied the S.E.I.R. model to predict the novel coronavirus evolution in Italy, as it is suitable for describing the spread of a virus in populations where no restrictions have been applied [5]. Thanks to the comparison between the S.E.I.R. values and the theoretical estimates (TE) in the short period, it is likely to highlight essential behavior mutations and/or the effectiveness of containment strategies. We used S.E.I.R. differential equations and non-linear methods to resolve the gaps analytically [6]. We examined Lombardy, Emilia Romagna, and Piedmont separately because of the huge discrepancy of COVID-19 cases between these and other Italian regions. An iterative algorithm was developed using the C++ programming language to find a solution through a finite discretization method (Appendix 1). Given the low deaths-population ratio, the total population number has been considered constant.

Software iterative algorithm
By entering the initial values for the incubation time , the recovery time , the basic reproduction number , the number of infected , and the number of recovered on February 22, 2020, the software prints the S.E.I.R. predictions day by day. The best epidemic parameters were estimated through continuous iteration until the "closest values to the real ones" were reached until March 12, 2020. The number of initial incubates was calculated with the formula . We report below the system of equations and their discretization through the finite increment : with incubation time, recovery time, basic reproduction number, number of susceptible people, number of active exposed people (people in incubation), number of active infected people, number of recovered people (no longer infectable).

statistical analysis
The compatibility between S.E.I.R. predictions and TE values was investigated within a closed ball of radius five days center on March 7 (from March 2 to March 12, 2020). We searched for the best infection mortality rate and basic reproduction number through the minimization of two estimators: the first, called , was the algebraic mean value of the absolute percentage differences between the best value for the S.E.I.R. value Gaussian distribution and the value itself, according to the formula . This allows us to assess the quality of S.E.I.R. modeling. The second, called , was the algebraic mean value of the ratios between the fixed standard deviation and the corresponding , according to the formula . This allows us to calculate the relative errors. The lower the , the more representative is of the TE evolution; the lower the , the more accurate is the TE. The iteration was carried out until and minimums were reached. The finite increments chosen for and were and , respectively. Every combination with m in and was tried. The chosen significance limit for and was (i.e. and must be lower than or equal for a result to be acceptable); the iteration ended when and . The best confidence intervals were calculated considering the Gaussian distribution . We reported a range interval "CRI" for all compatibles we found. All the analysis was carried out through C ++ software and Microsoft Excel (Microsoft Corporation, Redmond, Washington). Since is subject to a very wide margin of error; when more than one compatible couple was computed, we utilized the weighted average .

COVID-19 real cases estimate
In order to calculate the real number of COVID-19 total cases in Lombardy, we used an estimation method we called "Theoretical Estimate" (TE). Thanks to the results of another study, in which the number of regional and national deaths until May 5, 2020, was compared with those of the previous five years, it was possible to calculate the number of theoretical cases by adding the number of COVID-19 missing cases until the expected infection mortality was achieved. To do this, we considered a mortality rate free to vary in the range (0.6, 1.6) [7][8][9]. The "COVID-19 total deaths number" was shifted seven days backward due to the time between contracting the disease and demise. We also calculated the difference between confirmed and estimated COVID-19 trough the ratio α = estimated cases / confirmed cases.

PM 10 data analysis
First, we collected PM 10 daily averages data on the most affected cities of the three main regions involved in the COVID-19 epidemic such as Lombardy, Piedmont, and Emilia Romagna; then, we did the same for some regions where the novel coronavirus spree was not so pressing like Lazio and Campania. Finally, we built the symmetric correlation matrix : with PM 10 -COVID-19 confirmed cases Pearson correlation index, PM 10 -population density Pearson correlation index, PM 10 -total population Pearson correlation index, COVID-19 casespopulation density Pearson correlation index, PM 10 -COVID-19 Pearson correlation index, population density -total population Pearson correlation index.
However, since the first outbreak occurred in northern Italy, it is plausible to think the infection did not have the same contagion power in the southern regions; thus, we recalculated the matrix only for the three most infected regions to check whether there was a local correlation. Finally, in order to make the investigation even more precise and specific, we repeated the operation once again in all Lombardy provinces. All the values have been reported with their relative p-values, according to the form (ij-th p-value). All the PM 10 average daily values were reported with Gaussian 95% confidence intervals .

PM 2.5 data analysis
While other studies have been conducted on the PM 2.5 -novel coronavirus correlation at the national level, we have focused exclusively on the Lombardy region, analyzing the data of all the monitoring units of all the provinces through the previously defined correlation matrix [10].

Epidemic forecast
For each infection mortality rate m, a compatible value of was found; therefore, we utilized the weighted average . For the Lombardy region in the period February 22 -March 12, 2020, we estimated a basic reproduction number , with and . The calculated number of real infections exceeds that of confirmed infections by a factor until May 1, 2020 . The separation point between the S.E.I.R. and TE trends is positioned in a neighborhood of March 12, 2020; ergo, that is the period when the lockdown began to take effect ( Figure 1). The two corner points I and II in Figure 1 indicate further decreases in .

Region City
Pre-lockdown (Jan 1, Feb 29)   A statistically significant correlation between SARS-CoV-2 spread and PM 2.5 was highlighted in Lombardy during the first two weeks of March, with a and a Pearson correlation coefficient (Tables 3-4). Beyond that time, the above correlation has decreased in favor of a very significant SARS-CoV-2 -provinces population number correlation ( ) ( Figure 4). Anyway, as can be seen in Figure 4, we must point out that, excluding the period from March 10-14, the PM 2.5 correlation p-values always remained close to the chosen threshold until the end of April.    The SARS-CoV-2 -provinces population number correlation in Emilia Romagna grew much more slowly than in Lombardy, reaching its maximum value on May 14, 2020 ( ). Finally, in Piedmont, there were significant correlations between the COVID-19 cases and the population number (ρ = .99, pvalue <.0001) and, unlike Emilia Romagna and Lombardy, the COVID-19 cases and the population density (ρ = .71, p-value = .048 on May 14, 2020) (Appendix 2).

Discussion
For this analysis, we made some assumptions: · given the antibody response identified in COVID-19 patients, it seems unlikely that the novel coronavirus, without significantly changing, could infect a patient again (in the short term) [11][12]; · the mortality rate is too low to affect the evolution of the S.E.I.R. system [7][8][9]; · Young people and children appear to have an important role in the spread of infection [13][14][15]. Therefore, to analyze the SARS-CoV-2 dynamics in the initial stages, we have adopted the S.E.I.R. model (Susceptible → Exposed → Infectious → Recovered) since it is suitable for describing the spread of a virus in a non-relapse free population. This allowed us to evaluate the effectiveness of the containment measures and/or any virus behavior mutation by comparing S.E.I.R. and theoretical trends (Figures 1-2). Given the absolutely abnormal number of COVID-19 infections and deaths, we treated Lombardy as a standalone case; moreover, the modeling we have made exclusively concerns northern Italy since it was the most COVID-19 affected region. Our results show the effectiveness of the Italian lockdown. However, the strong correlation between the total infected and the number of inhabitants in Lombardy suggests the virus nevertheless circulated in a very substantial way among this region, i.e., containment measures are likely to have been taken with a heavy delay. In fact, as of May 1, the estimated number of total COVID-19 cases in Lombardy was almost 3 million. Furthermore, it is plausible that the effect of such a latency was aggravated by the presence of PM 2.5 as shown in Figure 4. In this regard, we must point out the COVID-19 cases -PM 2.5 correlation p-value has never exceeded a maximum of .16. The substantial difference between the basic reproduction number of Lombardy and that of the other two most affected regions is the main symptom of the local behavior of the novel coronavirus. About this aspect, various speculations, hypotheses, and theories were made: some of them concern a possible evolutionary genetic mutation of SARS-CoV-2, which has made it more contagious or virulent [16][17][18][19]. If so, this should have afflicted Lombardy. Other researches instead assert that the virus's genetic mutations did not cause a mutation in its behavior [20]; if so, the greater Lombard virulence should find explanations in correlation with pollution, delay in lockdown, or other factors such as a major predisposition of the subjects to be infected and develop severe symptoms [10,21]. However, it must be considered that many of these scenarios could prove to be true simultaneously. One hypothesis to be excluded is the bijective relation between local demography and SARS-CoV-2 spread: in fact, Lombardy and Emilia Romagna's age groups are totally comparable [22][23]. The behavior of the novel coronavirus in Piedmont was also peculiar: first, despite what happened in Lombardy and Emilia Romagna, we found a strong correlation between the population density and the number of total cases. Furthermore, the epidemic seems to have started with a delay of one week. This fact cannot be explained by the presence of particulate pollution since the concentrations of PM 2.5 and PM 10 were already below the safety threshold due to the quarantine. Therefore, this episode could be linked to virus behavior mutations, inadequate containment measures, or people lockdown violations. As for a possible link between PM 10 and novel coronavirus, we found no significant correlation. At the national level, concentrations of PM 10 in some cities of central and southern Italy were comparable to those of Lombardy and Emilia Romagna. Cities like Frosinone, Rome Tiburtina, and Naples, where COVID-19 infections were few, had higher values than cities like Brescia and Bergamo where the infection was devastating. Moreover, PM 10 concentrations in Emilia Romagna and Piedmont are equivalent to Lombard ones unlike the density of COVID-19 cases. Evaluating the hypothesis that the first outbreak location was highly incident on the virus spread, we conducted the analysis only in the Lombardy region, but the result was still negative. The lack of a clear correlation, however, is not sufficient to exclude every relation between PM 10 and SARS-CoV-2 for two reasons: · the town of Codogno, where the first Lombardy outbreak occurred, recorded very high PM 10 average daily values between January 1 and February 29 ( ) (Appendix 2); · the novel coronavirus behavior may not be related to the daily average values of PM 10 but to specific thresholds, above which its virulence and contagiousness would increase considerably. For example, an hour of very heavy traffic could favor its spread much more than a constant release of the same particulate matter amount over a longer time-lapse.
On the contrary, both at the national level (as shown by Frontera et al.) and in Lombardy, the correlation with PM 2.5 was much more evident and significant. For this reason, further investigations are required to determine what is the biological link between the spread of SARS-CoV-2 and fine pollution.

Limitations
The S.E.I.R. model predictability is inversely proportional to the prediction temporal distance since the fixed values, such as basic reproduction number, incubation time, and healing time, could change due to virus behavior mutations; furthermore, iterative methods propagate errors divergently. Anyway, its adoption in the short term remains valid since error propagation is truncated. The mortality rate is considered constant in the period February 22 -May 1, 2020. Regarding PM 10 and PM 2.5 aerosols, we only had access to cities' and regions' daily average values and the data of some monitoring stations were not available. One study used to estimate real COVID-19 cases had no peer review; however, the data inherent to the virus mortality were also supported by peer-reviewed articles and through an independent verification from the National Institute of Statistics (ISTAT) website [24].

Conclusions
In this study, we demonstrated that the intra-regional and provincial analysis is able to highlight substantial differences in the behavior of the novel coronaviruses. In particular, in Lombardy, we found a significant correlation between virus spread and the number of inhabitants per province while in Piedmont, this happened also with the population density. Significant correlations occurred in Lombardy between the number of COVID-19 cases per province and the presence of PM 2.5; this confirms what has been observed in other studies on a national scale. Our analysis of the real number of infected leads us to conclude that the Civil Protection official data are enormously underestimated. From the comparison between the S.E.I.R predictions and the calculated real infections trends, as well as between the infections trends of Lombardy and other Italian regions, we report that the Italian COVID-19 data are statistically compatible with possible evolutionary mutations of SARS-CoV-2. Although a statistical investigation is absolutely essential to understand the behavior of SARS-CoV-2, it cannot be a substitute for a molecular investigation. For this reason, we believe this paper can be useful to guide biologists and virologists on the virus specificities that need to be further explored, thus preventing the return of such a crisis.

Additional Information Disclosures
Human subjects: All authors have confirmed that this study did not involve human participants or tissue. Animal subjects: All authors have confirmed that this study did not involve animal subjects or tissue.

Conflicts of interest:
In compliance with the ICMJE uniform disclosure form, all authors declare the following: Payment/services info: All authors have declared that no financial support was received from any organization for the submitted work. Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work. Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.