Covid-19 vs. BCG Universal Immunization: Statistical Significance in the Early Phase of the Pandemics

A possible correlation between the impact of Covid-19 and universal immunization against tuberculosis with the Bacillus Calmette-Guérin (BCG) vaccine was suggested previously, based on an apparent association between a lower impact of the epidemic and a record of a national BCG immunization program. In this work, a time-adjusted dataset of Covid-19 statistics from national and subnational health jurisdictions, taken six months after the local arrival of the epidemic, was used to analyze the statistical significance of the hypothesized correlation between universal BCG immunization and a milder early-phase Covid-19 scenario. With the data accumulated up to the point of the analysis, the correlation hypothesis was evaluated both qualitatively and quantitatively, with the conclusion that the correlation is statistically significant. The conclusions of this research can be used in the development of epidemiological policy and as a rationale for investigating the origin and mechanisms of a broad immune protection that may be associated with early-age exposure to the BCG vaccine.


Introduction
As the novel coronavirus epidemic (Covid-19) emerged and spread across multiple national jurisdictions, a possible correlation between the impact of Covid-19 and universal immunization against tuberculosis with the Bacillus Calmette-Guérin (BCG) vaccine was suggested by Miller et al. (2020) [1], based on an apparent association between a lower rate and impact of the epidemic and a record of a national universal BCG immunization program (UBIP). Because of the significant variation in multiple potential factors of influence between jurisdictions, and the stochastic and unpredictable development of the Covid-19 epidemic, reaching a confident conclusion about such a correlation is far from trivial, and for this reason it has been investigated in a large and growing number of studies [2][3][4][5]. In Escobar et al. (2020) [2], a strong correlation between the BCG immunization record and Covid-19 mortality was observed in a number of culturally and socially similar European countries ($R^2 = 0.88$; $P = 8 \times 10^{-7}$), indicating that every 10% increase in the BCG index was associated with a 10.4% reduction in Covid-19 mortality. The results imposed strong constraints on the null hypothesis (that is, no correlation between a current or previous UBIP in the jurisdiction and the recorded Covid-19 impact), suggesting that BCG may have a certain broad protective effect resulting in a milder epidemiological scenario. A similar conclusion is supported by the results of Sharma et al. (2020) and Yitbarek et al. (2020) [3,4], which established a strong correlation between a record of UBIP, current or previous, and a lower Covid-19 impact in the jurisdiction as measured by infection incidence and the resulting mortality: "The results … show that countries without a universal BCG policy (such as Belgium, Italy, the United States, and the Netherlands) have increased incidence of COVID-19 (2810.9 ± 497.1 (mean ± SEM) per million) compared with countries with ongoing national BCG policy (570.9 ± 155.6 (mean ± SEM) per million)" [4]. In Dolgikh (2020) [5], a qualitative and quantitative analysis of the distribution of Covid-19 impacts among national and subnational jurisdictions in Europe, North America and the Middle East was performed, with a number of observations consistently pointing to a possible correlation between UBIP and a milder epidemiological scenario in the initial phase of the Covid-19 epidemic. These findings were consistent with a number of further results in support of the correlation hypothesis [6][7][8].
On the other hand, in a number of studies arguments against correlation were presented [9,10] indicating complex character of the relationship with multiple influencing factors that make confident determination of the correlation challenging.
Given the complexity and importance of the problem, the intent of this work was to analyze publicly available Covid-19 epidemiological data reported by national and subnational public health jurisdictions with respect to the hypothesized population-scale induced immune protection resulting from a universal BCG vaccination policy, current or previous. The work attempts both a qualitative and a quantitative analysis of the hypothesized correlation between a current or previous UBIP in the jurisdiction and a milder scenario of the Covid-19 epidemic. It also aims to verify the assumptions, results and conclusions of the earlier studies, with the specific objective of determining, in a quantitative analysis, the constraints on and confidence in the correlation and null hypotheses. To this end, a dataset of cases with different times of local arrival of the epidemic was compiled and analyzed with statistical methods to evaluate the statistical significance of the UBIP-Covid-19 correlation hypothesis. Whereas an earlier work of the authors [5] was intended to point out a number of observations consistent with the correlation hypothesis, in this work these observations are verified at a different time point, indicating the stability of the established connection, and a quantitative statistical significance analysis of the correlation and null hypotheses is performed with the compiled dataset of epidemiological cases.
It needs to be noted and emphasized that the analysis in this study applied to the early, initial phase of development of Covid-19 pandemics and may not be assumed to be applicable unconditionally to the subsequent phases, due to a number of factors, including mutability of the infection agent, difference in the standards of quality of the public health care system, availability of resources, epidemiological management and control policies including vaccination and other factors.

Terminology
Timing considerations can be critical in the analysis of the development of an epidemiological scenario. For this reason, an effort was made to ensure that the data was recorded at a similar phase in the development of the epidemic in each reporting jurisdiction. To emphasize the need for synchronization of epidemiological data, the zero time of the start of the global Covid-19 pandemic was defined in Dolgikh (2020) [5] as 31.12.2019. Along with the global Time Zero (TZ), we also define the local Time Zero (LTZ), the time of arrival of the epidemic in a given locality. It can be sensibly defined as the date of the first confirmed case in the local area, subnational or national jurisdiction.
The impact of the epidemic was measured by Covid-19 mortality per million population in a reporting jurisdiction as a function of time. It was judged that, especially in the initial phase of the epidemic, this parameter could be a more current and accurate measure of the epidemiological impact than the number of cases, which depends strongly on testing and other practices that may not be consistent between jurisdictions. This choice rests on the assumption that the policies and protocols of the selected public health administrations allowed reasonably accurate identification of the cause of death and accurate and timely reporting.
Evidently, as defined, the impact of the epidemic in a jurisdiction (case) c, m(c, t), is a function of the jurisdiction factors F, including demographics, geographical distribution, prosperity, social customs and traditions, lifestyle, public health administration and epidemiological policy (including records of universal immunization programs) and, not least, of the time t. As the experience of the earlier period shows, timing considerations can be of critical importance for the correctness of the conclusions of the analysis. For that reason, attention was given to comparing impacts relative to the time of the first exposure, by using time-adjusted data at the same local time point in the development of the epidemic.
Throughout this work, two related measures of the epidemiological impact were used as well: the relative impact $m_R(c, t)$, measured as the ratio of the impact recorded in the locality to the world's highest value at the time of evaluation; and the impact expressed on a logarithmic scale, $m_L(c, t)$:
$$m_L(c, t) = \log(m(c, t))$$
At the time point of the analysis in this work, that is, approximately six months after the local arrival of the epidemic, it is expected that the epidemic had developed and that the relationships between the factors of influence were more stable than in the very early phase, which in many observed cases was subject to random factors and fluctuations. In the later phases, on the other hand, differences between jurisdictions in a number of essential factors, not least epidemiological policy and availability of resources, could become more pronounced.
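The two derived measures described above can be illustrated with a short sketch. The function names, the base-10 logarithm and the numerical values are assumptions made for illustration; the text specifies only a ratio to the world's highest value and a logarithmic scale.

```python
import math


def relative_impact(mortality_per_million, world_max_per_million):
    """Relative impact m_R: local impact as a fraction of the world's highest value."""
    if world_max_per_million <= 0:
        raise ValueError("world maximum must be positive")
    return mortality_per_million / world_max_per_million


def log_impact(mortality_per_million):
    """Logarithmic impact m_L. Base 10 is an assumption; the source writes only 'log'."""
    if mortality_per_million <= 0:
        raise ValueError("mortality must be positive to take a logarithm")
    return math.log10(mortality_per_million)


# Hypothetical values for illustration only (not taken from the dataset).
m_local = 50.0       # deaths per million in one jurisdiction
m_world_max = 800.0  # highest deaths per million worldwide at the same time

m_r = relative_impact(m_local, m_world_max)
m_l = log_impact(m_local)
print(f"m_R = {m_r:.4f}, m_L = {m_l:.3f}")
```

Working on the logarithmic scale compresses the very wide range of mortality values between jurisdictions, which is presumably why the group statistics later in the paper are stated in terms of the logarithmic impact.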
The rest of this paper is organized as follows. In Section 2, the methods of the research and the dataset used in the study are described. Section 3 contains the results of qualitative and quantitative analysis of the correlation hypothesis. Section 4 contains detailed definition of the method and the results of the statistical significance analysis. In Section 5 a discussion of the findings is provided. Finally, in the concluding Section 6 the results of the study are summarized and a brief discussion of possible applications offered.

Research Methodology
Qualitative and quantitative methods such as: comparison of cases with similar socio-economic factors; trend analysis with respect to duration of UBIP; and evaluation of statistical significance based on observed values of sample means in the identified groups of cases were applied to analyze trends in development of the epidemiological situation in the selected jurisdictions with the intent to evaluate statistical significance of the correlation hypothesis between the impact of Covid-19 epidemics and a record of universal BCG immunization and impose quantitative constraints on the null hypothesis.

UBIP Correlation Hypothesis
The hypothesis that will be evaluated in the study is that of an induced broad protection effect against viral infections including Covid-19 at a population-scale level (importantly to note, not necessarily a complete individual protection as in the case of regular vaccinations; and with further qualifications e.g. the immunization program was implemented consistently, with adequate quality and without significant interruptions). The hypothesis was proposed in Dolgikh (2020) [5] based on the analysis of early statistical data and the earlier results of immunology studies suggesting a possibility of such link in a number of human health conditions [11][12][13]. To the best of our knowledge, the exact mechanism of such a population-wide protection in the immune system has yet to be determined and will be addressed elsewhere; however, a logical possibility that such an induced general immunity protection may have an influence on the epidemiological scenario via providing some level of population-wide mitigation of negative Covid-19 impacts in the initial phases of the epidemics can be tested, in our view, with the publicly available epidemiological data.
To evaluate the statistical significance of the correlation hypothesis, we created groups of cases based on the UBIP record. The selection of jurisdiction cases into the groups was blind, based only on the UBIP policy of the jurisdiction and irrespective of the current Covid-19 impact statistics. Under the null hypothesis, group membership should have no significant correlation with the outcomes, and the groups should therefore have similar distributions; detecting a significant variation in the outcomes between the groups defined by UBIP policy would place constraints on the null hypothesis. The detailed method of evaluating the statistical significance of the correlation hypothesis is described in Section 4.

Data
A time-adjusted selected jurisdictions dataset was compiled from public sources with data of two groups of cases, Group 1 and Group 2 adjusted by the time of first exposure to Covid-19 as suggested by earlier studies [5]. Specifically, the dataset is comprised of the Group 1 cases with local arrival of the pandemics in January / early February 2020, recorded at the time point of TZ + 7 months and Group 2 cases with local arrival in February / March 2020 at TZ + 8 months, i.e. with approximately the same local exposure of six months. It is expected that by this stage, the epidemiological situation has developed to an expressed state in the analyzed jurisdictions.
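The time adjustment described above amounts to reading each jurisdiction's statistics at its own local Time Zero plus six months. A minimal sketch follows, assuming calendar-month arithmetic; the helper names and the example dates are hypothetical and not taken from the dataset.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole calendar months, clamping the day to the month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def snapshot_date(local_time_zero, exposure_months=6):
    """Date at which a jurisdiction's statistics are read: LTZ + exposure period."""
    return add_months(local_time_zero, exposure_months)


# Hypothetical local Time Zero dates (first confirmed local case).
ltz_group1 = date(2020, 1, 25)  # arrival in January: Group 1
ltz_group2 = date(2020, 2, 29)  # arrival in late February: Group 2

snap1 = snapshot_date(ltz_group1)
snap2 = snapshot_date(ltz_group2)
print(snap1, snap2)
```

With these example dates the Group 1 snapshot falls near the end of July 2020 and the Group 2 snapshot near the end of August 2020, so both cases are compared at the same local exposure rather than at the same calendar date.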
A number of criteria were applied to selecting the cases in the dataset to minimize the uncertainty factors due to vast variation of conditions among the reporting jurisdictions worldwide:
• A reasonable expectation of the accuracy, consistency and timeliness of the reporting from the national public health administration;
• A reasonable level of exposure to Covid-19, e.g. certain minimum number of reported cases and / or impact;
• A compatible level of social development and specifically, certain minimum standard of public health administration with respect to universal policy administration including importantly, the quality and the coverage.
The aim of the above criteria was to reduce the uncertainty related to the quality of administration of mass public health policy even when such has been declared.
Categories or bands of cases by the reported epidemiological impact were defined based on the logarithmic scale as follows:
While the data for a number of smaller jurisdictions with population under 5 million was recorded in the dataset, they were not included in the statistical analysis due to higher probability of fluctuations related to unpredictable character of cluster development.
BCG universal vaccination record is described by the following bands as defined by Zwerling et al. (2011) [14] and summarized, with certain modifications, in Table 1.

Table 1. BCG universal immunization categories

| Category | Description |
|---|---|
| A | Ongoing universal or near-universal BCG vaccination program. |
| A2 | Has a current UBIP with some limitations or qualifications (for example: a late start; inconsistencies in application practice, possible significant interruptions due to social factors and other). |
| B | Had a UBIP in the past covering significant part of population. |
| B2, B3 | Immunization program was offered for a limited time interval or specific groups and cannot be qualified as universal over multiple age cohorts; immunization practice inconsistent with the hypothesis of early age induced immunity protection for example, delivered at an older age. |
| C | Jurisdiction never had a universal BCG immunization program. |

Notes and qualifications:
• Consistency and reliability of data reported by the national, regional and local health administrations;
• Alignment in the time of reporting may be an issue due to reporting practices of jurisdictions;
• Availability, consistency and reliability of historical data and statistics on the administration of immunization programs in the national, regional and so on, jurisdictions can be an issue.

Regional Variation Analysis
In certain jurisdictions, significant regional variability in the administration of UBIP can be noted, providing essential input relevant to the correlation hypothesis. In the analyzed cases many social parameters, such as living standard, age distribution, social traditions and practices, were similar. This can be expected to exclude or mitigate the influence of such factors and to provide grounds for more confident conclusions from the correlation analysis.
Northern Europe: adjusted to the same time of local exposure, the four cases of Northern Europe show a strong correlation between Covid-19 impact and the time of cessation of UBIP. These countries share similar levels of prosperity, lifestyle, traditions and climate, which eliminates many potential influencing factors. A similar pattern can be observed in the cases of Portugal (Group A) and Spain (Group B2), with relative impacts, at the time, of 0.064 and 0.22, respectively. These cases provide support for the correlation hypothesis that is anecdotal but consistent over an extended period [5].

UBIP Cessation vs. Epidemiological Impact
In group B, where a BCG immunization program existed but was discontinued, a strong correlation can be observed between the time since cessation of the UBIP and the severity of the Covid-19 impact, as shown in the diagram of Figure 2 (the data was time-adjusted to LTZ + 6 months).

Figure 2. Epidemiological impact vs. time past UBIP, Group B
The data in Figure 2 can be found in Table A2, Appendix I. Most of the cases in this analysis had similar demographics, levels of prosperity and overall quality of public health administration, and the trend of correlation between the time since cessation of UBIP and the epidemiological impact of Covid-19 is clearly observable in the diagram.
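One simple way to quantify a trend of the kind shown in Figure 2 is a Pearson correlation coefficient between the years elapsed since cessation of UBIP and the logarithmic impact. The sketch below is an illustration only: the text does not state which measure of correlation was used, and the data points are hypothetical placeholders rather than the values of Table A2.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder data (NOT the values of Table A2).
years_since_cessation = [5, 15, 25, 35, 45]
log_impact_values = [1.2, 1.6, 1.9, 2.4, 2.7]

r = pearson_r(years_since_cessation, log_impact_values)
print(f"Pearson r = {r:.3f}")
```

With only a handful of jurisdictions in the group, a coefficient of this kind should be read as a descriptive summary of the trend rather than as a significance test.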

Heavy Onset Cases in the UBIP Group A
Several cases of rapid onset of Covid-19 were reported in countries with a current UBIP, including, but probably not limited to, Brazil, Mexico, India, South Africa, Russia and Iran. Without going into the specific details of each case, which can be done in another study, some general observations can be made here.

In situations where the universality and quality of administration of a UBIP in daily practice cannot be ascertained, large groups of the population may remain with limited or no effective protection even with a formally declared universal policy in place. Where these vulnerable groups happen to be more exposed to the infection, a higher impact of the epidemic can be expected. Factors such as levels of poverty, quality of and access to public healthcare, and a record of prolonged social disorder, wars, economic collapse and similar events have a strong potential to compromise the administration of UBIP and, with it, the hypothetical general protection effect. Most of the observed rapid onset cases in group A with a current UBIP fall into one of these categories, though a more detailed analysis of these cases is certainly warranted. Without a specific and detailed study of a jurisdiction it is not possible, in our view, to determine how essential and influential these factors are. In this analysis we limit ourselves to stating that, given the generally available records for most if not all of the observed exceptions, they are not in strong contradiction with the correlation hypothesis.
Delayed onset. An interesting observation about the cases in this group is the apparently significant delay between the first introduction of the epidemic and the peak of the impact. Comparing the cases of the first arrival of the pandemic in Europe and the Far East, it can be observed that a development period of approximately 1.5 to 2 months was common (for example, from the end of January to March 2020, when the epidemic was in full development in the European jurisdictions [1,3,5]). This can be contrasted with the development period of four months and longer in many cases discussed in this section: Mexico, Brazil, India, South Africa, Russia and other similar cases. Whilst this observation may not have statistical significance at the time of writing, it can be seen as an indirect argument for the correlation hypothesis, since the immunized part of the population could delay the transmission of the epidemic to the unprotected groups. This hypothesis requires further analysis and will be addressed elsewhere.

Possible Mechanisms
The hypothesis of induced early-age immunity protection from exposure to BCG was proposed by Dolgikh (2020) [5] based on a number of reports pointing to a possible association between early delivery of the BCG vaccine and a broad immunity against several conditions, including infectious ones [12,13]. It is further supported by a study indicating a possible mechanism for increased production of immune cells in infants following vaccination with BCG [11,23], as well as by reported gender differences in Covid-19 [24]. A number of works have looked at potential mechanisms for such a hypothetical induced population-level protection effect that can be associated with exposure to the BCG vaccine [25,26].
The research in this important area is ongoing and will be addressed in more detail elsewhere.

Statistical Significance of the Correlation Hypothesis
In this section the statistical significance analysis is performed at the time point of six months after the first local exposure to the infection. The analysis is based on evaluation of statistical parameters such as mean and standard deviation [27] of the epidemiological impact between groups of cases blindly selected based on the record of UBIP.
The null hypothesis in this case dictates that immunization carries no statistical significance for the impact of the epidemic, and therefore that the sample points in all BCG groups (A, B, C) as defined above are described by a single distribution with, possibly, time-dependent parameters: the mean epidemiological impact μ(t) and the standard deviation σ(t), which can be estimated from the overall dataset under the assumption of a normal distribution.
The basis for the analysis that follows is the observation of a strong disparity between the sample means in groups A and C, formalizing the observations in the Miller et al. (2020), Escobar et al. (2020) and Dolgikh (2020) studies [1,2,5]. Under the null hypothesis, these groups should be treated as randomly drawn samples of a given size, for which the distribution parameters can be estimated with the law of distribution of sample means [28]. For further detail on the composition of the groups refer to Section A3, Appendix I.
Under the null hypothesis, all group samples would be drawn from the same distribution, and the rule of sample means dictates that the means of the samples of groups A to C, each with $N_G$ cases, will be distributed with the same mean and a standard deviation $\sigma_G$ given by:
$$\sigma_G = \frac{\sigma}{\sqrt{N_G}} \qquad (1)$$
where $\sigma$ is the overall standard deviation of the dataset. From (1), based on the size of each group, one can estimate the sample mean standard deviations for the samples of groups A to C. For the selected groups of cases, the statistical parameters of the epidemiological impact distribution, measured in logarithmic mortality per capita $m_L(\text{case}, t)$, were obtained from the dataset (Appendix I) as shown in Table 3.
To satisfy the null hypothesis, the means of the UBIP groups A to C would need to independently follow normal distributions with the same mean $\mu_S$, which can be assumed to be equal to the overall dataset mean, and with $\sigma_G$ defined by (1). Then one obtains:
$$P_0 = P(\bar{m} \le \mu_A \mid \mu_S, \sigma_A) \cdot P(\bar{m} \ge \mu_C \mid \mu_S, \sigma_C) \qquad (2)$$
where the first factor on the right is the probability of finding the sample mean of group A, $\mu_A$, within the observed range below $\mu_S$ with a standard deviation $\sigma_A$, and the second, similarly, of finding $\mu_C$ within the observed range above $\mu_S$ with a standard deviation $\sigma_C$. Then from (2) the p-value of the null hypothesis can be estimated to be of the order of $10^{-5}$ or lower, excluding the null hypothesis at a significance level of at least $10^{-5}$. The mean of sample B was close to the dataset mean $\mu_S$ and for that reason did not contribute significantly to the p-value constraint.
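The procedure described above can be sketched numerically. The sketch is one reading of the text, not the authors' implementation: it assumes that the probability under the null hypothesis is the product of two one-sided normal tail probabilities, one for the group A mean lying at or below its observed value and one for the group C mean lying at or above its observed value. All numerical inputs are hypothetical placeholders and are not the values of Table 3.

```python
import math


def sample_mean_sd(sigma, n):
    """Standard deviation of the mean of n draws: sigma / sqrt(n)."""
    if n < 1:
        raise ValueError("group size must be at least 1")
    return sigma / math.sqrt(n)


def normal_cdf(x, mu, sd):
    """P(X <= x) for X ~ Normal(mu, sd)."""
    return 0.5 * math.erfc(-(x - mu) / (sd * math.sqrt(2.0)))


def normal_sf(x, mu, sd):
    """P(X >= x) for X ~ Normal(mu, sd), computed directly for tail accuracy."""
    return 0.5 * math.erfc((x - mu) / (sd * math.sqrt(2.0)))


def null_probability(mu_s, sigma, mean_a, n_a, mean_c, n_c):
    """Joint probability, under the null hypothesis, of a group A mean this low
    and a group C mean this high (assumed product form; groups independent)."""
    p_a = normal_cdf(mean_a, mu_s, sample_mean_sd(sigma, n_a))
    p_c = normal_sf(mean_c, mu_s, sample_mean_sd(sigma, n_c))
    return p_a * p_c


# Hypothetical placeholder inputs (NOT the values of Table 3).
mu_s, sigma = 2.0, 0.6    # overall mean and s.d. of the logarithmic impact
mean_a, n_a = 1.5, 16     # group A: observed sample mean and size
mean_c, n_c = 2.5, 9      # group C: observed sample mean and size

p_null = null_probability(mu_s, sigma, mean_a, n_a, mean_c, n_c)
print(f"probability under the null hypothesis: {p_null:.2e}")
```

The upper tail is computed with a dedicated survival function rather than as one minus the cumulative probability, because the latter loses precision when the tail probability is very small.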
This result can be illustrated by a histogram of the epidemiological impact for cases in UBIP groups A and C (Figure 4) in the analysis above.

Figure 4. Case histogram by epidemiological impact, UBIP groups A and C
The trend of the groups A (red) and C (blue) to the opposite ends of the impact range in logarithmic mortality per capita can be seen clearly, supporting the results of the statistical significance analysis.
To summarize the results of this section: if the group samples, selected blindly according to the record of UBIP, had no correlation with the epidemiological impact and could therefore be considered independent random samples under the null hypothesis, observing sample means as far apart as those in the analyzed groups would be highly improbable. The observed disparity therefore places strong constraints on the p-value of the null hypothesis.

Discussion
A confident determination of a correlation between universal BCG immunization and the type of Covid-19 epidemiological scenario in the national and subnational jurisdictions in the early phase of the development of the pandemics presents a serious challenge due to significant number of potential factors of influence as outlined earlier.
The analysis of the distribution of national and subnational cases by epidemiological impact supports the findings of a number of studies indicating compatibility of the observed data with, and a statistical preference for, the correlation hypothesis [3][4][5]. An original contribution of this work is the regional variation analysis and the analysis of past UBIP cases, Sections 3.1 and 3.2, which demonstrated a clear trend of the observed epidemiological impact with respect to the time since cessation of UBIP in the early phase of the epidemic. This finding, observed in jurisdictions with similar socio-economic factors, made it possible to control for a significant number of the possible factors of influence present in a broader analysis, and provided strong support for the correlation hypothesis.
Direct evaluation of the statistical significance of the null hypothesis (Section 4) provided another strong argument in favor of the correlation. Indeed, under the null hypothesis, a blind selection of cases based entirely on the UBIP record, without any reference to the observed epidemiological scenario, should have resulted in similar values of statistical parameters among the groups, including the mean and standard deviation, apart from statistical fluctuations. That assumption was found to be in strong contradiction with the observed data, which allowed strong quantitative constraints (5) to be imposed on the null hypothesis, in agreement with a number of published results.
Overall, in the authors' view, the arguments provided in this work strengthen the case for the correlation hypothesis in the early phases of the epidemic, before jurisdictional differences could impose measurable differences on the development of the epidemiological scenario. These observational and empirical findings are further supported by a connection to immunology studies (Section 3.4) pointing to a number of possible mechanisms for the origin of a broad population-wide mitigation of Covid-19 and, possibly, similar infections. This research is ongoing and has the potential to offer essential contributions to epidemiological science and policy [29].

Conclusions
The approaches to the statistical analysis of the hypothesized correlation between a universal immunization program with the Bacillus Calmette-Guérin vaccine and an early-phase Covid-19 epidemiological scenario, demonstrated in this work with an originally compiled time-adjusted dataset of national and subnational jurisdictions, offer additional arguments in support of the correlation hypothesis. They indicate the possibility of some form of general population-wide protection effect against Covid-19 resulting from a universal immunization program with the BCG vaccine, consistent with a number of earlier results. The findings add support to the rationale for further studies of the possible mechanisms of such general protection, with potential benefits that may extend well beyond the Covid-19 pandemic.
The results of this work, in coordination with further research in this promising direction, can be instrumental in the development of effective responses and policies in national public health care systems to minimize the impact of infectious epidemics and protect the population against emergent epidemiological risks.
It is hoped that the time-adjusted dataset compiled in this work, as well as the observations obtained with it, can be useful to other researchers in the field looking for effective approaches to understanding and, eventually, effectively managing and controlling this and similar infectious diseases in the future.
In conclusion, the authors would like to emphasize that the results reported in this study relate to the early, initial phases of the development of the Covid-19 epidemic in the studied cases, and cannot be assumed to be applicable to different stages without additional qualifications and / or constraints. For example, in the subsequent stages factors such as the quality and effectiveness of epidemiological policy, the state of the public healthcare system, the availability of critical care resources and others can play a more significant role. Choosing a suitable time interval for the statistical analysis in this work made it possible to obtain some essential insights into the early development of the pandemic.

Funding
The author received no financial support for the research, authorship, and/or publication of this article.

Acknowledgements
The authors are grateful to the colleagues at the Department of Information Technology, National Aviation University for discussions of the methods and results of the study.

Ethical Approval
The study was conducted in accordance with the Declaration of Helsinki. The research uses exclusively publicly available information and does not contain identifiable information.

Data Availability Statement
The data presented in this study are available on request from the corresponding author.

Conflict of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Table A1 describes the characteristics of the dataset of epidemiological cases at approximately six months after the local arrival of the Covid-19 pandemic. The dataset is available from the corresponding author upon request.

A2. UBIP Groups
Notes and clarifications on the selection of UBIP groups A to C for the statistical significance analysis:
• Treated as equivalent to Group A were the cases with a recent cessation of UBIP, in which the age cohort unprotected under the correlation hypothesis is under 30 years of age.
• Treated as equivalent to Group C were the cases with a very short or otherwise affected UBIP, as further noted below.
• Group A selection: only cases with a reliable and consistent application of UBIP were selected. The case of Ireland was not added to this group due to documented records of inconsistent policy application across some subnational regions [30].
• The case of Spain was placed in Group C due to a very short duration of UBIP (less than 20 years overall), making it negligible for any hypothesized effect of population-wide protection. The same argument was applied in the case of Quebec, Canada.
• The case of the United Kingdom was placed in Group C due to the incompatibility of the UBIP administration practice with the early-age induced immunity hypothesis, as the vaccine was administered at school or early adolescence age [14].
• Canada, where no UBIP was provided except for a short duration in the province of Quebec [31], was represented by the cases of Ontario and Quebec, with the highest population and epidemiological impact (two subnational cases).
• Due to its large population and high regional variation, the United States was represented by three state cases, California, Florida and New York, and the overall statistics for the country (three subnational cases).
• The case of France, where UBIP was in place until 2007, was not included in any UBIP group due to uncertainty about the administration practice. Sources provide inconsistent information about the age of administration of BCG, in infancy or at school age: "BCG was mandatory for school children between 1950 and 2007" [14,32]. This indicates that, at the least, the practice may not have been consistent, and the decision on placement in the appropriate group could not be made without further detailed analysis.