Assessing the National Prevalence of HIV Screening in the United States using Electronic Health Record Data

The Centers for Disease Control and Prevention and the U.S. Preventive Services Task Force recommend population-based screening for human immunodeficiency virus (HIV) at least once in each patient's life. National surveys estimate that 42.5% of the population has been screened; however, these studies have relatively small sample sizes and inherent survey biases. Using national, de-identified, cloud-based electronic health record (EHR) information from over 48 million patients, we found that only 6.4% of Americans over the age of 18 had laboratory evidence of a prior HIV test. Further investigation is necessary to determine whether single-item questions on national surveys correlate with objective evidence of HIV testing, and to address the numerous limitations of EHR data, which likely grossly underestimate the prevalence of HIV screening nationally.


Introduction
Population-based screening for human immunodeficiency virus (HIV) is recommended by both the Centers for Disease Control and Prevention (CDC) and the United States Preventive Services Task Force (USPSTF) [1][2]. The CDC estimates that 42.5% of the US population aged 18 years and older has been screened for HIV [3]. National, question-based surveys provide the data for this prevalence estimate [4]. We sought to estimate the prevalence of HIV screening in the United States using laboratory results from real-time electronic health record (EHR) data on over 60 million unique patients over 18 years of age.

Materials And Methods
We utilized the cloud-based Explorys, Inc. (Cleveland, OH) database. De-identified and standardized aggregate data from 60 million patients are uploaded daily to Explorys from 26 integrated US healthcare systems across all 50 states. The methodology and technical features of Explorys have been described in depth previously [5], and the database has been validated across numerous fields, including dermatology, endocrinology, neurology, gynecology, gastroenterology, orthopedics, surgery, and hematology [6][7][8][9][10][11][12][13]. Briefly, data from EHRs are mapped onto the Unified Medical Language System (UMLS) in standardized and normalized form, namely onto the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED-CT) hierarchies, allowing researchers to use the web application's PopEx system to search for diseases, procedures, and laboratory results at the epidemiological level in a de-identified, aggregate patient cohort. SNOMED-CT is akin to the Clinical Classifications Software (CCS) codes used to analyze data from the Agency for Healthcare Research and Quality. Use of Explorys has been deemed exempt from institutional review board approval by University Hospitals Cleveland Medical Center [14].
We compared these data with the entire Explorys population over the age of 18 years with no previous history of HIV infection. Demographic data are presented as numbers and percentages. Prevalence of HIV screening was age-adjusted for sex comparisons, sex-adjusted for age-group comparisons, and sex- and age-adjusted for race comparisons. χ2 tests and multiple pairwise comparisons with Bonferroni correction were used to assess differences between groups. Logistic regression was performed to model the effect of age, sex, race, and insurance status on HIV screening. The two one-sided t-tests (TOST) procedure with a ±10-point margin was used to assess the equivalence of group prevalence estimates between Explorys and CDC data, as recommended by Tatem et al. [15]. Statistical significance was set at p < 0.05. All analyses were performed in either IBM SPSS Statistics, Version 25 (IBM) or Microsoft Excel, Version 16.11.1, with XLSTAT software for equivalence testing. "Other" races were not included in this study.
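The logic of the equivalence test can be illustrated with a minimal sketch. This is not the XLSTAT procedure used in the study: it swaps the t-tests for a normal-approximation (z) version for a single proportion, treats the CDC estimate as a fixed reference value, and uses a hypothetical sample size of 48 million taken loosely from the "over 48 million" figure in the abstract. The function names are illustrative only.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tost_one_proportion(p_hat, n, p_ref, margin):
    """Two one-sided z-tests of H0: |p - p_ref| >= margin.

    Returns (p_lower, p_upper, p_tost). Equivalence is concluded only
    when p_tost, the larger of the two one-sided p-values, is below alpha.
    Assumes 0 < p_hat < 1 so that the standard error is non-zero.
    """
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    # Test 1, H0: p <= p_ref - margin (estimate is too far below the reference)
    z_lower = (p_hat - (p_ref - margin)) / se
    p_lower = 1.0 - normal_cdf(z_lower)
    # Test 2, H0: p >= p_ref + margin (estimate is too far above the reference)
    z_upper = (p_hat - (p_ref + margin)) / se
    p_upper = normal_cdf(z_upper)
    return p_lower, p_upper, max(p_lower, p_upper)


# Reported estimates: 6.4% (Explorys) vs 42.5% (CDC), margin of 10 points.
# n is an assumption made for illustration, not the study denominator.
p_lower, p_upper, p_tost = tost_one_proportion(0.064, 48_000_000, 0.425, 0.10)
equivalent = p_tost < 0.05
print(f"TOST p-value: {p_tost:.3f}; equivalent within margin: {equivalent}")
```

Under these assumptions, the lower one-sided test cannot reject its null hypothesis because 6.4% lies far below the lower equivalence bound of 32.5%, so the overall TOST p-value is non-significant and equivalence is not concluded.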

Discussion
We sought to eliminate biases associated with survey questions by characterizing the prevalence of HIV screening in the United States using laboratory data from one of the largest nationally distributed patient population databases. A preliminary analysis identified the prevalence of people living with HIV in Explorys to be 0.33%, which approximates the CDC-reported prevalence of 0.37% [16]. However, in this study, we identified only 6.4% of the Explorys population as ever screened for HIV, compared with the 42.5% estimated by the CDC. Equivalence testing was non-significant, indicating that these databases are not equivalent for estimating HIV screening. While our estimates were significantly lower than those reported by the CDC, it should be noted that no study has yet determined whether single-item questions on national surveys of HIV screening agree with objective evidence of screening. Regardless, females, African Americans, and persons under 40 were more likely to be screened in the Explorys population, which corroborates the demographic distribution of screening observed in national surveys [3].
This study's limitations are important and relevant to the use of EHR data for HIV screening. First, this study was limited by hospital systems that do not report HIV laboratory data to Explorys or that use anonymous HIV screening. Second, information from patients who receive screening through nonhospital systems, such as county health departments and stand-alone STD clinics, is not included in the Explorys database. Third, the conversion from paper charts to EHRs for the included hospital systems may result in missing data since 1999; however, routine screening for HIV was not recommended by the CDC until 2006 and by the USPSTF until 2013, thus providing a lag time for routine uptake of this clinical practice as hospitals adopt the EHR. Taken together, these limitations suggest that EHR data should not currently be used in health services research to assess the prevalence of HIV screening in the United States, and they are likely the reason for the significant discrepancy observed in this study.

Conclusions
Estimates of the prevalence of population-based HIV screening in the United States rely on survey questions that may not be reliable and often represent data from the previous year or earlier. Although this study reveals profound limitations that currently render EHR data not useful for the study of HIV laboratory data, if these limitations can be addressed nationally, cloud-based all-payer databases may provide objective, daily updated information on HIV screening. Until then, studies using EHR or administrative claims data should interpret HIV laboratory data with caution, as such data may greatly underestimate the proportion of patients screened in the United States.

Additional Information Disclosures
Human subjects: All authors have confirmed that this study did not involve human participants or tissue. Animal subjects: All authors have confirmed that this study did not involve animal subjects or tissue. Conflicts of interest: In compliance with the ICMJE uniform disclosure form, all authors declare the following: Payment/services info: All authors have declared that no financial support was received from any organization for the submitted work. Financial relationships: All authors have declared that they have no financial relationships at present or within the previous three years with any organizations that might have an interest in the submitted work. Other relationships: All authors have declared that there are no other relationships or activities that could appear to have influenced the submitted work.