Predicting Quality of Life Changes in Hemodialysis Patients Using Machine Learning: Generation of an Early Warning System

Objective: To predict changes in the quality of life (QOL) scores of hemodialysis patients over the coming month and to develop an early warning system using machine learning.
Methods: This was a prospective cohort study (one-month duration) at the dialysis center of a tertiary care hospital in Pakistan. The study started on 1st October 2016, and 78 patients had been enrolled at the time of this interim analysis. Bachelor of Medicine and Bachelor of Surgery (MBBS) qualified doctors administered a proforma with demographic questions and the validated Urdu version of the World Health Organization Quality of Life-BREF (WHOQOL-BREF). The same investigator repeated the questionnaire with the same patient after one month. Descriptive statistics were computed using SPSS version 24 (IBM Corp., Armonk, NY), and machine learning was performed using R (version 3.0) and Orange (version 3.1).
Results: Two machine learning models (classification tree and Naïve Bayes) were generated to predict an increase or decrease of 5% in a patient's WHOQOL-BREF score over one month. The classification tree was selected as the most accurate model. It had an area under the curve (AUC) of 83.3% (accuracy: 81.9%) for predicting a 5% increase in QOL and an AUC of 76.2% (accuracy: 81.8%) for predicting a 5% decrease in QOL over the coming month. The factors associated with an increase in QOL of 5% or more over the next month included younger age (<19 years) and higher iron sucrose doses (>278 mg/month). Drops in the psychological, physical, and social domain scores were associated with a decrease of 5% or more in QOL scores over the following month.
Conclusion: An early warning system was built using machine learning algorithms for the early detection of deteriorating QOL scores in the hemodialysis population. It is called dialysis data interpretation for algorithmic-prediction on quality of life (DIAL). The model suggests that interventions targeting the psychological and environmental domains in particular may help prevent a drop in QOL scores. If implemented on a larger scale, DIAL may help ensure a better QOL for patients and reduce the financial burden in the long term.


Introduction
Dialysis patients usually have to commit to a particular lifestyle over a long period. This has a significant impact on their quality of life (QOL), irrespective of the dialysis modality used [1]. Several factors, including environmental, social, psychological, financial, and physical factors, play an important role in determining an individual's QOL [1][2][3]. Several studies worldwide have sought to identify the strongest correlates of better QOL [4][5]. However, no study has used modern machine learning techniques to identify the most important predictors of QOL and rank them by strength of association. The purpose of this study was therefore to develop an early warning system that uses machine learning to predict the change in a hemodialysis patient's QOL over the coming month. The system is called dialysis data interpretation for algorithmic-prediction on quality of life (DIAL). Such a system could help direct resources toward high-risk patients.

Materials and Methods
This was a prospective cohort study (of six months' duration) at the hemodialysis unit of a tertiary care center in Pakistan. All consenting patients were eligible if they:
were older than 15 years;
had been diagnosed with end-stage renal disease (ESRD) for more than a year;
had been on a stable hemodialysis regimen (twice or thrice weekly) for at least three months; and
had no communication disability.
Patients who did not fulfill the inclusion criteria were excluded. Also excluded were patients with a known psychological disorder, patients admitted to critical care units, and patients who had switched from one hemodialysis regimen to another within the last three months. Patients were recruited by non-probability convenience sampling. Approval was obtained from the local ethics committee before the study began. The study started on 1st October 2016.
A total of 78 patients were enrolled. An MBBS-qualified doctor administered a proforma with demographic questions and the Urdu version of the World Health Organization Quality of Life-BREF (WHOQOL-BREF) validated by Khan MN et al. [6]. The Urdu WHOQOL-BREF has already been validated for the hemodialysis population in Pakistan, which made it a fitting choice for QOL assessment. It has 26 questions. Question one asks about an individual's overall perception of QOL, and question two asks about the overall perception of health. The remaining questions cover four major domains of life: physical health, psychological health, social relationships, and the environment. The domains have different raw score ranges, so all raw scores were transformed to a 4-20 range according to WHO guidelines for uniformity. Higher scores indicate a better QOL. The scores from all four domains were then combined into one total QOL score. The questionnaire was administered at the start of the study (day zero). The same investigator repeated it with the same patient after one month. The outcome variable was the change in the total QOL score (delta QOL) over the coming month. The predictor variables were age, gender, monthly income, monthly iron sucrose dose, and total QOL score at the beginning of the study. Additional predictors were the changes over the coming month in the individual domain scores, hemoglobin, and serum albumin. A first interim analysis was performed on 15th January 2017. Its results were used to lay the foundations of the early warning system, dialysis data interpretation for algorithmic-prediction on quality of life (DIAL). DIAL is designed to automate the monthly collection of QOL scores and other predictor variables.
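The domain scoring and outcome definition described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code (the study used SPSS, R, and Orange). It rests on four assumptions:
the Urdu WHOQOL-BREF keeps the standard item-to-domain mapping and reverse-scored items (3, 4, and 26);
each domain score is the mean of its items multiplied by 4 (the WHO 4-20 transformation);
the total QOL score is the sum of the four domain scores;
a "5% change" is measured relative to the baseline total.
The function names are hypothetical.

```python
from statistics import mean

# Assumed standard WHOQOL-BREF item-to-domain mapping (items 1 and 2 are global items).
DOMAIN_ITEMS = {
    "physical": [3, 4, 10, 15, 16, 17, 18],
    "psychological": [5, 6, 7, 11, 19, 26],
    "social": [20, 21, 22],
    "environment": [8, 9, 12, 13, 14, 23, 24, 25],
}
# Negatively phrased items are recoded as 6 - response before scoring.
REVERSED_ITEMS = {3, 4, 26}


def domain_scores(responses):
    """Return each domain score on the 4-20 scale.

    `responses` maps item number (1-26) to a 1-5 answer. A domain score is
    the mean of its (recoded) items multiplied by 4.
    """
    recoded = {q: (6 - r) if q in REVERSED_ITEMS else r for q, r in responses.items()}
    return {
        domain: 4 * mean(recoded[q] for q in items)
        for domain, items in DOMAIN_ITEMS.items()
    }


def total_qol(responses):
    # Assumption: total QOL is the sum of the four 4-20 domain scores (range 16-80).
    return sum(domain_scores(responses).values())


def change_label(baseline_total, followup_total, threshold=0.05):
    """Classify delta QOL against a threshold relative to the baseline total (assumed)."""
    delta = followup_total - baseline_total
    if delta >= threshold * baseline_total:
        return "increase"
    if delta <= -threshold * baseline_total:
        return "decrease"
    return "no change"


answers = {q: 3 for q in range(1, 27)}  # hypothetical patient answering 3 to every item
print(total_qol(answers))  # 48
print(change_label(50.0, 53.0))  # increase
print(change_label(60.0, 56.0))  # decrease
```

Whether the threshold is inclusive (">= 5%") or strict is not stated in the text. The sketch treats it as inclusive.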
DIAL is currently being implemented. Its impact on the clinical and financial aspects of QOL in dialysis patients will be assessed once the data have been collected. Descriptive statistics were computed using SPSS version 24 (IBM Corp., Armonk, NY). Means and standard deviations were used to describe continuous variables such as age and QOL scores. Frequencies and percentages were used to describe categorical variables. Machine learning was performed using R (version 3.0) and Orange (version 3.1) [7].
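The descriptive summaries above can be reproduced with Python's standard library, as in the hypothetical sketch below. It is not the SPSS procedure used in the study. It assumes the sample (n - 1) standard deviation, and the helper names are illustrative only. The categorical example reconstructs the reported sex distribution (42 males out of 78 patients).

```python
from statistics import mean, stdev


def describe_continuous(values):
    """Format a continuous variable as 'mean (SD=sd)' using the sample SD (n - 1)."""
    return f"{mean(values):.2f} (SD={stdev(values):.2f})"


def describe_categorical(values, category):
    """Format one category as 'percentage% (n/N)'."""
    n = sum(1 for v in values if v == category)
    return f"{100 * n / len(values):.1f}% ({n}/{len(values)})"


sex = ["male"] * 42 + ["female"] * 36
print(describe_categorical(sex, "male"))  # 53.8% (42/78)
print(describe_continuous([1, 2, 3, 4]))  # 2.50 (SD=1.29)
```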

Results
A total of 78 patients were included in the interim analysis. The mean age was 51.00 years (SD=20), and males comprised 53.8% (42/78) of the study population. The mean duration of hemodialysis was 41.40 months (SD=28.90). The mean serum albumin levels were 3.61 g/dl (SD=0.52) at the start and 3.63 g/dl (SD=0.53) at the end of the one-month period. The mean total QOL scores were 57.6 (SD=10.33) at the beginning and 59.3 (SD=10.24) at the end of the one-month study period, as shown in Table 1. The model showed monthly income (p<0.001) and serum albumin (p<0.001) to be positively and significantly associated with better QOL, as shown in Table 3.
Two machine learning models (classification tree and Naïve Bayes) were generated (Figure 1) to predict a 5% increase or decrease in a patient's WHOQOL-BREF score over one month. The classification tree was selected as the most accurate model. It had an area under the curve (AUC) of 83.3% (accuracy: 81.9%) for predicting a 5% increase in QOL and an AUC of 76.2% (accuracy: 81.8%) for predicting a 5% decrease in QOL over the coming month.
The following factors were associated with an increase of 5% or more in the QOL score over the next month:
a positive change in domain four (environmental);
a total QOL score of <65 at the beginning of the cohort study;
age less than 19 years; and
iron sucrose doses >278 mg/month.
The factors associated with a decrease of 5% or more in the QOL score over the following month are shown in Figure 2. In order of importance, they were:
a decrease in domain two (psychological);
a decrease in domain one (physical);
a decrease in domain three (social); and
a total QOL score >61 at the start of the cohort study.
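AUC and accuracy measure different things. This is how the classification tree can show an accuracy (81.8%) above its AUC (76.2%) when predicting a decrease.

AUC is threshold-free. It can be computed as the Mann-Whitney probability that a randomly chosen positive case is scored above a randomly chosen negative case.

Accuracy depends on the chosen cutoff and on class balance. A model that mostly predicts the majority class can still score well on accuracy.

The minimal Python sketch below illustrates both metrics. It uses made-up labels and scores, not study data, and assumes a 0.5 cutoff. Orange computes these metrics internally; this is not its implementation. The paper also does not state which validation scheme produced the reported values.

```python
def roc_auc(labels, scores):
    """AUC as P(random positive scores above random negative); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, cutoff=0.5):
    """Share of cases whose thresholded prediction matches the label."""
    predictions = [1 if s >= cutoff else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical, imbalanced example: 1 patient with a QOL drop, 9 without.
labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.2, 0.3, 0.3, 0.3, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
print(round(roc_auc(labels, scores), 3))  # 0.667
print(accuracy(labels, scores))  # 0.9
```

In this toy case, every patient is predicted "no drop" at the 0.5 cutoff. That still yields 90% accuracy, while the AUC (0.667) shows that the ranking of patients by risk is only moderate.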

Discussion
Hemodialysis patients are a distinct population. After receiving a diagnosis of end-stage renal disease (ESRD), many patients experience some degree of depression [8]. The disease has physical, social, and psychological effects on their lives, which are reflected in their overall QOL [9][10]. There is therefore a need to identify patients at high risk of declining QOL scores and to target specific domains to aid recovery.
At a tertiary care center in Islamabad, we used the WHOQOL-BREF Urdu questionnaire to collect data on the most significant factors that might influence QOL scores in the hemodialysis population. Using modern machine learning methods, we built a prediction model that can forecast a change in QOL score in either direction one month in advance. Many studies have examined the factors associated with changes in QOL [11]. An earlier study in the local population showed that unemployment and psychiatric disease were independently and significantly associated with lower QOL scores in dialysis patients [12]. To our knowledge, this is the first application of modern data analytic techniques to this problem. We are also not aware of any previous early warning system like DIAL. In the long run, DIAL could serve as a monthly surveillance system to identify the patients at highest risk of a drop in QOL scores in the coming month. Such systems have been implemented in other fields [13]. They have been found to have significant, positive impacts on the financial and clinical aspects of patient management [14].
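The monthly surveillance role described above can be illustrated with a short ranking step. The sketch is hypothetical, since DIAL's internal logic is not described here. The patient IDs, risk values, the list size `k`, and the `min_risk` cutoff are all assumptions. The function takes each patient's model-estimated probability of a 5% or greater QOL drop and returns the highest-risk patients for review.

```python
def shortlist(risks, k, min_risk=0.0):
    """Return up to k patient IDs with the highest predicted risk of a QOL drop.

    `risks` maps a hypothetical patient ID to an estimated probability of a
    5% or greater decrease in total QOL over the coming month. Patients below
    `min_risk` are never shortlisted.
    """
    eligible = [(pid, risk) for pid, risk in risks.items() if risk >= min_risk]
    eligible.sort(key=lambda item: item[1], reverse=True)  # highest risk first
    return [pid for pid, _ in eligible[:k]]


monthly_risks = {"P01": 0.12, "P02": 0.81, "P03": 0.45, "P04": 0.67}
print(shortlist(monthly_risks, k=3, min_risk=0.5))  # ['P02', 'P04']
```

The cap `k` reflects limited clinical resources. The `min_risk` floor stops low-risk patients from being flagged in a month when few patients are at high risk.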
We also found that domain four (environmental) was positively associated with better QOL scores. This is consistent with earlier studies [15]. An earlier study also found age to be significantly associated with good QOL scores [16], and this association was evident in our study as well. Higher doses of iron sucrose are given to dialysis patients to replete iron stores [17]. In our study, higher iron replacement doses (>278 mg per month) were associated with better QOL scores. The maximum intravenous dose given was 800 mg per month. We therefore could not determine whether doses above 800 mg per month are associated with a negative change in QOL scores. Other studies have identified optimal iron replacement doses based on improvement in hemoglobin, which in turn indirectly improves clinical symptoms [18].
Among patients whose QOL scores decreased, changes in the psychological, physical, and social domains were the most important contributors. Middle-aged patients (older than 30 but younger than 60 years) were more prone to a negative change in QOL scores over the coming month. This may be because dialysis limits and restricts these patients at an earlier stage of life. The result may be a greater burden of psychological problems than in older (>60 years) and younger (<30 years) patients. An earlier study has shown a relationship between increasing age and QOL scores [16]. Our study also showed that males had better overall QOL and psychological domain scores than females. These differences were statistically significant but were not adjusted for other covariates.
Our study has several limitations. It was an observational study of a local Pakistani population, so the findings may not generalize to other populations. In addition, most QOL questions measure subjective self-perception. We used a questionnaire validated for this population in their native language, which should have minimized this limitation as far as possible. This is also an interim report of an ongoing project, which is expected to be completed at the end of 2019. Some candidate covariates were not assessed in this interim analysis but are planned for the final report:
serum iron;
total iron binding capacity (TIBC);
use of supplementary medicines or multivitamins;
diet regimens; and
history of psychological or physical therapy.
The study used a small sample and convenience sampling. Despite this, the high AUC and accuracy values suggest a stable and dependable prediction and surveillance system, and this led our team to publish these interim results.