Classification of Brain Metastases Prognostic Groups Utilizing Artificial Neural Network Approaches

Objective: The purpose of this investigation is to explore the performance of an artificial neural network (ANN)-based prognostic index, compared with traditional logistic regression (LR) modeling and other published prognostic indices (PI), in classifying survival among patients with brain metastases treated with stereotactic radiotherapy. Methods: A database of 460 patients who received either stereotactic radiosurgery or fractionated stereotactic radiation therapy was divided into three sub-databases for ANN/LR analysis: a testing dataset, n=276 (60%); a cross-validation dataset for training, n=69 (15%); and a validation dataset, n=115 (25%). The primary endpoint of survival was classified into one of three categories: unfavorable survival (<two months), intermediate survival (two to six months), and favorable survival (>six months). ANNs were optimized in terms of model structure, nodal complexity, and cost optimization algorithm. They were then compared with both LR and published PIs in terms of classification accuracy (CA) and total major misclassification rate (TMMR) according to the three-category survival scheme. Results: CA and TMMR for the nine published PIs in the total database (n=460) ranged from 34% to 53% and from 4% to 11%, respectively. In the validation database, both the LR and ANN approaches exceeded the CA of the best existing PI system by more than 10 percentage points (LR/ANN 62.6%, published prognostic indices 27-49%), with a similar TMMR (LR 7.8%, ANN 6.1%, published prognostic indices 2-17%). Conclusions: A modest improvement over published PIs was noted. However, the use of various ANN model structures, nodal complexity, and cost function optimization algorithms did not lead to a significant improvement in survival classification when compared to LR.


Introduction
Validated prognostic factors and indices can assist the clinician in patient counseling and treatment decision-making. Additionally, such indices can support the conduct of prospective clinical trials by defining patient eligibility and stratification criteria. Multiple prognostic factors have been shown to be related to patient survival in the context of brain metastases. These include performance status, extracranial disease, age, controlled primary, primary site, interval between primary disease and brain metastases, number/volume of brain metastases, and clinical response to steroids [1-11]. The Radiation Therapy Oncology Group (RTOG) Recursive Partitioning Analysis (RPA) brain metastases prognostic index is the oldest system currently in use [2,12-16]. However, several investigators have highlighted that the utility of the system is limited by the large proportion of patients within the intermediate-risk group [17,18].
Other systems have subsequently been developed using different combinations of the previously listed prognostic factors. These include the Score Index for Radiosurgery (SIR) [4], the Rotterdam scale (RDAM) [3], the Basic Score for Brain Metastases (BSBM) [5], the Golden Grading System (GGS) [8], the Graded Prognostic Assessment (GPA) [6,10], the Disease-Specific GPA (DS-GPA) [9], and the German I and II scales developed by Rades et al. [7,11]. A recent systematic review of all published systems was not able to definitively identify a superior system [1]. However, a recent neural network analysis suggested that the newly developed RTOG GPA system may have some advantage in prognostic utility in whole brain radiotherapy (WBRT) patient populations [19].
Creation and assessment of prognostic factors and indices traditionally involve Cox proportional hazards and logistic regression modeling for survival-time and categorical event endpoints, respectively. Artificial neural networks (ANN) are a family of multivariable approaches that mimic networks of biological neurons [20]. Layers of nodes (input, hidden, and output) are interconnected by weighted connection lines, with every node in one layer connected to every node in the next. Together these form the non-linear computational structure of the ANN (Figure 1). During ANN training with patient data (a training dataset that includes prognostic factors and study endpoints), the initial weights of the connection lines feeding into the various nodes are altered. This is done using one of many available cost-function optimization procedures against a separate cross-validation dataset. Once training is completed, the trained ANN can be assessed on a third, independent validation dataset to report its prognostic ability (Figure 2). ANN approaches have three main advantages over other techniques: (1) they do not require an assumption of proportional hazards relative to baseline, (2) they can utilize non-linear (and non-parametric) associations, and (3) all interactions between input factors and hidden layers are allowed (i.e. model
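The following minimal Python sketch makes the layered structure described above concrete. It propagates one patient's input vector through a fully connected network with nine inputs, five hidden nodes, and one output, which is the 9-5-1 layout of the one-layer perceptron described in the Methods. The logistic activation, the uniform random initial weights, and the encoded input values are illustrative assumptions, not the NeuroSolutions implementation. Training (weight adjustment) is not shown. The text also does not describe how the single output node is mapped to the three survival categories, so the sketch does not attempt that mapping.

```python
import math
import random


def sigmoid(x):
    """Logistic activation used here (an assumption) at hidden and output nodes."""
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """One row of weights per receiving node: n_in connection weights plus a trailing bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def layer_forward(weights, inputs):
    """Fully connected layer: weighted sum of all inputs plus bias, then the activation."""
    return [sigmoid(sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1])
            for row in weights]


def mlp_forward(layers, inputs):
    """Propagate one input vector through each layer in turn."""
    activations = inputs
    for weights in layers:
        activations = layer_forward(weights, activations)
    return activations


rng = random.Random(0)
# Hypothetical 9-5-1 network: nine input nodes, five hidden nodes, one output node.
network = [init_layer(9, 5, rng), init_layer(5, 1, rng)]
# Hypothetical encoded prognostic factors for a single patient.
patient = [1, 0, 1, 0.6, 1, 0, 0, 1, 0.3]
output = mlp_forward(network, patient)
```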

Source databases
A retrospective review was performed on two institutional databases comprising 500 patients diagnosed with oligometastatic brain disease. Between 2002 and 2011, patients received either stereotactic radiosurgery (SRS; n=381, one to three brain metastases) or fractionated stereotactic radiation therapy (fSRT; n=119, one to six brain metastases). The database contained pretreatment information (including derived risk stratification categories for all nine published systems), treatment details, and outcome information, including the primary endpoint of overall survival. Patients were treated at one of two cancer centres. These were the London Regional Cancer Program (LRCP, London, ON; n=69 fSRT patients) and the VU Medical Centre (VUMC, Amsterdam, The Netherlands; n=381 SRS patients plus n=50 fSRT patients). Institutional ethics approval was obtained for this joint database analysis.
Full treatment details for both the SRS and fSRT approaches have been extensively published in the medical literature [24-27]. The fSRT and SRS cohorts were pooled into a joint database for this investigation after confirming that treatment assignment (fSRT vs. SRS) was not a significant predictor of overall survival. This was assessed in a propensity-score matched-pair analysis (accepted for publication in Radiotherapy and Oncology).

Endpoint definitions and ANN study databases
The source databases described above were merged into a common database for further statistical analyses. Individual survival time was recoded into one of three discrete categories, as suggested by Nieder et al. [18], because no other internationally recognized prognostic categories exist in the medical literature. The three categories were unfavorable survival (< two months), intermediate survival (two to six months), and favorable survival (> six months). Censored patients with less than six months of follow-up would have had incomplete follow-up. To avoid this, all patients (censored or not) treated in the six months before the final database update were removed from the database, leaving n=460 patients. The remaining patients were then randomly assigned to one of three study databases: a testing dataset (n=276, 60%), a cross-validation dataset for ANN training (n=69, 15%), and an ANN validation dataset (n=115, 25%).
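The recoding and random assignment described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' procedure. Survival is assumed to be already expressed in months. Exactly two and exactly six months are assumed to fall in the intermediate category, because the text gives the ranges only as "< two", "two to six", and "> six". The exclusion of recently treated patients is assumed to have been done beforehand. The function names and the fixed seed are hypothetical.

```python
import random


def survival_category(months):
    """Recode survival time in months into the three endpoint categories."""
    if months < 2:
        return "unfavorable"
    if months <= 6:          # boundary handling is an assumption
        return "intermediate"
    return "favorable"


def split_patients(patient_ids, seed=0, percents=(60, 15, 25)):
    """Randomly assign patients to testing, cross-validation and validation sets."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_test = n * percents[0] // 100
    n_cv = n * percents[1] // 100
    # The remainder goes to validation, so every patient is assigned exactly once.
    return ids[:n_test], ids[n_test:n_test + n_cv], ids[n_test + n_cv:]


testing, cross_val, validation = split_patients(range(460))
```

With 460 patients, integer arithmetic reproduces the reported sizes of 276, 69, and 115.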

Control analyses
Descriptive and operating characteristic (OC) statistics (Figure 3) were calculated for each of the published prognostic indices. This was done in each of the three databases (testing, cross-validation, and validation) and in the complete study database (n=460). Most prognostic indices could be calculated for all study patients. The exceptions were the DS-GPA (412/460, 90%) and the RDAM (362/460, 79%), because of missing information in the database (tumor sites not covered by the DS-GPA, and steroid response information). Missing information was distributed relatively equally among the study databases. The first OC statistic was prognostic index classification accuracy, defined as the proportion of patients correctly classified (Figure 3). Correct classification comprised high-risk patients surviving < two months, intermediate-risk patients (in one or more intermediate categories) surviving two to six months, and low-risk patients surviving > six months. The second OC statistic was the total major misclassification rate, defined as the proportion of all patients misclassified into the opposite survival group. This comprised high-risk patients surviving more than six months and low-risk patients surviving less than two months (Figure 3).
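The two OC statistics defined above can both be read off a three-by-three confusion matrix, as in Figure 3. In the minimal Python sketch below, rows are predicted risk groups (high, intermediate, low) and columns are observed survival (< two, two to six, > six months). For indices with more than one intermediate category, those rows are assumed to be summed into a single intermediate row first. The example counts are hypothetical and are not study data.

```python
def classification_metrics(matrix):
    """Return (CA, TMMR) from a 3x3 confusion matrix.

    Rows: predicted risk group (high, intermediate, low).
    Columns: observed survival (<2 months, 2-6 months, >6 months).
    """
    total = sum(sum(row) for row in matrix)
    # Correct classifications lie on the diagonal.
    correct = sum(matrix[i][i] for i in range(3))
    # Major misclassification: high risk but survived >6 months,
    # plus low risk but survived <2 months (the opposite corners).
    major = matrix[0][2] + matrix[2][0]
    return correct / total, major / total


# Hypothetical counts for illustration only.
example = [[10, 5, 2],
           [8, 20, 6],
           [1, 4, 14]]
ca, tmmr = classification_metrics(example)  # CA = 44/70, TMMR = 3/70
```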

FIGURE 3: Confusion Matrix for Calculation of Classification Accuracy and Total Major Misclassification Rate
A backward elimination logistic regression (LR) analysis was performed using NeuroSolutions 6.1 modeling software (NeuroDimension Inc., Gainesville, FL, USA). The input variables were the constituent prognostic factors of the published prognostic indices: primary class/site, presence of systemic metastases, performance status, age, interval between primary diagnosis and brain metastases presentation, volume/number of brain metastases, and active primary. The model predicted categorical survival group (favorable, intermediate, and unfavorable survival endpoint groups). Classification accuracy and total major misclassification rate for the LR model were calculated for all three study databases (testing, cross-validation, and validation) and for the total study database.
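The general idea of backward elimination can be sketched in Python as follows: start with all candidate factors and greedily drop one at a time while the model does not get worse. This is a generic sketch under stated assumptions. The text does not state the elimination criterion used by NeuroSolutions (for example, a significance threshold), so the criterion here is a user-supplied `score` function where higher is better, such as cross-validation CA. The factor names and the toy scoring function are hypothetical.

```python
def backward_elimination(features, score):
    """Greedy backward elimination.

    Starting from all features, repeatedly drop the feature whose removal
    gives the best score, as long as the score does not get worse.
    `score` maps a list of feature names to a number (higher is better).
    """
    selected = list(features)
    current = score(selected)
    while len(selected) > 1:
        candidates = [[f for f in selected if f != drop] for drop in selected]
        best = max(candidates, key=score)
        best_score = score(best)
        if best_score < current:
            break  # every possible removal makes the model worse
        selected, current = best, best_score
    return selected


# Hypothetical toy score: informative factors add value, each factor costs 0.05.
value = {"performance_status": 0.3, "extracranial_disease": 0.2, "age": 0.1}


def toy_score(subset):
    return sum(value.get(f, 0.0) for f in subset) - 0.05 * len(subset)


kept = backward_elimination(
    ["performance_status", "extracranial_disease", "age", "noise_a", "noise_b"],
    toy_score)
```

With this toy score, the two uninformative factors are removed and the three informative ones are retained.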

Artificial neural network analyses
Artificial neural network (ANN) analyses were performed using the Express Builder functionality of the NeuroSolutions software, in which multiple ANN constructs can be assessed in parallel to identify superior ANN approaches. ANN analysis differs from traditional regression in that multiple ANN structures and optimization approaches need to be explored to find the ideal ANN solution. All steps described herein used the classification accuracy metric to adjudicate superior ANN (or LR) strategies. Receiver operating characteristic (ROC) curves were not considered appropriate for this analysis given the three-category endpoint. The investigation of ANN approaches followed an a priori sequence of steps, described below:
Step 1 - ANN Structure Assessment: Three ANN structures (one-layer perceptron, two-layer perceptron, and probabilistic neural network) were compared against the LR approach. The aim was to identify the best ANN structure for the subsequent node-structure and cost-optimization-algorithm assessments. These three structures were chosen for their potential utility with smaller datasets (<1000 data points). The more familiar perceptron approaches were supplemented by the probabilistic neural network (PNN), given its known advantage of efficient training on smaller datasets. The one-layer perceptron used nine input nodes, five hidden nodes (to avoid over-fitting), and one output node with Levenberg-Marquardt (LM) optimization (MLP1-951-LM). The two-layer perceptron used nine input nodes, two hidden layers of six and three nodes, and one output node with LM optimization (MLP2-9631-LM). The PNN used a knowledge map approach with nine input nodes, 235 internal nodes, and one output node. Optimization was limited to 100 iterations for the perceptron-based ANNs and to three epochs for the PNN.
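The probabilistic neural network mentioned in Step 1 can be illustrated with a minimal Python sketch of the classic Specht-style PNN, a Parzen-window classifier. It stores one pattern node per training case, sums a Gaussian kernel activation per class, and outputs the class with the largest average activation. This explains why PNN training is efficient: it amounts to storing the training patterns. The sketch assumes equal class priors and a hypothetical smoothing parameter `sigma`. It is not the NeuroSolutions "knowledge map" PNN used in the study, and the training points are made up.

```python
import math


def pnn_predict(train_x, train_y, x, sigma=1.0):
    """Classify x with a Parzen-window probabilistic neural network."""
    sums, counts = {}, {}
    for xi, yi in zip(train_x, train_y):
        # Pattern layer: Gaussian kernel on the squared distance to each stored case.
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        sums[yi] = sums.get(yi, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
        counts[yi] = counts.get(yi, 0) + 1
    # Summation layer: average activation per class; output layer: the largest wins.
    return max(sums, key=lambda c: sums[c] / counts[c])


# Hypothetical two-feature training cases.
train_x = [(0, 0), (0.5, 0.5), (1, 0), (5, 5), (5.5, 4.5), (4.5, 5)]
train_y = ["unfavorable"] * 3 + ["favorable"] * 3
label = pnn_predict(train_x, train_y, (0.4, 0.2))
```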
Step 2 - ANN Node Alteration: The best-performing Step 1 ANN was selected on the basis of classification accuracy. Its nodal structure was then altered to explore whether adding or removing nodes improved classification accuracy. For perceptron-based ANNs, the number of hidden nodes was varied by up to two nodes in either direction. For example, if five hidden nodes were used in Step 1 and found to be ideal, hidden-node counts from three to seven were tested in Step 2.
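The Step 2 search over hidden-layer sizes can be sketched in Python as below. The `evaluate` callback stands in for training an ANN of a given size and returning its classification accuracy. It is hypothetical, as are the accuracy values in the usage lines. Breaking ties in favor of the smaller network is also an assumption, since the text does not state a tie rule.

```python
def hidden_node_candidates(base, delta=2):
    """Hidden-layer sizes from base - delta to base + delta (never fewer than one node)."""
    return list(range(max(1, base - delta), base + delta + 1))


def best_hidden_size(base, evaluate, delta=2):
    """Pick the size with the highest accuracy; ties go to the smaller network."""
    return max(hidden_node_candidates(base, delta), key=lambda h: (evaluate(h), -h))


# Hypothetical accuracies for each hidden-layer size (not study results).
toy_ca = {3: 0.55, 4: 0.58, 5: 0.60, 6: 0.60, 7: 0.57}
chosen = best_hidden_size(5, toy_ca.get)
```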
Step 3 - Cost Optimization Comparison: A series of available cost optimization algorithms was used to assess whether the choice of algorithm affects model performance. The following algorithms were assessed: Levenberg-Marquardt (LM), Conjugate Gradient (CG), Delta Bar Delta (DBD), Quickprop (QUICK), Step (STEP), and Momentum (MOM).
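To illustrate how two of the listed algorithms differ, the Python sketch below contrasts a plain fixed-step gradient update with a momentum update on a toy one-parameter cost, (w - 3)^2. It is a simplified textbook form of these rules under assumed learning-rate and momentum values. It is not the NeuroSolutions implementation, and LM, CG, DBD, and Quickprop are not shown.

```python
def minimize(grad, w0, lr=0.05, momentum=0.0, iterations=500):
    """Gradient descent on one weight; momentum=0 gives the plain fixed-step rule."""
    w, velocity = w0, 0.0
    for _ in range(iterations):
        # Momentum keeps a fraction of the previous update and adds the new gradient step.
        velocity = momentum * velocity - lr * grad(w)
        w += velocity
    return w


def grad(w):
    """Gradient of the toy cost (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)


w_step = minimize(grad, 0.0)                # Step rule
w_mom = minimize(grad, 0.0, momentum=0.9)   # Momentum rule
```

Both rules converge to the minimum here. Which one performs better on a real network cost surface is an empirical question, which is what Step 3 tested.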
Step 4 - Final Optimization: This step was to be performed only if the ANN approach was found to be superior to the baseline LR approach. In that case, the final ANN would be subjected to a rotating 10-fold cross-validation to assess average ANN classification accuracy. If the ANN approach was not superior to the baseline LR approach, the Step 4 analyses would not be performed.
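Although Step 4 was ultimately not run, the rotating 10-fold cross-validation it describes can be sketched in Python as follows. Each of ten folds is held out once while the others are used for fitting, and accuracy is averaged across folds. The shuffling seed, the function names, and the `evaluate(train_idx, test_idx)` callback, which would train a model and return its CA on the held-out fold, are hypothetical.

```python
import random


def k_fold_splits(n, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    # Fold sizes differ by at most one when n is not divisible by k.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


def mean_cv_accuracy(n, evaluate, k=10, seed=0):
    """Average the per-fold accuracy returned by evaluate(train_idx, test_idx)."""
    scores = [evaluate(train, test) for train, test in k_fold_splits(n, k, seed)]
    return sum(scores) / len(scores)


folds = list(k_fold_splits(460, k=10))
```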

Database composition
Descriptive statistics for the total study database (n=460) as well as the three investigative databases (testing, cross-validation, and validation) are summarized in Supplementary Table

Traditional prognostic index assessment
Classification accuracy (CA) and total major misclassification rates (TMMR) for all nine published prognostic indices are summarized in Table 3 for all study databases.In terms of the total database, classification accuracy ranged from 34% (DS-GPA) to 53% (RADES II).Total major misclassification rates ranged from 4% (RDAM and BSBM) to 11% (GPA).

Artificial neural network assessment
The CA and TMMR metrics (for the testing, cross-validation, and validation databases) for all three steps of the ANN assessment are summarized in Table 4. The logistic regression model had a CA and TMMR of 62.6% and 7.8%, respectively. In the first step of the ANN optimization, the one-layer perceptron (MLP1-951-LM) had a CA identical to that of the LR approach (62.6%), with a slightly improved TMMR of 6.1%. The optimization procedures in Step 2 (variable hidden nodes) and Step 3 (cost optimization algorithm) did not further improve CA or TMMR relative to LR.
Step 4 was not conducted because the final ANN approach (MLP1-951-LM) did not demonstrate a superior CA to the traditional LR approach. In the validation database, both the LR and ANN approaches exceeded the CA of the best existing prognostic system by more than 10 percentage points (LR/ANN 62.6%, published prognostic indices 27-49%). They had a similar TMMR (LR 7.8%, ANN 6.1%, published prognostic indices 2-17%).
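As a worked check of the validation-database comparison, the arithmetic below uses the figures already reported (a best published-index CA of 49%). It distinguishes the absolute gap in percentage points from the relative improvement, and shows that both exceed 10%.

```latex
\Delta_{\mathrm{CA}} = 62.6\% - 49\% = 13.6 \text{ percentage points},
\qquad \frac{62.6}{49} \approx 1.28 \;(\approx 28\% \text{ relative improvement})

\Delta_{\mathrm{TMMR}} = 7.8\% - 6.1\% = 1.7 \text{ percentage points (LR minus ANN)}
```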

Discussion
Classification of patients with brain metastases into prognostic groups is an important clinical and research endeavor because of its potential impact on patient treatment and clinical trial design. No superior system has been described in the literature; however, the RTOG RPA system is the most commonly used index and has undergone extensive validation. These prognostic indices were developed primarily with statistical methodologies that attempted to identify relatively small patient groups (i.e. good and poor prognosis) with extreme survival characteristics. This left the majority of patients in intermediate categories. No prognostic index has been developed with a priori defined, clinically relevant survival categories. Such categories have recently been suggested in the medical literature, however, and were used in this report [18].
Our investigation was not able to identify an ANN approach with superior CA to the more traditional LR approach. Sargent et al. [20] have previously observed equivalent performance between ANN and non-ANN approaches. Their systematic review covered 28 ANN versus LR/Cox regression analyses of larger datasets (>200 patients) in the medical literature, and half of these comparisons demonstrated equivalence between ANN and LR/Cox regression. This effect became more pronounced for large datasets (>5000 patients), where 87.5% of comparisons showed equivalence between ANN and LR/Cox. This analysis therefore reconfirms that more complex modeling does not necessarily improve predictive power over traditional techniques such as LR and Cox proportional hazards regression.
Both the ANN and LR approaches outperformed the published prognostic indices in CA, by about 10 percentage points or more. This is likely because both approaches were specifically optimized (using the cross-validation dataset) to categorize patients into one of three distinct prognostic groups (unfavorable, < two months; intermediate, two to six months; favorable, > six months). None of the published prognostic systems was optimized in this way, which led to inferior CA compared with ANN and LR. The ANN approach showed a small TMMR difference of less than 2 percentage points relative to the baseline LR approach, but both models had TMMRs similar to those of the published prognostic indices. Given this small difference, and absolute TMMRs comparable to those of other indices, there is no compelling argument for adopting a complex ANN approach for patient categorization. Internationally accepted categories, identical or similar to those proposed in this manuscript, may in future be agreed upon and adopted. Further research could then create a new, ideal prognostic index or refine existing scales. We recommend initially using an LR approach to predict survival categories. This investigation did not find evidence of complex interactions between prognostic variables that would require the non-linear techniques used in ANN optimization.
This investigation has several limitations relating to the database used. These include an SRS/fSRT-only patient population and the absence of an external validation dataset. The possibility that superior ANN optimization approaches and model structures exist must also be acknowledged. Future investigation in this area should include three directions. The first is the integration of novel prognostic factors (e.g. genetic analysis or imaging-based parameters) into existing or new prognostic indices. The second is standardization of survival classification to drive future research and clinical care. The third is investigation of broader patient populations (e.g. WBRT, neurosurgery) to assess generalizability. Additionally, the prediction of long-term survival (> one year) needs to be explored further using patient datasets with sufficient median survival and statistical power to draw robust conclusions.

Conclusions
The use of various ANN model structures, nodal complexity, and cost function optimization algorithms did not lead to a significant improvement in survival classification when compared to an LR approach. Both the ANN and LR approaches were superior to traditional brain metastases prognostic indices in terms of CA, but not TMMR.

TABLE 1: Descriptive Statistics for the Three Study Databases and the Total Database.
In the total database, the majority of patients were treated with SRS (78.7%). Mean age was 61.4 years, and the median interval between primary tumor diagnosis and brain metastases diagnosis was 293 days. Over 97% of patients had a World Health Organization performance status of 0-2. The median number of brain metastases was two (range: one to six), with a mean total volume of 8.35 cc (range: 0.03-151.5 cc). The primary cancer was lung in 57.2% of cases, and breast cancer was the second most common primary (10.4%). The primary cancer was active at the time of brain metastases treatment in 52% of cases, and other systemic metastases were present in 50.9%. Comparison of the total database with each of the smaller investigative databases did not highlight any significant differences between the groups. Prognostic index classification for all published indices is summarized in Supplementary Table 2.
2013 Rodrigues et al. Cureus 5(5): e115. DOI 10.7759/cureus.115