A Chronological Overview of Using Deep Learning for Leukemia Detection: A Scoping Review

Leukemia is a rare but potentially fatal cancer of the blood. This cancer arises from abnormal bone marrow cells and requires prompt diagnosis for effective treatment and a favorable patient prognosis. Traditional diagnostic methods (e.g., microscopy, flow cytometry, and biopsy) pose challenges in both accuracy and time, warranting investigation into the development and use of deep learning (DL) models, such as convolutional neural networks (CNNs), which could allow for a faster and more accurate diagnosis. By applying specific, objective criteria, DL might serve as a tool to help physicians diagnose leukemia. The purpose of this review was to report the relevant published literature on using DL to diagnose leukemia. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, articles published between 2010 and 2023 were searched in Embase, Ovid MEDLINE, and Web of Science using the terms “leukemia” AND “deep learning” OR “artificial neural network” OR “neural network” AND “diagnosis” OR “detection.” After the retrieved articles were screened against pre-determined eligibility criteria, 20 articles were included in the final review and reported chronologically because of the nascent nature of the field. The initial studies laid the groundwork for subsequent innovations, illustrating the transition from specialized methods to more generalized approaches capitalizing on DL technologies for leukemia detection. This summary of recent DL models revealed a paradigm shift toward integrated architectures, resulting in notable enhancements in accuracy and efficiency. The continuous refinement of models and techniques, coupled with an emphasis on simplicity and efficiency, positions DL as a promising tool for leukemia detection. With the help of these neural networks, leukemia detection could be hastened, allowing for an improved long-term outlook and prognosis. 
Further research is warranted using real-life scenarios to confirm the suggested transformative effects DL models could have on leukemia diagnosis.


Introduction and Background
Leukemia
Leukemia, a cancer of the blood cells, results in the abnormal generation of white blood cells (WBCs) in the body's bone marrow and hinders the development of other blood components, such as platelets and red blood cells. This results in a progressive, possibly fatal medical condition that requires both timely and accurate diagnosis for effective treatment and a favorable patient prognosis [1]. Leukemia is diagnosed by analyzing peripheral blood, typically with microscopy or flow cytometry. Microscopy of a blood smear allows physicians to visualize the morphological changes in blood cells associated with leukemia. However, these morphological changes are often difficult for the human eye to detect because of the high density of cells to sift through. Flow cytometry has a relatively fast turnaround time but requires fresh blood draws and cannot analyze the histomorphology of the blood samples [2]. Moreover, not all people with leukemia show disease in the peripheral blood. To circumvent this, an invasive bone marrow biopsy can be performed. This process has a longer turnaround time of one to two weeks but is more diagnostic, often revealing the typical hypercellular bone marrow with a drop in normal hematopoietic cells [2]. The realistic possibility of human error, the need for human input and fresh blood samples, long turnaround times, and uncomfortable procedures raise the question of how these diagnostic processes can be streamlined to improve diagnostic rates and subsequent prognostic outcomes.

Deep learning models
Deep neural networks (DNNs), often termed deep learning (DL), are a subset of artificial neural networks inspired by the function of the human brain. DNNs are composed of interconnected artificial neurons organized into layers. Neurons in one layer are connected to neurons in adjacent layers, forming a network of connections. DNNs have revolutionized various fields, including machine learning and artificial intelligence, by enabling the development of models for tasks such as image recognition, natural language processing, and reinforcement learning [3].
Convolutional neural networks (CNNs) are specialized DNNs for tasks such as image and video processing, while recurrent neural networks (RNNs) are designed for sequential data, such as natural language text. DNNs learn from data through a process called training. During training, the network adjusts its internal parameters to minimize the difference between its predictions and the actual target values in the training data. Pre-trained DNNs have widespread medical applications because they can be fed large image datasets to uncover new biomedical information and associations [4,5]. Yet building these networks is a resource-intensive and lengthy process whose results depend on the quality of the input data.
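The training process described above can be illustrated with a single artificial neuron. The following Python sketch is a minimal, hypothetical example rather than a model from any reviewed study: it fits one logistic neuron to a toy dataset by batch gradient descent, repeatedly nudging the weights to reduce the cross-entropy between predictions and target labels. The two normalized features and their values are invented for illustration; real DNNs stack many such units and are trained with dedicated libraries.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic activation that maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, labels, lr=0.5, epochs=2000):
    """Fit the weights and bias of one logistic neuron by batch gradient descent
    on the mean binary cross-entropy loss."""
    n_features = len(samples[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(samples, labels):
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            error = pred - y  # gradient of cross-entropy w.r.t. the neuron's input sum
            for i in range(n_features):
                grad_w[i] += error * x[i]
            grad_b += error
        # Step against the average gradient to reduce the loss.
        w = [wi - lr * gi / m for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w, b, x):
    """Return 1 (malignant) if the neuron's output is at least 0.5, else 0 (healthy)."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical, normalized feature pairs: label 1 = malignant, 0 = healthy.
samples = [[0.9, 0.8], [0.8, 0.9], [0.85, 0.7], [0.1, 0.2], [0.2, 0.1], [0.15, 0.3]]
labels = [1, 1, 1, 0, 0, 0]
w, b = train_neuron(samples, labels)
print([predict(w, b, x) for x in samples])
```

A DNN applies the same principle, adjusting parameters to reduce prediction error, across many layers and far more parameters.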

Deep learning and leukemia
With the rapid advancement of DL, there has been growing interest in leveraging these technologies for the early detection of medical conditions, an interest that is expected to grow exponentially [6]. DNNs can aid physicians in accurately diagnosing leukemia through tasks such as image classification, object detection, image retrieval, semantic segmentation, and human pose estimation [3,7-9]. Leukemia detection involves analyzing bone marrow smears and images to identify pathological features of leukemic cells and comparing them with healthy ones. Matek et al. used a CNN to identify malignant WBCs in one subtype of leukemia [10]. Using a dataset of 18,000 images, the system recognized the most common physiological cell type (myeloblasts) with an accuracy above 90%. Shafique et al. developed a CNN to analyze a subset of leukemia based on cell size and nucleus, achieving 99% accuracy in distinguishing malignant cells from healthy ones [11]. Thanh et al. designed a five-layer CNN that also achieved high accuracy in classifying a specific subset of leukemia [12].
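The CNNs described above, including the five-layer design of Thanh et al., are built from repeated convolution, activation, and pooling operations. The pure-Python sketch below shows those three operations on a tiny, hypothetical 6x6 grayscale patch. The 3x3 vertical-edge kernel is hand-set for illustration; in a trained CNN the kernel weights are learned, and real implementations use optimized libraries rather than nested loops.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution (cross-correlation, as in CNN layers) with stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + c] * kernel[a][c] for a in range(kh) for c in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Zero out negative responses."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2, stride=2):
    """Keep the strongest response in each size x size window."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + a][j * stride + c]
                for a in range(size)
                for c in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical patch: dark background (0) on the left, bright region (1) on the right.
patch = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
edge_kernel = [[-1, 0, 1] for _ in range(3)]  # responds to left-to-right brightness increases

features = max_pool(relu(conv2d(patch, edge_kernel)))
print(features)  # [[3, 3], [3, 3]]
```

The convolution output is strongest in the columns that straddle the dark-to-bright boundary, which is the kind of local feature a CNN layer learns to detect.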
Researchers have also used DNNs to predict the risk of leukemia development based on genetic factors. Several research groups have used these networks to predict mutations in nucleophosmin 1 (NPM1; a pathognomonic mutation for leukemia) [13][14][15]. Eckardt et al. developed a DL model capable of predicting NPM1 mutation status from bone marrow cytomorphology with 86% accuracy [16]. Their model also performed cell segmentation and image classification to differentiate healthy cells from leukemic cells, reaching 91% accuracy when discerning the cell morphology of a leukemia subtype from healthy bone marrow donor samples [16].
DNNs require a large, labeled database for training to identify unique characteristics of the data. Ahmed et al. used data augmentation to artificially enlarge the image database [17]. Their CNN yielded 88% accuracy for the classification of one leukemia type and 81% accuracy for the classification of all leukemia subtypes [17]. Another barrier to the development of DNNs is the lack of color standardization in the images [4,18]. Saraswat et al. found that a deconvolution-based method, which excludes unwanted noise arising from non-standardized staining colors, performed better than simple color transfer methods [19]. This method allows DNNs to analyze stain concentration and absorbance. Other researchers used an image pre-processing technique that converted the image color space to separate the intensity channel from hue and saturation, allowing stain concentration and absorbance to be analyzed [20]. Another study found that DNNs using images standardized in the red, green, and blue channels achieved 98% accuracy, compared with DNNs using non-standardized images [11]. Together, these studies suggest that pre-processing techniques are promising for maximizing a DNN's ability to differentiate leukemic cells from healthy cells.
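Data augmentation of the kind Ahmed et al. used can take many forms, and the text above does not specify which transformations they applied. As one common, hypothetical example, the Python sketch below generates the eight rotations and mirror images of a square image, assuming that a cell's class does not depend on its orientation in the smear. Each variant is kept as a new training example with the original label.

```python
def rotate90(image):
    """Rotate a square 2D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def hflip(image):
    """Mirror a 2D grid left to right."""
    return [row[::-1] for row in image]


def augment(image, label):
    """Return (variant, label) pairs for the 4 rotations of the image and their mirror images.

    Assumes the label is orientation-invariant, so every variant keeps the original label.
    """
    variants = []
    current = image
    for _ in range(4):
        variants.append((current, label))
        variants.append((hflip(current), label))
        current = rotate90(current)
    return variants


# Hypothetical 2x2 "image" standing in for a labeled cell crop.
dataset = augment([[1, 2], [3, 4]], "leukemic")
print(len(dataset))  # 8
```

For images with rotational or mirror symmetry, some of the eight variants will be identical, so the effective gain in dataset size can be smaller.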
Image pre-processing, data augmentation, and comparisons of healthy versus malignant cells illustrate the different ways researchers have improved DL models. The high accuracy in predicting individual leukemia subtypes could open the door to a future in which a single blood or bone marrow smear or image could differentiate leukemic cells from healthy cells and identify the leukemia subtype. In addition, DL could alleviate some drawbacks of current leukemia detection procedures, making the process more efficient, accurate, cost-effective, and reliable.

Eligibility Criteria
The inclusion criteria for this review encompassed experimental and nonexperimental studies, full-text articles, articles in English, and articles published between 2010 and 2023. The review focused on DL technologies for diagnosing leukemia, selecting studies that used DL and deep neural networks rather than broader concepts such as artificial intelligence and machine learning. Abstracts, opinion pieces, presentations, and gray literature were excluded.

Information Sources
The search identified a total of 1,229 citations. Initially, 375 duplicates were removed, leaving 854 studies to be screened. Team members reviewed article titles and abstracts to reach consensus on which articles warranted further consideration, and discussions continued among all three reviewers until agreement was reached. At this point, 834 articles were excluded for not meeting the screening criteria: 470 for addressing the wrong topic, 152 for being the wrong publication type (e.g., abstract only and dissertations), 96 for being too old, 74 for involving the wrong population, 30 for not aligning with the scoping review's objective, and 12 for having the wrong study design. Consequently, 20 articles were retained for critical analysis. The screening and selection process is depicted in Figure 1.
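The screening counts above can be checked with simple arithmetic. This short Python snippet uses only the numbers reported in this section to confirm that the exclusion reasons sum to 834 and that 20 articles remain.

```python
identified = 1229
duplicates = 375
excluded_by_reason = {
    "wrong topic": 470,
    "wrong publication type": 152,
    "too old": 96,
    "wrong population": 74,
    "not aligned with objective": 30,
    "wrong study design": 12,
}

screened = identified - duplicates            # records left after de-duplication
excluded = sum(excluded_by_reason.values())   # records removed at title/abstract screening
retained = screened - excluded                # articles kept for critical analysis
print(screened, excluded, retained)  # 854 834 20
```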

Search Strategy
A literature search to locate published studies was conducted in September 2023 using Embase, Ovid MEDLINE, and Web of Science. Eligible articles were those published in English between 2010 and 2023 that contained the search terms "leukemia" AND "deep learning" OR "artificial neural network" OR "neural network" AND "diagnosis" OR "detection" in the title, abstract, or keywords. The detailed search strategy is summarized in Table 1. The reference lists of all included sources of evidence were screened for additional studies. An information specialist assisted with and confirmed the search strategy.

Selection of Sources of Evidence
All identified citations were collated and uploaded into a collaborative cloud-based software application tailored for conducting systematic reviews. Members of the research team discussed the results and inclusion criteria before the initial screening of the articles generated in the primary search. Two authors then worked independently to evaluate the titles and abstracts of the publications to determine their relevance to the review. Twenty articles appeared relevant for the final review.

Critical Appraisal of Individual Sources of Evidence
A comprehensive evaluation of the 20 articles was performed using the critical appraisal tools developed by the Joanna Briggs Institute (JBI), known for their reliability and ongoing improvement efforts. The appropriate checklist was used for each article to consider research biases, overall coherence, and critical components contributing to article quality. Two team members independently conducted a detailed, blinded appraisal of the 20 articles chosen for the final review using the applicable JBI tools. Articles were then categorized as having a high, moderate, or low risk of bias based on their scores (below 50%, between 50% and 70%, and above 70%, respectively). Articles scoring above 70% were included, while articles scoring under 70% were considered at higher risk of bias and thus excluded. Subsequently, the team engaged in a deliberative process to compare their appraisal scores. The relevance and quality of each article were thoroughly discussed, leading to a final consensus under which all 20 articles were included in the final review.
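The score bands above can be expressed as a small decision rule. The Python sketch below is an illustration rather than the authors' procedure: it assumes the score is the percentage of applicable checklist items answered "yes" (items marked "not applicable" are dropped), and, because the text does not say how scores of exactly 50% or 70% were handled, it assigns both boundary values to the moderate band.

```python
def jbi_score(answers):
    """Percentage of applicable checklist items answered 'yes' (assumed scoring rule)."""
    applicable = [a for a in answers if a != "not applicable"]
    return 100 * sum(a == "yes" for a in applicable) / len(applicable)


def risk_of_bias(score_pct):
    """Map a percentage score to the risk-of-bias bands described in the text."""
    if score_pct < 50:
        return "high"
    if score_pct <= 70:  # assumption: exactly 50% and 70% count as moderate
        return "moderate"
    return "low"


def include(score_pct):
    """Articles scoring above 70% were included."""
    return score_pct > 70


# Hypothetical checklist responses for one article.
answers = ["yes"] * 8 + ["no", "unclear", "not applicable"]
score = jbi_score(answers)
print(score, risk_of_bias(score), include(score))  # 80.0 low True
```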

Data Charting and Extraction Process
Two reviewers collaborated to create a data-charting form using Excel (Microsoft Corporation, Redmond, WA) and determined which data to extract. Using an iterative process, the rest of the team independently charted the data, discussed the results, and continually updated the data-charting form. The extracted information covered each article's purpose, study population, sample, methods, limitations, and key findings (based on the percentage of success of the authors' DL model in diagnosing leukemia and additional pertinent information).

Results
Research on the use of DL technology for leukemia detection is relatively new. The 20 articles included in this review are reported chronologically to highlight the progression of the technology used for leukemia detection over the period determined for article inclusion (2010-2023). This helps demonstrate the uniqueness of each approach to DL for leukemia detection, along with their percentage accuracy outcomes.

Earlier Works (2010-2018)
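Two articles published between 2010 and 2018 met the inclusion criteria and were included in the review. In 2010, Adjouadi and colleagues classified leukemia cells from flow cytometry data using Neural Studio, an early neural network model, achieving 96.67% accuracy; such early networks were a precursor of the more advanced convolutional neural networks (CNNs) that could be trained to perform tasks. In 2018, pretrained CNNs (AlexNet, Vgg-f, and CaffeNet) applied to a dataset of 891 images achieved over 99% accuracy, with the CNNs extracting features and support vector machine (SVM) classifiers performing the classification [22].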
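The two-stage design above (a CNN produces a feature vector for each image and an SVM assigns the class) can be sketched as follows. In this hypothetical pure-Python example, short made-up vectors stand in for CNN outputs, and the linear SVM is trained with a Pegasos-style stochastic subgradient method on the regularized hinge loss; the reviewed study's actual kernel, features, and training settings are not described here. For simplicity, the bias is folded into the weight vector and regularized along with it.

```python
import random


def train_linear_svm(features, labels, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM; labels must be +1 or -1."""
    rng = random.Random(seed)
    # Append a constant 1.0 so the last weight acts as a bias term.
    data = [(list(x) + [1.0], y) for x, y in zip(features, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink weights: regularization step
            if margin < 1:  # hinge loss is active: move toward classifying x correctly
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def svm_predict(w, x):
    """Sign of the linear decision function (with the appended bias input)."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical 3-value "CNN feature vectors"; +1 = leukemic, -1 = healthy.
features = [[2.0, 1.8, 0.1], [1.7, 2.2, 0.3], [2.1, 2.0, 0.2],
            [0.2, 0.1, 1.9], [0.3, 0.2, 2.2], [0.1, 0.4, 1.8]]
labels = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(features, labels)
print([svm_predict(w, x) for x in features])
```

Keeping feature extraction and classification separate lets the same CNN features be reused with different classifiers, which is the pattern later studies moved away from.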
Three additional articles published in 2021 introduced image augmentation and segmentation and reported high accuracy: 99.57% using Open Neural Network Exchange (ONNX) and a You Only Look Once version 2 (YOLOv2) CNN (a single-stage, real-time object detection model) for feature extraction alongside an SVM for classification [27]; 98.00% using k-nearest neighbor (KNN) for feature extraction and SVM and random forest for classification [28]; and 98.61% using a fine-tuned LeukNet (the name of a new computational tool) with transfer learning for classification [29].
Another article reported using ensemble and multiclass classifiers for classification, with AlexNet, Visual Geometry Group (VGG), Residual Network-50 (ResNet-50), GoogLeNet (a CNN based on the Inception architecture), and Dense Convolutional Network-121 (DenseNet121) for feature extraction, achieving 97.04% accuracy; when augmentation and image segmentation were added alongside DenseNet121, accuracy reached 97.11% [32]. Muhamad et al. used softmax (a function that exponentiates a model's output scores and normalizes them into class probabilities, amplifying the largest values) for classification and obtained accuracy that varied with the feature extractor: 95.3%, 81.5%, and 97.6% with an unnamed CNN, AlexNet, and MobileNet-v2 (a CNN architecture designed to perform well on mobile devices), respectively [33].
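The softmax step described above can be written in a few lines. This Python sketch is generic rather than taken from Muhamad et al.; the three class names and raw scores are hypothetical. It also shows the common practice of subtracting the largest score before exponentiating, which avoids numerical overflow without changing the result.

```python
import math


def softmax(scores):
    """Turn raw class scores (logits) into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


classes = ["ALL", "AML", "healthy"]  # hypothetical label set
probs = softmax([2.0, 1.0, 0.1])
print({c: round(p, 3) for c, p in zip(classes, probs)})
# {'ALL': 0.659, 'AML': 0.242, 'healthy': 0.099}
```

Because of the exponential, a one-point gap in raw scores (2.0 versus 1.0) becomes a probability ratio of about 2.7 (the value of e), which is the amplifying effect on the largest input.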
Other researchers applied a probabilistic neural network (PNN) and achieved 95.705% accuracy without using data augmentation, image segmentation, or color normalization [34]. Similarly, Sakthiraj did not use preprocessing tools on the image datasets and achieved 99.87% accuracy with a hierarchical convolutional neural network with integrated attention and spatial optimization (HCNN-IAS) [35]. Of note, nearly 100% accuracy (average 99.7%) was attained with DarkNet-53 (the backbone of the YOLOv3 object detection approach) and ShuffleNet (designed for mobile devices with very limited computing power) for feature extraction, and SVM, ensemble methods, KNN, and naïve Bayes (an algorithm that applies Bayes' theorem to classify objects) for classification, with image segmentation as the only preprocessing step [36].
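A probabilistic neural network such as the one mentioned above classifies a sample by estimating, for each class, how densely that class's training examples surround it. The Python sketch below is a minimal Parzen-window version of that idea under stated assumptions: a Gaussian kernel with a hand-picked smoothing width (sigma), equal class priors, and two-dimensional toy feature vectors invented for illustration. It is not the configuration used in the reviewed study.

```python
import math


def pnn_classify(x, training_set, sigma=0.3):
    """Return the class whose training examples give x the highest average kernel response.

    training_set maps each class label to a list of feature vectors.
    """
    def class_score(vectors):
        # Pattern layer: one Gaussian kernel per stored example.
        # Summation layer: average those responses for the class.
        return sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(x, v)) / (2 * sigma ** 2))
            for v in vectors
        ) / len(vectors)

    # Output layer: pick the class with the largest score (equal priors assumed).
    return max(training_set, key=lambda label: class_score(training_set[label]))


training_set = {
    "healthy": [[0.2, 0.3], [0.25, 0.2], [0.3, 0.25]],
    "leukemic": [[0.8, 0.9], [0.85, 0.75], [0.7, 0.8]],
}
print(pnn_classify([0.3, 0.3], training_set))    # healthy
print(pnn_classify([0.75, 0.85], training_set))  # leukemic
```

A PNN needs no iterative weight training (the training examples themselves are the pattern units), but its cost at prediction time grows with the size of the training set.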

Year 2023
The year 2023 showed a focus on using a single model for both feature extraction and classification; four articles published in this year were included in this review [37][38][39][40]. Houssein and colleagues attained 99.80% accuracy with DenseNet-161 (a densely connected convolutional network) using augmentation, segmentation, and conversion from the RGB (R: red, G: green, B: blue) to the HSV (H: hue, S: saturation, V: value) color space [37], while other researchers averaged 98.15% accuracy with a CNN model using only image segmentation [38]. Naz and colleagues achieved 96.9% and 81.9% accuracy on separate datasets using AlexNet after augmenting their data and segmenting their images [39]. Wang et al., after augmenting their data, reached 92.50% accuracy with You Only Look Once eXtreme small (YOLOX-s) for feature extraction and Meta-Learning Fusion and Learning Network (MLFL-Net) for classification [40]. A summary of the articles included in this review is reported in the accompanying tables.
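The RGB-to-HSV conversion mentioned above separates a pixel's color (hue and saturation) from its brightness (value), which is why it is used to reduce the effect of uneven staining and illumination. A minimal sketch with Python's standard colorsys module is shown below; the pixel values are hypothetical, and the reviewed study's exact preprocessing pipeline may differ.

```python
import colorsys


def rgb_pixels_to_hsv(pixels):
    """Convert 8-bit (R, G, B) tuples to (hue in degrees, saturation, value)."""
    converted = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)  # expects floats in [0, 1]
        converted.append((h * 360, s, v))
    return converted


# The same purple-like stain color at two brightness levels (hypothetical pixels).
bright, dim = rgb_pixels_to_hsv([(128, 0, 128), (64, 0, 64)])
print(round(bright[0]), round(dim[0]))        # 300 300 -> hue unchanged
print(round(bright[2], 3), round(dim[2], 3))  # 0.502 0.251 -> only value differs
```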

Discussion
This review explored the development of CNNs, a type of deep learning architecture commonly used in computer vision, image analysis, and other spatial data processing tasks, for leukemia detection from 2010 to 2023. The selected papers are presented chronologically, and their methods, accuracy outcomes, and current limitations are reported.

Implication of Early DL Models for Leukemia Detection
Early works, such as that of Adjouadi and colleagues in 2010, laid a foundation despite lacking the resources of their successors, achieving a notable 96.67% accuracy in leukemia cell classification using Neural Studio (an early neural network model) [22]. During this period, however, the methods relied heavily on flow cytometry for data acquisition (bone marrow blood samples), a technique not seen in more recent studies because networks can now be trained for feature extraction [21]. This shift is evident in 2018, when researchers used several pre-trained CNN models (AlexNet, Vgg-f, and CaffeNet) for feature extraction and an SVM cell classifier [29], achieving an accuracy score of 99% that set a new standard [21,22].
These early findings showed that the trained models were nearly perfect at feature extraction but had to rely on SVM for cell classification [22]. These works provide historical context for the evolution of neural network-based methods in leukemia cell classification. They also highlight the early stages of CNN use and how these early methods served as a foundation for later advancements. Moreover, comparing the accuracy rates of these early works with those of subsequent ones shows that the accuracy of leukemia cell classification has improved over time.

Implication of More Recent DL Models
From 2020 to 2022, innovations such as grayscale conversion, noise reduction, the DSSCS CNN model, and color normalization yielded excellent CNN accuracy, ranging from 81.5% to 99.57% [23][24][25][26][27]. The DSSCS CNN model was distinct from the others because it handled both feature extraction and cell classification, a departure from previous approaches that used separate techniques such as SVM, bagging, and multiclass ensembles [29]. This more integrated approach may lead to more efficient processing and potentially better accuracy. The range of accuracy scores shows the significant strides made during this period and suggests that these innovations contributed to more reliable and accurate classification. Image augmentation and segmentation were also introduced during this time, signaling a growing emphasis on improving CNN models through training with larger artificial datasets and more advanced feature extraction methods. Image augmentation has proven especially helpful because it allows a dataset to be expanded from only a few images without requiring more bone marrow samples, which is useful where large patient datasets are difficult to obtain. This could contribute to more efficient and cost-effective research and development of neural network models [28][29][30][31][32][33][34][35][36].
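Two of the preprocessing steps named above, grayscale conversion and noise reduction, can be sketched in plain Python. The reviewed studies' exact formulas are not given here, so this example assumes the widely used ITU-R BT.601 luma weights (0.299, 0.587, 0.114) for grayscale and a 3x3 median filter, one common way to suppress isolated noisy pixels; border pixels are simply left unchanged.

```python
from statistics import median


def to_grayscale(rgb_image):
    """Weighted sum of R, G, B per pixel (BT.601 luma weights, assumed here)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb_image]


def median_filter_3x3(gray):
    """Replace each interior pixel with the median of its 3x3 neighborhood."""
    height, width = len(gray), len(gray[0])
    out = [row[:] for row in gray]  # copy so border pixels stay unchanged
    for i in range(1, height - 1):
        for j in range(1, width - 1):
            window = [gray[i + di][j + dj] for di in (-1, 0, 1) for dj in (-1, 0, 1)]
            out[i][j] = median(window)
    return out


# A uniform gray patch with one bright "salt" noise pixel in the center (hypothetical).
noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
print(median_filter_3x3(noisy)[1][1])  # 10
print(round(to_grayscale([[(100, 100, 100)]])[0][0], 6))  # 100.0
```

A median filter is often preferred over simple averaging for this kind of noise because a single extreme pixel does not shift the median of its neighborhood.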

Implication of the Paradigm Shift With Current DL Models
In 2023, an important shift occurred from using several CNNs for specific tasks to applying a single model for both feature extraction and cell classification, with DenseNet-161 and AlexNet achieving 99.80% and 96.90%/81.90% accuracy, respectively [37][38][39][40]. However, the highly variable accuracy rates demonstrate the need for more research and for replication of these studies.
This shift represents a transition toward simplicity and efficiency in model architecture, reducing the complexity of the pipeline.The use of DenseNet-161 shows that DL models with densely connected layers can be highly effective for feature extraction and classification.Its architecture enhances the model's ability to learn complex features without excessively increasing model size.This approach can be efficient, improve model performance, and make it more feasible to implement in clinical settings.
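The claim that dense connectivity adds representational power without a large increase in model size can be made concrete by counting channels. In a dense block, each layer receives the concatenation of the block input and all earlier layers' outputs but itself produces only a fixed number of new feature maps (the growth rate). The Python sketch below computes these channel counts; the values used (a 64-channel input, growth rate 32, six layers) are illustrative assumptions, not settings reported for DenseNet-161 in the reviewed study.

```python
def dense_block_channels(in_channels, growth_rate, num_layers):
    """Channels entering each layer of a dense block, and channels leaving the block.

    Layer l (0-based) sees the block input plus growth_rate maps from each of the
    l layers before it, because outputs are concatenated rather than replaced.
    """
    per_layer_inputs = [in_channels + growth_rate * l for l in range(num_layers)]
    block_output = in_channels + growth_rate * num_layers
    return per_layer_inputs, block_output


inputs, output = dense_block_channels(64, 32, 6)
print(inputs)  # [64, 96, 128, 160, 192, 224]
print(output)  # 256
```

Because each layer adds only a small number of new maps, the width grows linearly with depth, while every layer can still reuse all features computed earlier in the block.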

The Potential of DL Technology for Leukemia Detection
The strides made in DL for leukemia detection over the 13 years of published evidence covered in this review (2010-2023) demonstrate the potential of this technology to support more efficient detection of leukemia. Integrating image augmentation, segmentation, and more advanced CNN architectures shows promise, and the range of innovative methodologies reflects an ongoing effort to streamline processes and optimize performance. However, these models have not been validated in real-world clinical settings, where patient outcomes rely heavily on the diagnostic accuracy of the models.
Despite promising accuracy rates, challenges such as dataset variability, model interpretability, and generalization to diverse patient populations persist. Dataset variability limits the validity of comparisons between studies, since different datasets and tools affect the accuracy scores. Excessive segmentation and augmentation can introduce artifacts and omit key data, thereby decreasing the validity of the reported accuracy [36]. Also, the images reflect an immense number of genetic variables that augmentation cannot artificially replicate. Although augmentation helps train CNN models to detect several variables, the genetic variables that need to be examined vastly outnumber the valid samples currently available, a problem the researchers described as the "curse of dimensionality" [34]. Future research should prioritize addressing these challenges before DL can be used in real patient settings. Datasets need to be improved, and validation studies conducted in clinical settings are needed to establish the models' actual benefit. Collaborative efforts among researchers on a global scale are therefore essential to tackle these limitations.
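One reason accuracy scores from different datasets are hard to compare is that accuracy depends on the class mix of the test set. For a classifier with fixed sensitivity and specificity, accuracy = sensitivity * prevalence + specificity * (1 - prevalence). The Python sketch below applies this identity to a hypothetical classifier (sensitivity 0.90, specificity 0.70); the numbers are invented for illustration and do not come from any reviewed study, and class balance is only one of several sources of dataset variability.

```python
def expected_accuracy(sensitivity, specificity, prevalence):
    """Accuracy on a test set in which `prevalence` is the fraction of leukemic images."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


# The same hypothetical classifier evaluated on two test sets with different class balance.
for prevalence in (0.9, 0.5):
    print(prevalence, round(expected_accuracy(0.90, 0.70, prevalence), 2))
# 0.9 0.88
# 0.5 0.8
```

The classifier itself is unchanged, yet its reported accuracy differs by eight percentage points, which is why accuracy figures from differently composed datasets cannot be ranked directly.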

Limitations
Although rigorous and transparent methods were used throughout this scoping review, some limitations exist. This review may not have identified all relevant articles in the published literature despite attempts to be as thorough as possible. The search phrase included several different words and phrases used in the literature to describe deep learning and leukemia detection, but other terms may also exist. Moreover, the search included three major medically focused databases, and searching other online databases may have produced additional articles. Also, we selected only articles in English, so including articles published in other languages might have yielded more studies. The findings from this review should be approached with critical awareness of these limitations and their potential impact on the comprehensiveness, generalizability, and relevance of the synthesized evidence. The limitations of each article in the final review are reported in Table 2.

Implications for Future Research
Based on the results of this review, future research in leukemia detection using deep learning models could focus on enhancing the accuracy, efficiency, and applicability of the models discussed. Exploring new models to augment existing datasets should account for differences in patient demographics, disease subtypes, and data acquisition techniques (e.g., gene expression profiles and diverse imaging modalities). Future research could also focus on making models more robust to variations in data quality, acquisition protocols, and patient populations, mitigating the risk of overfitting and thereby improving generalization performance. A salient gap in DL research on leukemia detection is the clinical validation of DL models. While DL models have shown promising results in research settings, their performance in real-world clinical environments may vary.

Conclusions
This review presents a comprehensive overview of the evolution of CNNs in leukemia detection from 2010 to 2023, highlighting significant advancements and emerging trends in the field. The initial studies laid the groundwork for subsequent innovations, illustrating the transition from specialized methods to more generalized approaches capitalizing on DL technologies for leukemia detection. This summary of recent DL models revealed a paradigm shift toward integrated architectures, resulting in notable enhancements in accuracy and efficiency. Nevertheless, impediments (e.g., the lack of validation in real-world clinical settings, variability in datasets, and limited model interpretability) remain substantial barriers to widespread adoption. While DL technology holds promise for transforming leukemia detection, existing limitations must be overcome through rigorous research and validation practices. Future initiatives should enhance model accuracy, efficiency, and clinical applicability to fully leverage DL technology in improving leukemia diagnosis and patient outcomes.

FIGURE 1: PRISMA flow diagram PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

Table 1 reports the search strategy.

Table 2.
employed to reduce the dataset's complexity. By computing mean, peak, standard deviation, skewness, and kurtosis from parameter histograms, a
The method proposed in this work aims to diagnose leukemia using blood smear images with a CNN using the following flow chart: feature extraction (CNN), feature selection (gain ratio), classifier (support vector machine), and image classification (pathological or non-pathological).
layers: four layers for extracting features from input images and one output layer for classification. The dimensions of the input image were 50x50x1. The convolutional filter size is 3x3, and the max-pooling filter size is 3x3 with a stride of 1. The study emphasized the operations within the convolutional layers, including convolutional operation, activation (ReLU), and max pooling, to extract features from the input images.
were conducted using Google Colaboratory as the notebook, Anaconda as the Python distribution, the Keras library, and TensorFlow as the backend engine.
After (GoogleNet, ResNet, and DenseNet) to construct classification models and carry out a comparative analysis. Simplified image preprocessing combined with transfer learning was used to improve the classification accuracy of the model and achieve the classification of myelograms from AML, ALL, CML, and healthy subjects.
2024 Rubinos Rodriguez et al. Cureus 16(5): e61379. DOI 10.7759/cureus.61379
70% of images from the dataset were used for training and the other 30% for testing. For the validation step, this split was tweaked to 80/20, respectively. To test the robustness of the classification method, the model trained on one dataset was tested with the images from another dataset.
of the findings.
of 65.2%, similar to 67.6% for the CNN. A similar story can be told for the models trained on tight datasets. CNNs trained on the tight data
Then a test data set will be classified into four categories: ALL, CML, ALL/AML, and chronic lymphocytic leukemia (CLL).
emphasized in this study.
For the training step,
images.

Table 3 reports the deep learning techniques and tools used in each study included in this review and the accuracy each attained in detecting leukemia.