
Cell Nuclear Segmentation of B-ALL Images Based on MSFF-SegNeXt

Authors Wang X, Ou C, Hu Z, Ge A, Wang Y, Cao K

Received 23 August 2024

Accepted for publication 23 November 2024

Published 2 December 2024 Volume 2024:17 Pages 5675–5693

DOI https://doi.org/10.2147/JMDH.S492655




Xinzheng Wang, Cuisi Ou, Zhigang Hu, Aoru Ge, Yipei Wang, Kaiwen Cao

School of Medical Technology and Engineering, Henan University of Science and Technology, Luoyang City, Henan Province, People’s Republic of China

Correspondence: Zhigang Hu, School of Medical Technology and Engineering, Henan University of Science and Technology, Luoyang City, Henan Province, People’s Republic of China, Email [email protected]

Purpose: The diagnosis and treatment of B-Lineage Acute Lymphoblastic Leukemia (B-ALL) typically rely on cytomorphologic analysis of bone marrow smears. However, traditional morphological analysis methods require manual operation, leading to challenges such as high subjectivity and low efficiency. Accurate segmentation of individual cell nuclei is crucial for obtaining detailed morphological characterization data, thereby improving the objectivity and consistency of diagnoses.
Patients and Methods: To enhance the accuracy of nucleus segmentation of lymphoblastoid cells in B-ALL bone marrow smear images, the Multi-scale Feature Fusion-SegNeXt (MSFF-SegNeXt) model is proposed, building upon the SegNeXt framework. This model introduces a novel multi-scale feature fusion technique that effectively integrates edge feature maps with feature representations across different scales. Integrating the Edge-Guided Attention (EGA) module in the decoder further enhances the segmentation process by focusing on intricate edge details. Additionally, Hamburger structures are strategically incorporated at various stages of the network to enhance feature expression.
Results: These combined innovations enable MSFF-SegNeXt to achieve superior segmentation performance on the SN-AM dataset, as evidenced by an accuracy of 0.9659 and a Dice coefficient of 0.9422.
Conclusion: The results show that MSFF-SegNeXt outperforms existing models in managing the complexities of cell nucleus segmentation, particularly in capturing detailed edge structures. This advancement offers a robust and reliable solution for subsequent morphological analysis of B-ALL single cells.

Keywords: single nucleus segmentation, B-ALL image, bone marrow smear, Edge-Guided Attention, multi-scale feature fusion

Introduction

Acute Lymphocytic Leukemia (ALL) is the most common type of malignancy in children, accounting for approximately 70% of Acute Leukemia (AL) cases in children.1 B-lineage Acute Lymphoblastic Leukemia (B-ALL) is the most common subtype of ALL and can occur at any age, but is more common in children. Early diagnosis and timely treatment are essential for B-ALL, given the high cure rate after treatment in pediatric patients. However, early diagnosis of B-ALL relies on a range of complex laboratory tests. Among these, morphological examination of bone marrow cells has emerged as a key diagnostic method due to its simplicity, rapid reporting capability, and high diagnostic value.2 Through smear staining and microscopic evaluation, researchers identify alterations in cell morphology and quantity, aiding in disease diagnosis, prognosis, and treatment monitoring. The morphology, size, and shape of cells in bone marrow smears are notably more complex than those in peripheral blood smears, offering a wealth of diagnostic information. However, the morphological evaluation of bone marrow cells is a highly specialized procedure, affected by numerous factors, demanding high-quality standards and expert skills from testing personnel. Consequently, morphological analysis is limited by increased subjectivity and diminished efficiency. To improve diagnostic accuracy and efficiency in bone marrow cell morphology examination, single-cell nucleus segmentation technology utilizing deep learning has recently gained traction in the field of medical image analysis. Precise isolation of each cell nucleus from intricate backgrounds facilitates the acquisition of detailed and accurate cellular morphological data, thereby enhancing the objectivity and consistency of morphological evaluations. Furthermore, this technology significantly reduces manual processing time and costs, enabling large-scale morphological analyses and providing more reliable data support for early disease detection, treatment strategy formulation, and prognostic evaluations.

Bone marrow smear image analysis is a vital non-invasive diagnostic tool for early screening and treatment evaluation of B-ALL. Gupta et al established a comprehensive dataset known as C-NMC,3 which is the largest publicly accessible dataset for B-ALL image classification. They also introduced the SN-AM4 dataset, which includes microscopic images of bone marrow aspiration slides from patients diagnosed with B-ALL and Multiple Myeloma (MM). The public availability of these datasets has significantly propelled research on bone marrow smear image analysis and application, greatly aiding the development of computer-aided diagnostic tools for B-ALL.

In this context, the current study explores the application of mononuclear cell segmentation in B-ALL images to computer-aided diagnosis and proposes a segmentation method based on Multi-scale Feature Fusion-SegNeXt (MSFF-SegNeXt). The method achieves segmentation at the nucleus level, enabling subsequent classification and facilitating more detailed and precise analyses of B-ALL bone marrow smear images.

The principal contributions of this research include the proposal of a fusion method that integrates the Sobel edge feature map with multi-scale features, which enhances the expression and utilization of nuclear edge information in cells. Additionally, the Edge-Guided Attention module is combined with the SegNeXt model to strengthen the model’s ability to extract cell nucleus edge features. Furthermore, the Decoder structure of the SegNeXt model is optimally redesigned, and the Hamburger module is added at different stages to construct the MSFF-SegNeXt model, which significantly improves segmentation performance and accuracy. These improvements enhance nucleus segmentation while offering practical tools for analyzing single cells, with significant potential for clinical diagnostics in medical images such as B-ALL bone marrow smears.

This paper begins with a review of the current state of research and the limitations of existing methods. The proposed MSFF-SegNeXt model and its key techniques are then presented in the Method section. The experimental setup and results analysis are discussed in the Results section. The significance of the results and the direction of future research are explored in the Discussion section. Finally, the work of this paper is summarized in the Conclusion section.

Related Work

Deep learning is attracting widespread attention for its rapid development in several fields, especially disease prediction and diagnosis, and it holds great potential for disease detection and blood image analysis. For example, for epilepsy detection, Khan et al5 proposed the D2PAM model, which employs a deep dual-patch attention mechanism to predict epileptic seizures from brain signal data, significantly enhancing the model's accuracy and reliability. For Alzheimer's diagnosis, the same group proposed Dual-3DM3-AD,6 which uses a hybrid Transformer method to fuse and process multimodal data (MRI and PET images) for early multi-class diagnosis of Alzheimer's disease. Its triplet pre-processing comprises Quaternion Non-Local Means (QNLM) denoising, morphology-based skull stripping, and Block Divider Model (BDM) 3D image transformation, while a multi-scale feature extraction module further enhances the accuracy and efficiency of diagnosis.

Cell and cell nucleus segmentation has emerged as a significant research focus in blood image analysis, and several methods have achieved notable results in this field. Hollandi et al7 introduced an enhanced Mask R-CNN to predict nuclei, incorporating U-Net and a morphological approach to achieve precise, automated nucleus segmentation. Long8 presented a refined, lightweight U-Net variant (U-Net+) that facilitates accurate segmentation of cell nuclei while minimizing computational demands. For leukocyte and nucleus segmentation, Ghane et al9 developed an innovative approach combining thresholding, k-means clustering, and a modified watershed algorithm, which automatically segments leukocytes and their nuclei in microscopic images. Additionally, Zheng et al10 proposed a self-supervised learning technique that begins with unsupervised segmentation followed by supervised refinement, utilizing median color features and a weak edge enhancement operator to improve boundary delineation. Lastly, Lu et al11 introduced WBC-Net, a deep learning architecture based on U-Net++ and ResNet, featuring a context-aware encoder with residual blocks that extracts leukocyte features at multiple scales and effectively merges them across different resolutions.

As research progresses, an increasing number of studies focus on cell or nucleus segmentation in ALL, especially B-ALL, a more complex setting that demands higher accuracy. Among traditional machine learning methods, Li et al12 proposed a dual-threshold method for leukocyte segmentation in acute lymphoblastic leukemia images, which combines single-threshold results computed in the RGB (Red, Green, Blue) and HSV (Hue, Saturation, Value) color spaces to segment ALL cells with different thresholds. Praveena et al13 used a deep convolutional network in conjunction with an optimization algorithm for ALL detection, followed by segmentation using the Sparse Fuzzy C-Means (Sparse-FCM) clustering algorithm. Abdeldaim et al14 converted ALL images from RGB to the CMYK (Cyan, Magenta, Yellow, blacK) color space, preprocessed them with histogram equalization and Zack's thresholding technique, and used the KNN (K-Nearest Neighbors) algorithm to segment the cell nuclei. Jha et al15 used an entropy-based hybrid model combining active contours and FCM to segment ALL cell nuclei from entropy parameters, feeding the resulting nucleus images into a detection network. Acharya et al16 converted images from RGB to the LAB color space and applied the K-medoids algorithm to extract cellular features of ALL for segmentation of its cell nuclei. In recent years, deep learning has been increasingly applied to microscopic images of ALL because traditional machine learning methods usually require significant expertise to select an appropriate algorithm and tune its parameters, and no single parameter setting accommodates the heterogeneity of biological samples.17 Ghosh et al18 extracted image features with a deep convolutional neural network whose average pooling layer computes hotspots of cell locations across the entire section image; the results were used to localize and segment ALL cells. Al-Jaboriy et al19 designed an ALL cell segmentation model robust to different illumination conditions, combining four-moment statistical features with Artificial Neural Networks (ANNs) to extract cell features for automatic segmentation. Duggal et al20 proposed a Deep Belief Network (DBN)-based method for segmenting overlapping B-ALL cell nuclei. The method extracts high-level features from the DBN-preprocessed image, with the output of each layer serving as the input to the next, yielding progressively more abstract and representative morphological features of the nuclei. Using the learned features, the DBN can accurately segment each individual nucleus, effectively addressing the difficulty of segmenting overlapping nuclei in microscopic images.

Traditional machine learning techniques have proven effective for cell and nucleus segmentation. However, these methods often depend on manual feature extraction and precise parameter tuning, which complicates handling the morphological variability of cell nuclei. Deep learning, on the other hand, has shown promise in various fields, but the computational efficiency and generalization of many models still require improvement. Moreover, most existing studies on nucleus segmentation of ALL lymphoblastoid cells have been based on the peripheral blood smear ALL-IDB dataset,21 and adapt poorly to staining variation when segmenting individual nuclei. While some techniques show potential, single-cell nucleus segmentation in B-ALL bone marrow smear images requires further research and refinement to address the unique obstacles that arise in this setting.

Method

Overview

The SegNeXt22 model performs well in semantic segmentation tasks, particularly due to its redesigned attention mechanism, which balances global contextual information with local detail preservation. This capability makes it a strong foundation for this study. Moreover, SegNeXt is markedly more efficient than many traditional deep learning models: its parameter count is only 39.5%, 34.9%, and 32.6% of U-Net, R2U-Net, and DeepLabV3, respectively, and its computational cost (total Mult-Adds) is just 5.6%, 2.4%, and 8.7% of these models, making it highly suitable for resource-limited environments (Table 1). Efficient segmentation of the nucleus from B-ALL bone marrow smears is crucial for subsequent analyses. SegNeXt excels at capturing the fine edge structure of the nucleus and handles medical data efficiently while maintaining accuracy, making it a suitable base network for this study.

Table 1 Computational Complexity of Different Models
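Complexity figures of this kind can be reproduced with standard tooling. Below is a minimal sketch assuming the torchinfo package and a 3×350×350 input (matching the single-cell crop size used later); `build_segnext` and `build_unet` are hypothetical model constructors, not functions from this paper:

```python
import torch
from torchinfo import summary

def complexity_report(model: torch.nn.Module, input_hw: int = 350):
    """Return total parameters and Mult-Adds for one forward pass."""
    stats = summary(model, input_size=(1, 3, input_hw, input_hw), verbose=0)
    return stats.total_params, stats.total_mult_adds

# Example usage with hypothetical constructors:
# p_seg, m_seg = complexity_report(build_segnext())
# p_unet, m_unet = complexity_report(build_unet())
# print(f"SegNeXt parameter count is {100 * p_seg / p_unet:.1f}% of U-Net's")
```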

The MSFF-SegNeXt Model

To enhance segmentation accuracy and stability, the SegNeXt model is improved, resulting in the proposed MSFF-SegNeXt model. The model’s architecture encompasses two primary components: the encoder and the decoder (Figure 1). The encoder comprises four stages, each containing a different number of downsampling layers and multi-scale convolutional attention network (MSCAN) Blocks. Meanwhile, the decoder integrates the Hamburger module,23 the Edge-Guided Attention (EGA)24 module, and a Multilayer Perceptron (MLP). Specifically, the features extracted from the first Hamburger module are fed into the EGA module after passing through the Fea block (Feature block), and subsequently, the features from the EGA module are fed back through the Fea block to the second Hamburger module. This interaction facilitates the fusion of the EGA module and the multiplexing capabilities of the Hamburger module, thereby enhancing the representation of the lymphoblastoid structure within the B-ALL image.

Figure 1 The MSFF-SegNeXt model.

Encoder

Due to the high nucleoplasmic ratio of lymphoblastoid cells in B-ALL, the nucleus closely resembles the overall cell boundary. This proximity complicates the segmentation of cells from the surrounding cytoplasm. Consequently, this study selects the encoder of SegNeXt for feature extraction from B-ALL images. The encoder comprises four stages (Stage 1 to Stage 4), with their specific structures illustrated in Figure 1. Each stage captures multiscale information by progressively deepening the feature map. The preliminary feature map of the B-ALL image is generated in the first stage. Each stage incorporates an MSCAN block, consisting of a feed-forward network, batch normalization, and an attention mechanism. The feed-forward network performs nonlinear transformations of input features, thereby enhancing the representation of cell nucleus characteristics, while batch normalization stabilizes the training process and accelerates model convergence. The multi-scale convolutional attention (MSCA) module comprises two main parts (Figure 2). The Multi-scale Feature module employs multi-branch depth-wise strip convolutions with varying kernel sizes to extract features from cell nuclei in B-ALL images, capturing both local details and broader contextual information. In the Channel Mixing module, addition operations combine the outputs from different scales into a comprehensive feature representation, after which a 1×1 convolution refines the combined features and reduces dimensionality. This process effectively fuses the multiscale information while balancing feature richness and computational efficiency. The MSCA module can be written as:

$$\mathrm{Att} = \mathrm{Conv}_{1\times 1}\left(\sum_{i=0}^{3}\mathrm{Scale}_i\big(\mathrm{DW\text{-}Conv}(F)\big)\right) \tag{1}$$

$$\mathrm{Out} = \mathrm{Att} \otimes F \tag{2}$$

Figure 2 Illustration of the MSCA module.

Notes: Multi-scale Feature, the module that uses multiple deep convolutions with different kernel sizes to fuse features; Channel Mixing, the module that refines the features further.

where F denotes the input feature; Att and Out are the attention map and output, respectively; ⊗ is element-wise matrix multiplication; DW-Conv denotes depth-wise convolution; and Scalei, i ∈ {0, 1, 2, 3}, denotes the i-th branch in Figure 2, with Scale0 the identity connection. The MSCA module's design ensures that important features at different scales are captured, improving segmentation accuracy while reducing computational cost.
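As a concrete reading of Equations (1) and (2), a minimal PyTorch sketch of the MSCA module follows. The branch kernel sizes (7, 11, 21, realized as paired depth-wise strip convolutions) and the 5×5 aggregating DW-Conv follow the original SegNeXt paper and are assumptions here, since the text above does not list them:

```python
import torch
import torch.nn as nn

class MSCA(nn.Module):
    """Multi-scale convolutional attention, following Equations (1)-(2)."""
    def __init__(self, dim: int):
        super().__init__()
        # DW-Conv: 5x5 depth-wise convolution aggregating local information
        self.dw = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)
        # Scale_1..Scale_3: paired depth-wise strip convolutions; Scale_0 is identity
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(dim, dim, (1, k), padding=(0, k // 2), groups=dim),
                nn.Conv2d(dim, dim, (k, 1), padding=(k // 2, 0), groups=dim),
            )
            for k in (7, 11, 21)
        )
        self.mix = nn.Conv2d(dim, dim, 1)  # 1x1 conv for channel mixing

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        x = self.dw(f)
        att = x + sum(b(x) for b in self.branches)  # sum over Scale_0..Scale_3
        att = self.mix(att)                         # Equation (1)
        return att * f                              # Equation (2): Out = Att ⊗ F

# x = torch.randn(1, 64, 88, 88); assert MSCA(64)(x).shape == x.shape
```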

Decoder with Edge-Guided Attention Module

To effectively capture the sophisticated semantic information of cell nuclei in B-ALL, the decoder methodically reconstructs and retrieves this nuanced information in a stepwise manner. The decoder’s architecture is optimized to balance computational efficiency with performance. Since the initial feature maps contain primarily low-level data, the decoder integrates only the feature maps from the final three stages, minimizing computational cost. This strategic choice promotes efficient utilization of resources while sustaining model efficacy.

Considering that the MSCA module is good at detecting banded features, but the B-ALL nucleus mainly exhibits a round shape, this study integrates an improved EGA module in the decoding stage to enhance the edge extraction capability. To further enrich feature representation, the Hamburger module is employed twice within the decoder. These components collectively form the final feature map for segmentation, with results refined by MLP.

The proposed MSFF-SegNeXt model incorporates these improvements into a unified framework, emphasizing modularity and adaptability. This approach allows for the seamless integration of edge features with multi-scale representations. The implementation details are as follows:

  1. Edge-Guided Attention Module

The EGA module addresses edge divergence issues caused by the MSCA module by refining edge representation while maintaining a lightweight architecture. Its design consists of two functional stages: Multi-scale Feature Fusion and Feature Secondary Processing (Figure 3).

Figure 3 The architecture of the EGA block.

Notes: Multi-scale Feature Fusion, the module that fuses multiple features; Feature Secondary Processing, the module that further processes and refines fusion features.

In the Multi-scale Feature Fusion stage, the EGA module fuses three critical feature sets to enhance edge sensitivity and representation:

  1. Encoder’s Higher-order Features: These features are extracted from the fourth stage of the encoder (output of stage 4 in Figure 1), capturing high-level, abstract information from the input image. This provides a rich contextual foundation that enhances the model’s understanding of complex structures in the image.
  2. Edge Features: The Sobel operator is applied to the input image to generate edge feature maps. Compared with alternatives, the Sobel maps extracted from the B-ALL image suppress white noise more effectively, highlighting important edges at native resolution and providing critical detail for segmentation tasks.
  3. Decoder Features: The decoder features are generated by the Hamburger module in the first layer of the decoder (output of the first Hamburger module through the Fea block in Figure 1) and produce two key sub-features. The first is boundary attention, which uses a Laplacian pyramid to enhance the prominence of cell boundary pixels, improving edge feature representation. The second is reverse attention, computed from the background attention of the decoder features, which suppresses background noise that could obscure cell nucleus edges. Balancing boundary and non-boundary information in this way enables clearer and more accurate segmentation of cellular structures.

In the Feature Secondary Processing stage, the aggregated features are refined using the convolutional and attention mechanisms of the Convolutional Block Attention Module (CBAM). This attention mechanism selectively enhances critical features, further boosting the segmentation accuracy and boundary extraction of cell nuclei in B-ALL images.

By integrating these components, the EGA module ensures precise edge detection, effective noise suppression, and robust feature representation, making it a key contributor to the decoder’s performance in segmenting round lymphoblastoid nuclei.
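To make the data flow concrete, the sketch below pairs a Sobel edge extractor with a simplified EGA fusion step in PyTorch. The channel sizes, the Laplacian-style boundary attention computed from a coarse prediction, and the lightweight channel-attention stand-in for CBAM are assumptions for illustration; the encoder's higher-order features are omitted for brevity, and the paper's exact wiring may differ:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def sobel_edges(img: torch.Tensor) -> torch.Tensor:
    """Sobel gradient magnitude of an RGB batch (B, 3, H, W) -> (B, 1, H, W)."""
    gray = img.mean(dim=1, keepdim=True)
    kx = torch.tensor([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]],
                      device=img.device).view(1, 1, 3, 3)
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, kx.transpose(2, 3), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

class EGA(nn.Module):
    """Simplified edge-guided attention fusion (channel sizes assumed)."""
    def __init__(self, ch: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * ch + 1, ch, 3, padding=1),
            nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
        )
        # Lightweight channel attention standing in for CBAM
        self.ca = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, max(ch // 4, 1), 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(max(ch // 4, 1), ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, dec_feat: torch.Tensor, edge_map: torch.Tensor) -> torch.Tensor:
        pred = dec_feat.mean(dim=1, keepdim=True).sigmoid()  # coarse foreground map
        blur = F.avg_pool2d(pred, 3, stride=1, padding=1)
        boundary_att = (pred - blur).abs()   # Laplacian-like boundary emphasis
        reverse_att = 1.0 - pred             # reverse (background) attention
        edge = F.interpolate(edge_map, size=dec_feat.shape[2:],
                             mode="bilinear", align_corners=False)
        fused = torch.cat([dec_feat * boundary_att,
                           dec_feat * reverse_att, edge], dim=1)
        out = self.fuse(fused)
        return out * self.ca(out)            # feature secondary processing

# y = EGA(64)(dec_feat, sobel_edges(img))  # dec_feat: (B, 64, h, w)
```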

  2. Hamburger Module Position Design

In this study, the Hamburger module is integrated at key stages of the decoder to enhance feature processing and multi-scale feature fusion. The Hamburger structure combines matrix decomposition techniques with convolutional operations, allowing efficient processing of global information while maintaining low computational complexity (Figure 4). Its output H(Z) is:

$$H(Z) = W_u\, M\big(W_l Z\big) \tag{3}$$

Figure 4 The architecture of the Hamburger module.

where W_l and W_u are learnable linear transformations (the lower and upper "bread" layers) and M(·) is the matrix decomposition that recovers the clean latent structure. The final output of the Hamburger module is obtained by batch normalization (BN) and a skip connection:

$$\mathrm{Out} = Z + \mathrm{BN}\big(H(Z)\big) \tag{4}$$
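Read as code, Equations (3) and (4) amount to the following minimal sketch. The 1×1-convolution bread layers, the rank, and the NMF-style multiplicative updates used for M(·) are assumptions (the Hamburger paper studies several decompositions); the paper's exact configuration may differ:

```python
import torch
import torch.nn as nn

class Hamburger(nn.Module):
    """Hamburger block (Equations 3-4): bread = 1x1 convs, ham = low-rank
    reconstruction via a few NMF multiplicative updates (one possible M)."""
    def __init__(self, ch: int, rank: int = 64, steps: int = 6):
        super().__init__()
        self.lower = nn.Conv2d(ch, ch, 1)   # W_l, lower bread
        self.upper = nn.Conv2d(ch, ch, 1)   # W_u, upper bread
        self.bn = nn.BatchNorm2d(ch)
        self.rank, self.steps = rank, steps

    def matrix_decomposition(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        v = x.relu().flatten(2) + 1e-6      # (B, C, HW), kept non-negative for NMF
        d = torch.rand(b, c, self.rank, device=x.device)
        s = torch.rand(b, self.rank, h * w, device=x.device)
        for _ in range(self.steps):         # multiplicative update rules for V ~ D S
            s = s * (d.transpose(1, 2) @ v) / (d.transpose(1, 2) @ d @ s + 1e-6)
            d = d * (v @ s.transpose(1, 2)) / (d @ s @ s.transpose(1, 2) + 1e-6)
        return (d @ s).view(b, c, h, w)     # clean low-rank latent structure

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        ham = self.matrix_decomposition(self.lower(z))  # Eq. (3): H(Z) = W_u M(W_l Z)
        return z + self.bn(self.upper(ham))             # Eq. (4): BN + skip connection
```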

Specifically, the first Hamburger module is placed immediately after the encoder’s output, forming the first layer of the decoder. This placement helps capture global contextual information from low-level features and facilitates their integration with high-level abstractions. The second Hamburger module is positioned after the output of the EGA module. This placement further enhances the features refined by the EGA module and ensures a balance between preserving local details and global context before generating the segmentation output. By employing these two layers of Hamburger modules, the model enables iterative feature processing, which is crucial for accurately segmenting cell nuclei in B-ALL images.

The integration of the EGA module and the strategic placement of the Hamburger module greatly improves the computational efficiency of the model. Additionally, this configuration substantially boosts the model’s sensitivity and edge extraction capabilities for cell nuclei in B-ALL images. The subsequent processing of the fused features markedly enhances the model’s sensitivity to edge details and improves segmentation accuracy. Consequently, the model can segment nuclei in B-ALL single-cell images with greater precision.

Dataset Construction

This study used the SN-AM dataset, which consists of bone marrow smears from B-ALL and MM patients. The microscopic images were captured from bone marrow aspiration slides of patients diagnosed with B-ALL, stained with Jenner-Giemsa stain. Owing to timing differences between sample preparation and imaging, as well as operational variation, the cell colors in the microscopic images show some natural variability (Figure 5). The stained images were deliberately not normalized, so as to reflect the diversity of staining in bone marrow smears in practical applications and thereby enhance the generalization ability of the model.

Figure 5 B-ALL bone marrow images presented in different colors. (A) The original image, (B and C) Images with color variations.

Given that B-ALL is characterized by the rapid production of immature leukocytes known as lymphoblastoid cells, the present study focuses on lymphoblastoid cells in B-ALL bone marrow smear images. Figure 6 shows the annotation of B-ALL lymphoblastoid cells in the dataset: Figure 6A contains the cellular annotations provided by the dataset, ensuring that each lymphoblastoid cell is identified, and Figure 6B shows the original image cropped down to individual cells. A nucleus-centered cropping strategy is then applied to each cell, with the crop size uniformly set to 350×350 pixels. The upper left corner of Figure 6B illustrates the black background fill used when a nucleus lies close to the image edge, ensuring consistency and integrity of the crops. Since the dataset averages four to five lymphoblasts per B-ALL image, cropping converted the original 30 whole images into 127 single-cell images. To focus the analysis at the nucleus level, two domain experts performed rigorous manual nucleus outlining on the single-cell images to generate the mask images required for segmentation (Figure 7). The dataset was then divided into training, validation, and test sets by random sampling in a 6:2:2 ratio to ensure fair and comprehensive model training and performance evaluation. In addition, to enhance the reliability of the experimental results, three-fold cross-validation was used, with the average performance taken as the final evaluation index, effectively reducing the influence of sampling error on the results.

Figure 6 Original image of B-ALL (A) and its single-cell image (B).

Figure 7 Mask Making Process. (A) the original image, (B) the image of the nucleus delineated on the original image, and (C) the binary mask image of cell nuclei.
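The nucleus-centered cropping with black background fill can be sketched as follows (NumPy; how the nucleus centroid is obtained from the annotation mask is an assumed detail, shown in the usage comment):

```python
import numpy as np

def crop_cell(image: np.ndarray, cx: int, cy: int, size: int = 350) -> np.ndarray:
    """Crop a size x size patch centered on the nucleus centroid (cx, cy).
    Regions falling outside the image are left black, matching the fill
    strategy illustrated in Figure 6B."""
    half = size // 2
    out = np.zeros((size, size, image.shape[2]), dtype=image.dtype)  # black canvas
    # Source window clipped to the image bounds
    y0, y1 = max(cy - half, 0), min(cy + half, image.shape[0])
    x0, x1 = max(cx - half, 0), min(cx + half, image.shape[1])
    # Destination offsets inside the canvas
    dy, dx = y0 - (cy - half), x0 - (cx - half)
    out[dy:dy + (y1 - y0), dx:dx + (x1 - x0)] = image[y0:y1, x0:x1]
    return out

# Example: centroid of one annotated nucleus from a binary mask `mask`
# ys, xs = np.nonzero(mask)
# patch = crop_cell(img, int(xs.mean()), int(ys.mean()))
```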

Evaluation Metrics

The key evaluation metrics for assessing the performance of the MSFF-SegNeXt model in cell nuclei segmentation encompass the Dice coefficient, intersection over union (IoU), accuracy, precision, and recall. These metrics evaluate the degree of overlap, correctness, and completeness achieved by the segmentation process. Specifically, Dice and IoU measure similarity between predicted and ground truth masks, accuracy reflects the overall correctness of predictions, precision assesses the accuracy of positive predictions, and recall evaluates the completeness of true-positive identification. The mathematical formulas for these metrics are presented below:

$$\mathrm{Dice} = \frac{2TP}{2TP + FP + FN} \tag{5}$$

$$\mathrm{IoU} = \frac{TP}{TP + FP + FN} \tag{6}$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{8}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{9}$$

where TP (True Positives) are the correctly predicted positive instances, TN (True Negatives) the correctly predicted negative instances, FP (False Positives) the incorrectly predicted positive instances, and FN (False Negatives) the incorrectly predicted negative instances.
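Under these definitions, the five metrics reduce to a few lines over binary masks; a minimal NumPy sketch:

```python
import numpy as np

def segmentation_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Compute Equations (5)-(9) from binary predicted and ground-truth masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)       # true positives
    tn = np.sum(~pred & ~gt)     # true negatives
    fp = np.sum(pred & ~gt)      # false positives
    fn = np.sum(~pred & gt)      # false negatives
    eps = 1e-8                   # guard against empty masks
    return {
        "dice":      2 * tp / (2 * tp + fp + fn + eps),
        "iou":       tp / (tp + fp + fn + eps),
        "accuracy":  (tp + tn) / (tp + tn + fp + fn + eps),
        "precision": tp / (tp + fp + eps),
        "recall":    tp / (tp + fn + eps),
    }
```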

Results

Experimental Configuration

The experimental configurations of the individual models are shown in Table 2. All experiments were performed on a Windows GPU computing platform with 32 GB of RAM, an NVIDIA RTX 3060 Ti GPU, and an Intel® C612 chipset; all programs were implemented with the open-source framework PyTorch 2.2.0, torchvision 0.17.0, CUDA 12.1.0, and Python 3.9.18.

Table 2 Hyperparameter Settings of Different Models

Analysis of Experimental Results

Segmentation Results and Comparison of the MSFF-SegNeXt Model

The segmentation effectiveness of the MSFF-SegNeXt model on the lymphocyte nucleus segmentation task in B-ALL images is evaluated by comparison with several existing segmentation models, including the original SegNeXt, U-Net,25 R2U-Net,26 and DeepLabV3.27 All experiments use three-fold cross-validation, with the reported metrics averaged across the three folds. In terms of IoU, MSFF-SegNeXt achieved the highest value of 0.8722, significantly higher than the base SegNeXt model (0.8495) and the other comparison models (0.7094 for U-Net, 0.7626 for R2U-Net, and 0.8183 for DeepLabV3) (Figure 8). This indicates that MSFF-SegNeXt excels in segmentation accuracy and detail retention, capturing cell nucleus boundaries more precisely. In terms of the Dice coefficient, MSFF-SegNeXt also performs well, with a mean value of 0.9422, again outperforming all other models (Figure 8). The improvement in the Dice coefficient further demonstrates the superior performance of MSFF-SegNeXt in segmentation tasks, especially for complex-shaped nuclei, which it distinguishes and segments more effectively.

Figure 8 IoU and Dice values for each model.

In addition, the MSFF-SegNeXt model excels in accuracy and recall, reaching 0.9659 and 0.9566, respectively, compared to 0.9501 and 0.9475 for the original SegNeXt model, with even lower performance from the other models (Table 3). Although the MSFF-SegNeXt model has a slightly lower average precision than the DeepLabV3 model, it performs best overall. These results indicate that the MSFF-SegNeXt model significantly improves segmentation in the task of cell nucleus segmentation in B-ALL bone marrow smear micrographs, outperforming the traditional U-Net, R2U-Net, and DeepLabV3 models, and that it captures cell nuclei in B-ALL images more comprehensively and effectively.

Table 3 Average Metrics of the Different Models on the SN-AM Dataset

Moreover, the performance metrics of the SegNeXt model (Group 1) and the proposed MSFF-SegNeXt model (Group 2) are compared by t-test (Table 4). With the recall of MSFF-SegNeXt already higher than that of SegNeXt, the p-values for Dice, IoU, and accuracy are all below 0.05, indicating statistically significant differences; the precision metric in particular differs significantly (p<0.01). These results confirm that the model's improvements are meaningful.

Table 4 Comparison of SegNeXt (Group 1) and MSFF-SegNeXt (Group 2) Metrics
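For reproducibility, such a comparison can be sketched with SciPy as below. The per-fold scores shown are placeholders, not the paper's raw data, and since the paper does not state whether a paired or independent t-test was used, a paired test across folds is one reasonable reading:

```python
import numpy as np
from scipy import stats

# Placeholder per-fold Dice scores for Group 1 (SegNeXt) and Group 2
# (MSFF-SegNeXt); substitute the actual three-fold results.
group1 = np.array([0.930, 0.925, 0.932])
group2 = np.array([0.941, 0.943, 0.944])

t_stat, p_value = stats.ttest_rel(group2, group1)  # paired t-test across folds
print(f"t = {t_stat:.3f}, p = {p_value:.4f}  (significant if p < 0.05)")
```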

A detailed comparative analysis of the training loss and mean Intersection over Union (mIoU) of the models is also provided. As shown by the loss curves in Figure 9, the loss values of all models decrease rapidly at the beginning of training. MSFF-SegNeXt and SegNeXt show faster-decreasing losses and maintain relatively low values throughout training, indicating efficient optimization. In contrast, the losses of U-Net and R2U-Net fluctuate more, showing greater instability during training. Although the DeepLabV3 model stabilizes at a low loss level, it does not converge as well. The proposed MSFF-SegNeXt outperforms both the SegNeXt and DeepLabV3 models and maintains the highest mIoU value over most training cycles, suggesting higher accuracy in the segmentation task (Figure 9). Overall, MSFF-SegNeXt demonstrates the best performance in segmenting cell nuclei from B-ALL bone marrow smears.

Figure 9 Training process curves of different models. (A) training loss, (B) IoU.

As the various indicators show, the original SegNeXt model performs well across segmentation tasks. However, its segmentation results still suffer from boundary dispersion and boundary blurring, which reduce segmentation accuracy (Figure 10). In this study, these problems are mitigated by the introduction of the EGA module, which not only enhances the clarity of segmentation boundaries but also improves the model's handling of cell nucleus boundary details, bringing the segmentation results closer to the ground truth. Moreover, compared with the other models, the proposed model segments individual cell nuclei more accurately.

Figure 10 Comparison of segmentation performance of several models on the SN-AM dataset. (A) Ground Truth, (B) U-Net, (C)R2U-Net, (D) DeepLabV3, (E) SegNeXt, (F) MSFF-SegNeXt.

Ablation Experiments

In addition, ablation experiments are conducted on the SN-AM dataset. For visualization of the edge feature maps, the outputs of the two edge detection algorithms (Laplacian and Sobel) are shown with the same contrast enhancement for ease of comparison. Figure 11 compares the different edge detection algorithms on B-ALL images, clearly showing that the edge feature maps generated by the Sobel operator contain less white noise and extract the edges of cell images more effectively.

Figure 11 Different edge detection images. (A) manually delineated edge image, (B) Laplacian edge detection image, (C) Sobel operator edge detection image.

To provide quantitative support for this observation, the structural similarity (SSIM), recall, and precision between the images generated by each edge detection method and the manually delineated nuclear boundary image are calculated (Table 5). The numerical results indicate that the Sobel edge detection image has higher SSIM and precision values, suggesting a more accurate edge representation and a higher structural similarity to the reference edges of the B-ALL image. Sobel edge images also exhibit a higher recall, indicating more comprehensive edge detection. However, because the nucleus edge occupies relatively few pixels, even minor deviations cause substantial drops in precision, so the precision of both the Laplacian- and Sobel-processed images remains relatively low overall. Nevertheless, these quantitative indicators show that the Sobel operator provides superior edge detection compared to the Laplacian operator, consistent with the qualitative evaluation in this study and confirming the Sobel operator's ability to generate high-quality edge feature maps.

Table 5 Comparison of Different Metrics Under Different Edge Detection Algorithms
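Such a comparison can be computed along these lines; a sketch assuming scikit-image for SSIM, edge maps scaled to [0, 1], and the manually delineated boundary image as reference:

```python
import numpy as np
from skimage.metrics import structural_similarity

def edge_map_scores(edge: np.ndarray, gt_boundary: np.ndarray, thresh: float = 0.5):
    """SSIM, recall, and precision of an edge map against the manual boundary."""
    ssim = structural_similarity(edge, gt_boundary, data_range=1.0)
    e, g = edge >= thresh, gt_boundary >= thresh   # binarize both maps
    tp = np.sum(e & g)
    recall = tp / (np.sum(g) + 1e-8)      # fraction of true boundary recovered
    precision = tp / (np.sum(e) + 1e-8)   # fraction of detected edges that are real
    return ssim, recall, precision
```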

The MSFF-SegNeXt with Sobel edge feature map as input yields better segmentation results, and its segmented cell nucleus boundaries are closer to the ground truth (Figure 12). Experiments show that the fusion of the Sobel edge feature map with other features improves the model’s ability to extract the edge of the cell nucleus, which increases the robustness and accuracy of the improved EGA module when dealing with complex edge structures.

Figure 12 Comparison of segmentation performance of several models on the SN-AM dataset. (A) Ground Truth, (B) Improved SegNeXt model using EGA module with Laplacian operator, (C) MSFF-SegNeXt.

Table 6 presents the results of the complete ablation experiments, aiming to evaluate the contributions of different modules in the model. The effects of integrating Laplacian and Sobel edge feature maps in the EGA module are compared, as well as the impact of single versus multiple utilizations of the Hamburger module. The proposed MSFF-SegNeXt model, which incorporates Sobel edge features in the added EGA module and reuses the Hamburger module, achieved the highest performance across all metrics. Specifically, this configuration resulted in a mean Dice score of 0.9422, mean IoU of 0.8926, mean accuracy of 0.9659, mean precision of 0.9316, and mean recall of 0.9566. These findings highlight the effectiveness of the EGA module integration and reuse of the Hamburger structure in enhancing model performance, particularly when combined with Sobel feature maps. In contrast, omitting these components leads to lower evaluation metrics, underscoring the importance of these modules in achieving optimal segmentation results.

Table 6 Average Metrics Across Different SegNeXt Models

Taken together, the MSFF-SegNeXt model outperforms the traditional U-Net, R2U-Net, and DeepLabV3 models in the task of cell nuclei segmentation from B-ALL bone marrow smear. The improvement in the various metrics indicates that the MSFF-SegNeXt model demonstrates stronger edge extraction capability and better performance, which can be better adapted to the needs of this specific task and has high potential and value in practical applications.

Discussion

A cell nucleus segmentation method based on the MSFF-SegNeXt model is proposed in this study, specifically designed for cell nucleus segmentation in B-ALL bone marrow smears. The model’s multi-scale feature fusion and edge-directed attention mechanism can effectively capture the complex structure of the nucleus, achieving Dice coefficients as high as 0.9422 and accuracy values as high as 0.9659, which demonstrate the significant advantages of the proposed method in the nucleus segmentation task. Compared with the conventional U-Net and R2U-Net models, MSFF-SegNeXt is more accurate in capturing cell boundaries, especially in the case of complex cell morphology.

The MSFF-SegNeXt model performs well not only on bone marrow smear images but also on peripheral blood smear images, showing strong generalization ability (Table 7). The Rabbin28,29 and WBC10 datasets used for validation are both leukocyte segmentation datasets based on peripheral blood smear images. The WBC dataset comprises two image sets disclosed and organized by Zheng et al;10 one contains 300 images of individual leukocytes and the other 100. Given the small size of both sets and the low percentage of lymphocytes, the two were merged for validation in this paper. On the Rabbin dataset, MSFF-SegNeXt achieves the highest Dice score of 0.9611 and IoU of 0.9272, outperforming U-Net, R2U-Net, and DeepLabV3. On the WBC dataset, MSFF-SegNeXt likewise achieves the highest Dice score of 0.9224 and IoU of 0.8666, again outperforming the other compared models. Although the proposed MSFF-SegNeXt model is slightly less accurate on the WBC dataset than DeepLabV3, its overall performance remains the best.

Table 7 Average Metrics of the Different Models on the Rabbin and WBC Dataset

These results demonstrate the model’s effectiveness across multiple datasets, confirming its reliability and robustness in the cell nucleus segmentation task. Notably, the proposed method outperforms existing approaches in the segmentation of nuclei in B-ALL bone marrow smear micrographs, achieving superior performance across several quantitative metrics. It effectively addresses key limitations of traditional methods, such as high computational complexity and limited adaptability, while significantly surpassing existing methods in bone marrow smear cell segmentation. Additionally, the model shows strong adaptability to staining variations, with remarkable generalization capabilities, making it well-suited for a wide range of real-world applications. This outstanding performance is attributed to the integration of EGA modules and the strategic placement of Hamburger modules within the network.

However, this study has some limitations. First, the model is trained on a single dataset (SN-AM), which may not adequately cover the nucleus morphology of all lymphoblastoid cells in B-ALL, potentially limiting the model's generalization ability. Second, current deep learning models still face challenges in complex image denoising tasks; for example, low-dose CT scan image denoising still struggles with preserving image quality and detail, validating algorithms, and gaining clinical acceptance.30 The processing of bone marrow smear images faces the same challenges. Future work should therefore consider more diverse clinical datasets and stronger denoising preprocessing of clinical data to improve the model's adaptability across clinical settings. In addition, because of the particularly high nucleoplasmic ratio of lymphoblasts, high pixel connectivity between nuclear regions in some cell images leaves minimal gaps between nuclei or produces fully connected regions, making it challenging to segment individual nuclei accurately. When there is sufficient spacing between nuclei, the model achieves good segmentation performance (Figure 13A); as pixel connectivity increases and the gaps shrink, segmentation becomes more difficult because neighboring nuclei lie so close together (Figure 13B). Segmentation accuracy also decreases with inconsistent staining and the appearance of granules at nuclear boundaries. Future studies should therefore further refine segmentation techniques for multicellular images and overlapping cells.

Figure 13 Visualization of different segmentation results for MSFF-SegNeXt. (A) Excellent segmentation results, (B) Segmentation results with poor edges.

The MSFF-SegNeXt model proposed in this study offers a new approach to cellular analysis in B-ALL images, and its applications can be extended in the future. For example, the model's feature extraction method could be combined with other data, such as genetic and clinical data. Moreover, recent studies show a growing trend away from single-task learning toward multitasking models that integrate classification and segmentation. For instance, Alqarafi et al31 proposed the Multi-scale GC-T2 model for skin disease detection, which integrates both segmentation and classification tasks. The model employs the Advanced Deep Q Network (AdDNet) and HAar-U-Net (HAUNT) to achieve precise segmentation of lesion areas; the segmented images are then processed by a multi-scale graph convolution network (M-GCN) for feature extraction, and a tri-movement attention mechanism further refines the feature representation by capturing fine-grained, multi-scale features of skin lesions. Finally, the model integrates these features through a tri-level feature fusion module connected to a classifier, enabling binary classification of skin cancer. Similarly, the XAI-RACapsNet model proposed by Alhussen et al32 integrates segmentation and classification: it first uses XAI O-Net for ROI segmentation, followed by RACapsNet for breast cancer classification and relevance heat-map generation, enhancing model interpretability.

Future studies will focus not only on applying multitasking models to B-ALL bone marrow smear images for precise cell segmentation and classification, but also on improving the segmentation of overlapping cells. Addressing both aspects is crucial for enhancing the accuracy and reliability of cell segmentation, ultimately advancing the analysis of complex cellular structures in medical image processing.

Conclusion

In conclusion, the Multi-scale Feature Fusion-SegNeXt (MSFF-SegNeXt) model developed in this study demonstrates excellent performance in the task of cell nucleus segmentation in B-Lineage Acute Lymphoblastic Leukemia (B-ALL) bone marrow smear images. The proposed model is able to efficiently capture the complex structure of the nuclei and provide high-precision segmentation results in multiple datasets. The model has great potential for application in real clinical settings, helping to improve the diagnostic efficiency of B-ALL and enabling finer analysis of cell nuclei. Future studies should focus on the model’s generalization ability and the feasibility of real-time application to promote its widespread adoption in clinical practice.

Data Sharing Statement

The model code used in this study has been made publicly available on GitHub at the following link: https://github.com/tomatocsi/MSFF-SegNeXt.

Ethical Approval

In accordance with the measures issued by the Health Commission, Ministry of Education, Ministry of Science and Technology, and Bureau of Traditional Chinese Medicine dated 18 February 2023, this study validates the performance of the proposed MSFF-SegNeXt on three public open-source datasets. The study did not involve human subjects and created no additional risks or discomforts through the research involvement. As a result, ethical approval was not required for the study.

Funding

Grants were received from the key specialized research and development breakthrough of Henan province (Grant No. 232102211016), and the key scientific research projects of Henan colleges and universities (Grant No. 23A416004).

Disclosure

The authors report no conflicts of interest in this work.

References

1. McCredie M, Williams S, Coates M. Cancer mortality in migrants from the British Isles and continental Europe to New South Wales, Australia, 1975–1995. Int J Cancer. 1999;83(2):179–185. doi:10.1002/(SICI)1097-0215(19991008)83:2<179::AID-IJC6>3.0.CO;2-1

2. Lv Z, Cao X, Jin X, Xu S, Deng H. High-accuracy morphological identification of bone marrow cells using deep learning-based Morphogo system. Sci Rep. 2023;13(1):13364. doi:10.1038/s41598-023-40424-x

3. Mourya S, Kant S, Kumar P, Gupta A, Gupta R. ALL Challenge dataset of ISBI 2019 (C-NMC 2019) (Version 1) [dataset]. The Cancer Imaging Archive. 2019. doi:10.7937/tcia.2019.dc64i46r

4. Gupta A, Gupta R. SN-AM Dataset: White Blood Cancer Dataset of B-ALL and MM for Stain Normalization [dataset]. The Cancer Imaging Archive. 2019. doi:10.7937/tcia.2019.of2w8lxr

5. Khan AA, Mahendran RK, Thirunavukkarasu U, et al. D2PAM: epileptic seizures prediction using adversarial deep dual patch attention mechanism. CAAI Trans Intell Technol. 2023;8(3):755–769. doi:10.1049/cit2.12261

6. Khan AA, Mahendran RK, Perumal K, et al. Dual-3DM3-AD: mixed transformer based semantic segmentation and triplet pre-processing for early multi-class Alzheimer's diagnosis. IEEE Trans Neural Syst Rehabil Eng. 2024.

7. Hollandi R, Szkalisity A, Toth T, et al. A deep learning framework for nucleus segmentation using image style transfer. bioRxiv. 2019:580605.

8. Long F. Microscopy cell nuclei segmentation with enhanced U-Net. BMC Bioinformatics. 2020;21(1):8. doi:10.1186/s12859-019-3332-1

9. Ghane N, Vard A, Talebi A, Nematollahy P. Segmentation of white blood cells from microscopic images using a novel combination of K-means clustering and modified watershed algorithm. J Med Signals Sens. 2017;7(2):92–101. doi:10.4103/2228-7477.205503

10. Zheng X, Wang Y, Wang G, Liu J. Fast and robust segmentation of white blood cell images by self-supervised learning. Micron. 2018;107:55–71. doi:10.1016/j.micron.2018.01.010

11. Lu Y, Qin X, Fan H, et al. WBC-Net: a white blood cell segmentation network based on UNet++ and ResNet. Appl Soft Comput. 2021;101:107006. doi:10.1016/j.asoc.2020.107006

12. Li Y, Zhu R, Mi L, et al. Segmentation of white blood cell from acute lymphoblastic leukemia images using dual-threshold method. Comput Math Methods Med. 2016;2016:9514707.

13. Praveena S, Singh SP. Sparse-FCM and Deep Convolutional Neural Network for the segmentation and classification of acute lymphoblastic leukaemia. Biomed Eng/Biomedizinische Technik. 2020;65(6):759–773. doi:10.1515/bmt-2018-0213

14. Abdeldaim AM. Computer-aided acute lymphoblastic leukemia diagnosis system based on image analysis. In: Advances in Soft Computing and Machine Learning in Image Processing. Springer; 2018:131–147.

15. Jha KK, Dutta HS. Nucleus and cytoplasm-based segmentation and actor-critic neural network for acute lymphocytic leukaemia detection in single cell blood smear images. Med Biol Eng Comput. 2020;58:171–186. doi:10.1007/s11517-019-02071-1

16. Acharya V, Kumar P. Detection of acute lymphoblastic leukemia using image segmentation and data mining algorithms. Med Biol Eng Comput. 2019;57:1783–1811. doi:10.1007/s11517-019-01984-1

17. Caicedo JC, Goodman A, Karhohs KW, et al. Nucleus segmentation across imaging experiments: the 2018 Data Science Bowl. Nat Methods. 2019;16(12):1247–1253. doi:10.1038/s41592-019-0612-7

18. Ghosh A, Singh S, Sheet D. Simultaneous localization and classification of acute lymphoblastic leukemic cells in peripheral blood smears using a deep convolutional network with average pooling layer. In: 2017 IEEE International Conference on Industrial and Information Systems (ICIIS), Peradeniya, Sri Lanka; 2017:1–6. doi:10.1109/ICIINFS.2017.8300425

19. Al-Jaboriy SS, Sjarif NNA, Chuprat S, et al. Acute lymphoblastic leukemia segmentation using local pixel information. Pattern Recognit Lett. 2019;125:85–90. doi:10.1016/j.patrec.2019.03.024

20. Duggal R, Gupta A, Gupta R, et al. Overlapping cell nuclei segmentation in microscopic images using deep belief networks. In: Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing; 2016:1–8.

21. Labati RD, Piuri V, Scotti F. ALL-IDB: the acute lymphoblastic leukemia image database for image processing. In: 2011 18th IEEE International Conference on Image Processing. IEEE; 2011:2045–2048.

22. Guo M, Lu C, Hou Q, Liu Z, Cheng M, Hu S. SegNeXt: rethinking convolutional attention design for semantic segmentation. arXiv preprint arXiv:2209.08575. 2022.

23. Geng Z, Guo MH, Chen H, Li X, Wei K, Lin Z. Is attention better than matrix decomposition? In: International Conference on Learning Representations; 2021.

24. Bui N-T, Hoang D-H, Nguyen Q-T, Tran M-T, Le N. MEGANet: multi-scale edge-guided attention network for weak boundary polyp segmentation. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. doi:10.48550/arXiv.2309.03329

25. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Proceedings, Part III, Munich, Germany. Springer International Publishing; 2015:234–241.

26. Alom MZ, Hasan M, Yakopcic C, et al. Recurrent residual convolutional neural network based on U-Net (R2U-Net) for medical image segmentation. arXiv preprint. 2018.

27. Chen LC, Papandreou G, Schroff F, et al. Rethinking atrous convolution for semantic image segmentation. arXiv preprint. 2017.

28. Kouzehkanan ZM, Saghari S, Tavakoli S, et al. A large dataset of white blood cells containing cell locations and types, along with segmented nuclei and cytoplasm. Sci Rep. 2022;12(1):1123. doi:10.1038/s41598-021-04426-x

29. Tavakoli S, Ghaffari A, Kouzehkanan ZM, et al. New segmentation and feature extraction algorithm for classification of white blood cells in peripheral smear images. Sci Rep. 2021;11(1):19428. doi:10.1038/s41598-021-98599-0

30. Zubair M, Rais HM, Al-Tashi Q, et al. Enabling predication of the deep learning algorithms for low-dose CT scan image denoising models: a systematic literature review. IEEE Access. 2024.

31. Alqarafi A, Khan AA, Mahendran RK, et al. Multi-scale GC-T2: automated region of interest assisted skin cancer detection using multi-scale graph convolution and tri-movement based attention mechanism. Biomed Signal Process Control. 2024;95:106313. doi:10.1016/j.bspc.2024.106313

32. Alhussen A, Haq MA, Khan AA, et al. XAI-RACapsNet: relevance aware capsule network-based breast cancer detection using mammography images via explainability O-net ROI segmentation. Expert Syst Appl. 2024:125461.
