Journal of Hepatocellular Carcinoma, Volume 12
A Machine Learning Model for Predicting Prognosis in HCC Patients With Diabetes After TACE
Authors Wu L , Chen L , Zhang L , Liu Y, Ouyang D, Wu W, Lei Y, Han P, Zhao H , Zheng C
Received 16 September 2024
Accepted for publication 10 January 2025
Published 21 January 2025 Volume 2025:12 Pages 77–91
DOI https://doi.org/10.2147/JHC.S496481
Checked for plagiarism Yes
Review by Single anonymous peer review
Editor who approved publication: Dr Ali Hosni
Linxia Wu,1–3,* Lei Chen,1–3,* Lijie Zhang,1,4,* Yiming Liu,1,5,* Die Ouyang,1 Wenlong Wu,5,6 Yu Lei,5,6 Ping Han,1 Huangxuan Zhao,1–3 Chuansheng Zheng1–3
1Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei Province, 430022, People’s Republic of China; 2Department of Interventional Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei Province, 430022, People’s Republic of China; 3Hubei Provincial Clinical Research Center for Precision Radiology & Interventional Medicine, Wuhan, Hubei Province, 430022, People’s Republic of China; 4Department of Interventional Radiology, The Fifth Medical Center of Chinese PLA General Hospital, Beijing, 100039, People’s Republic of China; 5Department of Interventional Radiology, Auto Valley Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei Province, 430056, People’s Republic of China; 6Department of Interventional Radiology, Jinyinhu Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei Province, 430048, People’s Republic of China
*These authors contributed equally to this work
Correspondence: Chuansheng Zheng; Huangxuan Zhao, Department of Radiology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei Province, 430022, People’s Republic of China, Email [email protected]; [email protected]
Purpose: Type II diabetes mellitus (T2DM) has been found to increase the mortality of patients with hepatocellular carcinoma (HCC). Therefore, this study aimed to establish and validate a machine learning-based explainable prediction model of prognosis in patients with HCC and T2DM undergoing transarterial chemoembolization (TACE).
Patients and Methods: The prediction model was developed using data from the derivation cohort comprising patients from three medical centers, followed by external validation utilizing patient data extracted from another center. Further, five predictive models were employed to establish prognosis models for 1-, 2-, and 3-year survival, respectively. Prediction performance was assessed by the receiver operating characteristic (ROC), calibration, and decision curve analysis curves. Lastly, the SHapley Additive exPlanations (SHAP) method was used to interpret the final ML model.
Results: A total of 636 patients were included. Thirteen variables were selected for the model development. The final random survival forest (RSF) model exhibited excellent performance in the internal validation cohort, with areas under the ROC curve (AUCs) of 0.824, 0.853, and 0.810 in the 1-, 2-, and 3-year survival groups, respectively. This model also demonstrated remarkable discrimination in the external validation cohort, achieving AUCs of 0.862, 0.815, and 0.798 in the 1-, 2-, and 3-year survival groups, respectively. SHAP summary plots were also created to interpret the RSF model.
Conclusion: An RSF model with good predictive performance was developed by incorporating simple parameters. This prognostic prediction model may assist physicians in early clinical intervention and improve prognosis outcomes in patients with HCC and comorbid T2DM after TACE.
Keywords: hepatocellular carcinoma, type II diabetes mellitus, transarterial chemoembolization, overall survival, SHapley Additive exPlanations
Introduction
Liver cancer is among the cancers with the highest morbidity and mortality rates worldwide.1 Hepatocellular carcinoma (HCC) is the most prevalent pathological subtype of liver cancer, accounting for over 75% of all liver cancers.2 Although the incidence of HCC has decreased in several Asian countries, its prognosis remains poor. In terms of therapeutic interventions, transarterial chemoembolization (TACE) is one of the crucial treatment modalities for HCC. Several guidelines recommend TACE as the first-line treatment for patients unsuitable for resection, including early-stage patients who cannot undergo hepatic resection, liver transplantation, or radiofrequency ablation (RFA).3–5 However, patients who undergo TACE are prone to tumor recurrence or metastasis, with one of the primary reasons being that incomplete embolization can induce a hypoxic microenvironment in the tumor, which in turn promotes tumor neovascularization and consequently tumor progression.6 Similar findings of hypoxic microenvironment and accelerated tumor progression have been observed in patients with HCC and type II diabetes mellitus (T2DM), thereby leading to poor prognosis in this population.7,8
Diabetes mellitus is a relatively common condition in the general population.9 Previous studies have shown that diabetes can result in increased cancer morbidity and worsened survival in patients with HCC.8,10 Moreover, diabetes-induced hyperglycemia in patients undergoing TACE for HCC can augment the activity of the Wnt/β-catenin signaling pathway. Subsequently, this alteration may induce tumor development in hypoxic tumor cells, ultimately fostering tumor growth, proliferation, and progression.11,12 Therefore, survival prognosis remains a concern in patients with combined HCC and T2DM who are treated with TACE, with multiple factors suggested to influence prognosis in this specific population. Several prior studies have used a limited set of preoperative factors to explore the prognosis of patients with HCC undergoing TACE and thereby guide clinical treatment.13–16 However, such a limited set of factors might not fully capture the efficacy of TACE in these patients. Therefore, identifying more comprehensive predictive modalities is essential to predict patient outcomes following TACE therapy.
Machine learning (ML) offers significant advantages in predicting post-treatment outcomes for cancer patients. ML models can handle large, complex, and heterogeneous datasets, incorporating a wide range of variables, including clinical, pathological, and imaging features, to provide more comprehensive and personalized prognosis predictions.17,18 Li J et al included 288 patients with HCC to develop and validate a nomogram for predicting acute liver function deterioration (ALFD) after drug-eluting beads TACE. In the validation cohorts, the nomogram demonstrated promising prediction of ALFD, with an AUC of 0.878.19 Zhang MQ et al combined radiomics features and clinical information in four machine learning algorithms to predict the prognosis of patients with HCC treated with TACE. They found that extreme gradient boosting (XGBoost) exhibited the best performance among the combined models, achieving an AUC of 0.979 in the training set and 0.750 in the testing set, demonstrating its strong prognostic prediction capability.20 Another study used radiomics and clinicoradiomics features to build machine learning models and evaluated their ability to predict the local response of patients with HCC after TACE; the authors found that models built with clinical and MRI radiomics features may hold promise for predicting local response after TACE.21 However, most ML models function as “black boxes” that fail to explain the specific steps involved in their survival prediction among patients with tumors, thereby hindering the targeting of precise features for the analysis and guidance of specific medication.22 Recently, the SHapley Additive exPlanations (SHAP) method has emerged as an interpretability approach that can elucidate model output by decomposing the prediction results into the effects of individual features.23 In the case of survival prediction in patients with tumors, this approach can determine the weight of each predictive variable on patient prognosis. Consequently, the SHAP method enables clinicians to focus on the strongly weighted variables, thus resulting in a more rational treatment strategy for these patients. Although several researchers have employed the SHAP method to explain models predicting the outcomes of patients with HCC after TACE,24,25 no studies utilizing SHAP have been conducted to evaluate prognosis prediction in patients with combined HCC and T2DM following TACE therapy.
In this study, we retrospectively enrolled 636 patients with HCC and comorbid T2DM who underwent TACE treatment at four medical centers. Furthermore, five prediction models, including random survival forest (RSF), extreme gradient boosting (XGBoost), gradient boosting machine (GBM), ML techniques-extended Cox proportional hazards (CoxPH), and recursive partitioning and regression trees (Rpart), were used to determine the factors affecting patient survival at 1, 2, and 3 years after TACE according to the preoperative biochemical and imaging characteristics of the patients. Subsequently, the SHAP method was applied to the best predictive model to provide transparent predictions for its clinical application. Moreover, we developed a web application to allow healthcare professionals to generate survival probabilities for individual patients at 1, 2, and 3 years post-TACE treatment based on the preoperative profile of each patient.
Materials and Methods
Study Population
This retrospective study included 636 patients with HCC and T2DM who underwent TACE at four medical centers between January 2014 and June 2023. Two independent patient cohorts were used in this study. The first cohort was a primary derivation cohort comprising patients from three medical centers in China, which was further split into a training set and an internal validation set, utilized for hyperparameter optimization and model selection, respectively. The primary derivation cohort consisted of 539 patients with combined HCC and T2DM who received TACE at centers 1, 2, and 3. The second cohort was an external validation cohort solely employed to evaluate the final model. The data of this cohort were obtained from center 4, with information collected from 97 patients with HCC and comorbid T2DM who underwent TACE between January 2014 and June 2023. The patient inclusion and exclusion criteria were identical between the primary derivation and external validation cohorts. This study protocol was reviewed and approved by the Ethical Committee of the Hospital. Patient consent was waived owing to the retrospective design of this study. The data were anonymized or maintained with confidentiality. The study was conducted in compliance with the Declaration of Helsinki.26
The patient inclusion criteria were as follows: (1) HCC diagnosis according to pathological biopsy or imaging (CT/MRI/ultrasound) results;5,27 (2) T2DM diagnosis verified via medical records and based on fasting plasma glucose level or A1C criteria in accordance with the American Diabetes Association guidelines;28 (3) underwent initial TACE therapy; (4) classification into Child–Pugh class A or B; and (5) achievement of Eastern Cooperative Oncology Group (ECOG) scores of 0 or 1. The patients were excluded if they met any of the following criteria: (1) diagnosed with type I or other forms of diabetes; (2) presence of pre-existing concurrent cancers; (3) underwent systemic treatments before inclusion in this study; or (4) lost to follow-up (Figure 1).
Figure 1 Flow chart of the patient selection process in this study.
Treatment
The decision to perform TACE for the patients was deliberated by a multidisciplinary committee of experts (radiologists, interventional radiologists, hepatobiliary surgeons, and gastroenterologists). Ultimately, the committee recommended that patients diagnosed with Barcelona Clinic Liver Cancer (BCLC) stage A HCC and who were not suitable for liver resection or transplantation or RFA should undergo TACE. Three highly experienced interventional radiologists, each with over 10 years of expertise in interventional therapy, conducted the TACE procedures in all patients. The TACE treatment performed in this study has been extensively described in our previous research.29 Briefly, percutaneous arterial puncture was conducted using the Seldinger method, followed by introducing a 5F catheter (Cook, Bloomington, IN, USA) into the hepatic artery. Arteriography was then performed to visualize the tumor-feeding arteries, and a 3F microcatheter (Progreat, Terumo, Tokyo, Japan) was inserted into the tumor-feeding arteries through the 5F catheter. Subsequently, a mixture of lipiodol (Lipiodol Ultrafluido, Guerbet, Villepinte, France) and doxorubicin hydrochloride was emulsified at a ratio of 1 mL:2 mg. Considering the tumor size and liver function, 5–20 mL of the emulsion was slowly injected into the tumor-feeding arteries until stasis occurred. Supplemental embolization with gelatin sponge particles (300–700 μm, Cook, USA) was conducted if required.
Image Acquisition and Evaluation
All patients underwent enhanced CT or MRI scans before TACE. The features of tumor size, tumor number, liver cirrhosis, ascites, portal vein thrombus, and extrahepatic metastases were evaluated in a blinded manner by two radiologists having 8 and 10 years of experience. In the case of inconsistent results between the two radiologists, a third radiologist with 25 years of experience was asked to re-evaluate the results.
Study Predictors and Outcome
This study encompassed clinically pertinent characteristics of patients with HCC and T2DM collected before TACE surgery. Among the eligible patients, 32 predictors, including the clinical characteristics, laboratory parameters, and image findings, were retrospectively collected from the patients’ medical records. The predicted outcome was overall survival (OS), which was defined as the duration from the surgery date to the date of death or last follow-up. The predictive ability of the model was assessed at 1, 2, and 3 years.
Feature Selection and Model Development
First, eligible patients from the primary derivation cohort were randomly divided into a training set (n = 373) and an internal validation set (n = 166) at a 7:3 ratio. Second, Spearman correlation tests were performed in the training set to avoid significant multicollinearity among all variables before feature selection (Figure S1). Third, the Boruta algorithm with 1000 bootstrap repetitions was applied in the training set to identify features important to the prediction target by comparing the original features with randomly generated shadow features.30 Fourth, variance inflation factor (VIF) tests were conducted on the selected features to assess multicollinearity among the variables within the model. Fifth, the selected important features were entered into five commonly used ML models (RSF, XGBoost, GBM, CoxPH, and Rpart) for modeling in the training cohort. Optimal hyperparameters for each ML model were obtained through random search and manual fine-tuning with 5-fold cross-validation.
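The Spearman screening step can be illustrated with a minimal Python sketch. This is not the authors' code (the study used R); the toy values, the helper names, and the 0.8 cutoff are assumptions made only for illustration, since the text does not state the threshold that was used.

```python
from itertools import combinations

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def collinear_pairs(features, threshold=0.8):
    """Return feature pairs whose |rho| exceeds the (assumed) threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        rho = spearman(col_a, col_b)
        if abs(rho) > threshold:
            flagged.append((name_a, name_b, rho))
    return flagged

# Toy values, invented for illustration only (not study data).
toy_features = {
    "ALT": [20, 35, 50, 80, 120, 60],
    "AST": [25, 40, 48, 90, 130, 70],
    "PT": [13.1, 12.0, 14.2, 11.8, 13.5, 12.6],
}
print(collinear_pairs(toy_features))
```

In this toy data only the ALT and AST pair is flagged. One member of a flagged pair would then be dropped before running Boruta.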
The models were then evaluated using the concordance index (C-index), area under the curve (AUC), and Brier score in the internal validation cohort, with the best-performing model selected for further analysis. The C-index represents the proportion of comparable patient pairs in which the patient with the higher predicted risk had the shorter observed survival. The Brier score was employed to evaluate model calibration. This metric is the mean squared difference between the actual survival status and the predicted survival probability of a patient, with a lower score signifying better calibration performance. Lastly, calibration curves and decision curves were plotted to assess the clinical utility of the optimal model.
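As a rough illustration of these two metrics, the sketch below computes Harrell's C-index and a simplified Brier score at a fixed horizon in plain Python. The data are invented, and the Brier function simply skips patients censored before the horizon instead of applying inverse-probability-of-censoring weights, so it is a simplification and not necessarily the estimator used in the study.

```python
def harrell_c_index(times, events, risks):
    """Share of comparable pairs ranked correctly by predicted risk.

    A pair (i, j) is comparable when patient i died (events[i] == 1)
    before patient j's last observed time. Ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

def brier_score_at(horizon, times, events, surv_probs):
    """Mean squared gap between survival status at `horizon` and prediction.

    Simplification: patients censored before the horizon are skipped
    (no censoring weights are applied).
    """
    squared_errors = []
    for t, e, p in zip(times, events, surv_probs):
        if t > horizon:
            alive = 1          # still under observation past the horizon
        elif e == 1:
            alive = 0          # died at or before the horizon
        else:
            continue           # censored early: status at horizon unknown
        squared_errors.append((alive - p) ** 2)
    return sum(squared_errors) / len(squared_errors)

# Invented toy cohort: follow-up in months, event = 1 means death observed.
times = [7, 42, 15, 30, 10]
events = [1, 0, 1, 1, 0]
risks = [0.9, 0.1, 0.7, 0.05, 0.5]            # higher = worse predicted prognosis
surv_12m = [0.35, 0.96, 0.80, 0.90, 0.60]     # predicted 12-month survival

print(harrell_c_index(times, events, risks))
print(brier_score_at(12, times, events, surv_12m))
```

Here six of the seven comparable pairs are ordered correctly, so the C-index is 6/7. The fifth patient is censored at 10 months and does not contribute to the 12-month Brier score.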
Model Explanation
Model transparency is crucial for its application in the medical field. Therefore, we enhanced the transparency of the optimal model by implementing SHAP, a model-agnostic post-hoc interpretability algorithm widely used to explain ML models.23 In this method, the SHAP values were calculated for each feature, and the features were ranked according to their importance.
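To make the idea behind SHAP values concrete, the following sketch computes exact Shapley values for a tiny hypothetical risk function by enumerating all feature coalitions. The function `toy_risk`, its coefficients, and the baseline patient are all invented. The study itself applied SHAP to the fitted RSF model, which relies on approximations and not on this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are set to their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        inputs = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return predict(inputs)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        phi[name] = total
    return phi

def toy_risk(p):
    """Hypothetical risk score with one interaction term (not the RSF model)."""
    return (0.2 * p["pvtt"] + 0.03 * p["tumor_size_cm"]
            - 0.1 * p["metformin_use"] + 0.02 * p["pvtt"] * p["tumor_size_cm"])

patient = {"pvtt": 1, "tumor_size_cm": 8, "metformin_use": 0}
reference = {"pvtt": 0, "tumor_size_cm": 4, "metformin_use": 1}

phi = shapley_values(toy_risk, patient, reference)
for feature, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(feature, round(contribution, 4))
```

The contributions sum to the difference between the patient's score and the reference score, which is the additivity property that lets SHAP decompose a prediction feature by feature. Ranking features by mean absolute contribution across patients gives an importance ordering.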
Patient Assessment and Follow-up
All patients underwent clinical evaluation, liver function test, and abdominal CT or MRI scan (or both) at baseline, every 2 months in the first 6 months following initial TACE, and every 3–4 months after that.
Statistical Methods
The normality of continuous variables was assessed using the Shapiro–Wilk test. Continuous variables with normal distribution were expressed as mean ± standard deviation (SD) and compared utilizing the independent samples t-test. Continuous variables with non-normal distribution were described as median (interquartile range) and compared with the Mann–Whitney U-test. Categorical variables were presented as number (n) and percentage (%), and group differences were assessed by conducting the chi-square test. All statistical analyses were performed using R software 4.1.3 (https://www.r-project.org/), with a two-sided p-value of <0.05 considered statistically significant.
Results
Clinicopathological Characteristics of the Study Patients
A total of 636 patients with HCC and T2DM who underwent TACE were enrolled after applying the study inclusion and exclusion criteria (Figure 1). The baseline characteristics of the training (n = 373), internal validation (n = 166), and external validation cohorts (n = 97) are presented in Table 1. In the training cohort, the median age was 59.0 years, and the median body mass index (BMI) was 24.1 kg/m2. Further, 82.3% were male, and the diabetes duration was 46.0 months. The patients in the internal validation cohort had a median age of 60.0 years and a median BMI of 23.5 kg/m2, while 83.1% were male and their diabetes duration was 48.0 months. In the case of the external validation cohort, the mean age was 61.0 years, and the median BMI was 23.8 kg/m2. Additionally, the proportion of male patients was 76.3%, and the diabetes duration was 43.0 months. The median follow-up time of OS in the training, internal validation, and external validation cohorts was 27, 27, and 28 months, respectively. Lastly, the median OS was 31, 29, and 28 months in the training, internal validation, and external validation cohorts, respectively.
Table 1 Baseline Characteristics of the Patients in the Training, Internal Validation, and External Validation Cohorts
Data Preprocessing, Feature Selection, and Collinearity Tests
Features with more than 30% missing data were excluded31 because they provided limited information, whereas the missForest method was used to impute the remaining missing values of the continuous and categorical variable datasets.32 Finally, 31 variables were identified as candidate predictors, and further analysis was performed using the following three steps. First, the correlations between different variables were examined by applying the Spearman correlation analysis to prevent overfitting or uncertainty in the model before its development stage (Figure S1). In this analysis, high collinearity was observed between alanine aminotransferase (ALT) and aspartate aminotransferase (AST), and the feature exhibiting comparatively lower correlation with the outcome (ie, AST) was eliminated. Second, the Boruta algorithm was employed for feature screening. As depicted in the feature selection results in Figure 2 and Table S1, the variables in the orange region are considered important features, and those in the blue area represent unimportant features in the Boruta algorithm. Ultimately, 13 features were selected for predicting OS of the patients, including BCLC stage, portal vein tumor thrombosis (PVTT), tumor size, extrahepatic metastases, metformin use, liver cirrhosis, alpha-fetoprotein (AFP) level, platelet-to-lymphocyte ratio (PLR), neutrophil-to-lymphocyte ratio (NLR), glycosylated hemoglobin (HbA1c) level, hemoglobin level, bilirubin level, and prothrombin time (PT). Third, VIF tests demonstrated no strong correlation or multicollinearity between the selected features (Table S2), and all 13 variables were utilized for subsequent model development.
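The 30% missingness filter described at the start of this subsection can be sketched as follows. The records and the feature name `marker_x` are hypothetical, and the subsequent missForest imputation is not reproduced here.

```python
def drop_sparse_features(records, max_missing=0.30):
    """Split feature names into kept/dropped by their fraction of missing values.

    `records` is a list of dicts with identical keys; None marks a missing value.
    A feature is dropped when its missing fraction is strictly above max_missing.
    """
    n = len(records)
    kept, dropped = [], []
    for name in records[0]:
        missing_fraction = sum(1 for r in records if r[name] is None) / n
        if missing_fraction > max_missing:
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped

# Invented toy records; "marker_x" is a hypothetical feature, not a study variable.
toy_records = [
    {"AFP": 400.0, "HbA1c": 7.2, "marker_x": None},
    {"AFP": 12.5, "HbA1c": 6.8, "marker_x": 1.1},
    {"AFP": None, "HbA1c": 8.1, "marker_x": None},
    {"AFP": 88.0, "HbA1c": 7.5, "marker_x": 0.9},
    {"AFP": 1500.0, "HbA1c": 6.4, "marker_x": 1.4},
]

kept, dropped = drop_sparse_features(toy_records)
print("kept:", kept)
print("dropped:", dropped)
```

In this toy example `marker_x` is missing in 40% of records and is dropped, while AFP (20% missing) is kept and would go on to imputation.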
Model Development and Optimal Model Selection
Five survival analysis algorithms (ie, RSF, XGBoost, GBM, CoxPH, and Rpart) were used for model development in the training set. The hyperparameter search space and tuning results are presented in Table S3. The C-index values of the RSF, XGBoost, GBM, CoxPH, and Rpart models were 0.797, 0.841, 0.828, 0.770, and 0.758, respectively, in the training set, while these values were 0.754, 0.734, 0.743, 0.746, and 0.707, respectively, in the internal validation cohort and 0.759, 0.752, 0.758, 0.736, and 0.711, respectively, in the external validation cohort (Table 2 and Figure S2). The AUCs of the five final ML models (RSF, XGBoost, GBM, CoxPH, and Rpart) at 1, 2, and 3 years ranged from 0.832 to 0.918 in the training cohort. Similarly, the AUCs of the final models in the internal validation and external validation cohorts ranged from 0.746 to 0.853 and from 0.746 to 0.862, respectively (Table 3). The Brier scores of the five final ML models across 1, 2, and 3 years spanned from 0.098 to 0.308 in the training cohort, 0.130 to 0.325 in the internal validation cohort, and 0.142 to 0.343 in the external validation cohort (Table 4).
Table 2 C-Index Values of the Five Machine Learning-Based Prognostic Models in the Training, Internal Validation, and External Validation Cohorts
Table 3 AUCs of the Five Machine Learning-Based Prognostic Models in the Training, Internal Validation, and External Validation Cohorts
Table 4 Brier Score of Prognostic Models Built by Machine Learning Algorithms
The C-index, AUC, and Brier score in the internal validation cohort were used to evaluate the discrimination and calibration of the developed models. As illustrated in Figure S3, the RSF model exhibited the best performance with a C-index of 0.754 (Figure S3A and Table 2) in the internal validation cohort. Moreover, the RSF model exhibited the highest AUC (Figure S3B and Table 3) and the lowest Brier score (Figure S3C and Table 4) among all models. Therefore, the RSF model demonstrated superior discriminative ability and calibration performance compared to the other four models. The RSF model also provided the best clinical benefits (Figure S4). Based on all these findings, the RSF model was further investigated.
Performance of the Final RSF Model
The RSF model showed good performance in the training cohort, achieving AUC values of 0.912, 0.895, and 0.893 for 1-, 2-, and 3-year OS, respectively (Figure 3A). Furthermore, the AUC values for 1-, 2-, and 3-year OS in the internal and external validation cohorts were 0.824, 0.853, and 0.810 (Figure 3B) and 0.862, 0.815, and 0.798 (Figure 3C), respectively.
Additionally, the calibration curves of the RSF model displayed good consistency between the actual observations and model predictions for 1-, 2-, and 3-year OS in the training, internal validation, and external validation cohorts (Figure 4A–C, respectively). Correspondingly, the decision curves of the RSF model for 1-, 2-, and 3-year OS in the internal (Figure 5A–C, respectively) and external validation cohorts (Figure 5D–F, respectively) also exhibited good clinical utility, indicating a favorable positive net benefit.
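For readers unfamiliar with decision curve analysis, the net benefit at a threshold probability p_t is commonly defined as TP/n − (FP/n) × p_t/(1 − p_t). The sketch below evaluates it on invented binary outcomes at a single horizon. It ignores censoring, so it illustrates the quantity being plotted and is not a reproduction of the survival-adapted analysis in the study.

```python
def net_benefit(event_probs, died, threshold):
    """Net benefit of acting on patients whose predicted event probability
    is at or above `threshold` (binary outcome, no censoring handled)."""
    n = len(died)
    true_pos = sum(1 for p, d in zip(event_probs, died) if p >= threshold and d == 1)
    false_pos = sum(1 for p, d in zip(event_probs, died) if p >= threshold and d == 0)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)

def net_benefit_treat_all(died, threshold):
    """Reference strategy: treat every patient regardless of prediction."""
    prevalence = sum(died) / len(died)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Invented predictions of death by a fixed horizon and observed outcomes.
event_probs = [0.9, 0.7, 0.6, 0.3, 0.2, 0.1]
died = [1, 1, 0, 1, 0, 0]

for pt in (0.2, 0.5, 0.7):
    print(pt, round(net_benefit(event_probs, died, pt), 3),
          round(net_benefit_treat_all(died, pt), 3))
```

A decision curve plots the model's net benefit against the threshold alongside the treat-all and treat-none (net benefit 0) strategies. The model is useful over the range of thresholds where its curve lies above both.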
Feature Importance Assessment of the Final RSF Model
Next, the SHAP method was applied to examine the impact of input features on the survival of patients with HCC and T2DM after undergoing TACE. Figure 6A illustrates the ranking of feature importance according to their SHAP values, with features arranged from top to bottom in decreasing order of importance. In particular, a greater mean absolute SHAP value for a feature implies greater feature importance. The top five features ranked according to their importance were BCLC stage, PVTT, tumor size, metformin use, and extrahepatic metastases. The SHAP summary plot depicts the full distribution of each feature’s impact on the model output. The colors of the dots denote the feature values for each patient, where blue corresponds to a higher feature value and red represents a lower feature value. Additionally, the greater the distance of a point from the baseline SHAP value of 0, the stronger the effect of the corresponding feature on the output. As demonstrated in the SHAP summary dot plot in Figure 6B, higher BCLC stage, the presence of PVTT, larger tumor size, the absence of metformin use, and the presence of extrahepatic metastases were related to higher mortality risk.
Development of a Web Application for the Final RSF Model
The clinical application of the RSF model was facilitated by developing a user-friendly web application based on the Shiny platform. The final predictive model was implemented into the web application, as shown in Figure S5. After the values of the 13 required features are entered, the web application automatically predicts the 1-, 2-, and 3-year survival probability of individual patients with HCC and T2DM following TACE. If the probability is greater than 0.5, the patient is predicted to survive; if it is less than 0.5, the patient is predicted to die. For instance, in the case shown in Figure S5A, the model predicted survival probabilities of 0.35, 0.132, and 0.063 for 1-, 2-, and 3-year survival, respectively, for a patient with 7-month survival. In another case shown in Figure S5B, for a patient with 42-month survival, the model predicted survival probabilities of 0.959, 0.854, and 0.734 for the same time points. The web application is accessible online at https://osprediction.shinyapps.io/survival_1y_2y_3y_prediction/, and detailed instructions for use are provided in Table S4.
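The decision rule described above is simple enough to express in a few lines. The sketch below applies it to the two example cases from Figure S5; the function name and the handling of a probability of exactly 0.5 (which the text leaves unspecified) are assumptions, and this is not the application's actual code.

```python
def classify_survival(probabilities, cutoff=0.5):
    """Map predicted survival probabilities to the rule stated in the text:
    above the cutoff -> predicted to survive, below -> predicted to die.
    The text does not define the outcome at exactly 0.5, so it is flagged."""
    labels = {}
    for horizon, p in probabilities.items():
        if p > cutoff:
            labels[horizon] = "survive"
        elif p < cutoff:
            labels[horizon] = "die"
        else:
            labels[horizon] = "unspecified"
    return labels

# Predicted probabilities reported for the two example patients (Figure S5A/B).
case_a = {"1-year": 0.35, "2-year": 0.132, "3-year": 0.063}    # 7-month survival
case_b = {"1-year": 0.959, "2-year": 0.854, "3-year": 0.734}   # 42-month survival

print(classify_survival(case_a))
print(classify_survival(case_b))
```

Both outputs agree with the observed outcomes: the first patient, who survived 7 months, is predicted to die at every horizon, and the second, who survived 42 months, is predicted to survive at every horizon.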
Discussion
Survival prediction in patients with tumors can lead to the selection of better treatment strategies by clinicians. Several studies have established that diabetes can result in a poorer prognosis in patients with various tumors.33–35 In line with these findings, diabetes in patients with liver cancer who were treated with TACE was found to be associated with a poorer survival prognosis.36,37 However, no studies have explored the survival factors affecting patients with HCC and comorbid T2DM following TACE therapy. In this study, we developed an ML-based model for predicting the survival of patients with HCC accompanied by T2DM who underwent TACE by incorporating preoperative demographics, biochemical test results, and imaging markers of this population.
Several prior studies have focused on survival prediction in patients with HCC after receiving TACE.38–40 However, the population heterogeneity of this patient group has made it difficult for a single predictor to fully reflect the survival prognosis. Moreover, diabetes-related factors have been suggested to affect survival prognosis in patients with combined HCC and T2DM. Therefore, utilizing a single indicator for survival prediction in such patients may be relatively more challenging. ML is a powerful computational method that can manage complex and large amounts of data due to its capability of processing highly variable datasets and deciphering complex relationships between variables in a flexible and trainable manner. Therefore, combining multimodal data (demographics, biochemical test results, and imaging features) with sophisticated ML algorithms can facilitate the development of valuable clinical predictive models.41 Among the five ML models evaluated in this study, the RSF model exhibited the best survival prediction at 1, 2, and 3 years after TACE therapy in terms of the AUC values in the internal and external validation cohorts. Numerous studies have demonstrated that the RSF method has excellent predictive value in patients with cancer.42–44 In contrast to traditional scoring systems, such as the Child-Pugh score or the BCLC staging system,27,45 which rely on relatively limited clinical parameters (eg, liver function, tumor stage, and patient performance), the RSF model incorporates 13 features, representing a broader spectrum of factors. These features can be easily assessed at the time of patient admission, which makes the RSF model both a promising and practical tool for survival prediction in patients with combined HCC and T2DM undergoing TACE.
Furthermore, while the traditional predictive models, such as Cox proportional hazards model, are widely used in cancer prognosis, they may struggle with non-linear relationships and complex interactions between multiple factors.46 The RSF model, with its ensemble approach, handles these complexities more effectively, making it particularly well-suited for real-world clinical applications. The development of the 13-feature RSF model not only highlights its predictive power but also introduces a novel approach for personalizing clinical decision-making for patients with the dual burden of HCC and T2DM.
Given the lack of studies on prediction models for patients with combined HCC and T2DM who are treated with TACE, no relevant guidelines or consensus are available on the selection of specific diagnostic features for prediction models. Thus, determining the number of features to be incorporated in the survival prediction model is challenging. Although including more modalities and baseline characteristics of patients may provide additional information for the prediction model, integrating a large number of features, particularly those with strong correlations, can cause model overfitting and produce unreliable model results. Moreover, combining an excessive number of specific diagnostic features in the prediction model may limit its clinical use, while including non-causal features may diminish prediction accuracy. In this study, the Spearman correlation analysis and the Boruta algorithm were applied to screen the diagnostic features, ultimately identifying 13 variables. Furthermore, the random forest-based missForest method was implemented to impute the missing values of the relevant variables to establish a simple and convenient ML prediction model. The final model can be used for survival prediction in patients with HCC and comorbid T2DM following TACE treatment, thereby aiding in the clinical decision-making process for this patient group.
Previous studies have reported that factors such as preoperative BCLC stage, tumor size, tumor number, and AFP level can influence the survival of patients with HCC who have undergone TACE.27,47,48 In this study, the 13 variables that influenced patient survival included non-diabetes-related variables similar to those identified in previous studies. Nevertheless, diabetes-associated variables, such as metformin use and HbA1c level, also carried considerable weight in the prediction of survival prognosis, suggesting the significance of these variables in patients with combined HCC and T2DM. Correspondingly, earlier research has demonstrated that metformin use in patients with HCC and T2DM who received TACE may lead to improved prognosis.49,50 The current study revealed consistent results, indicating that a preoperative intervention for T2DM using metformin therapy may enhance survival prognosis in this patient population. Furthermore, this study employed the SHAP method to attach different weights to each variable affecting the survival prognosis of the patients. The results showed that BCLC stage, PVTT, tumor size, extrahepatic metastases, liver cirrhosis, AFP level, PLR, NLR, hemoglobin level, bilirubin level, and PT influenced patient survival prognosis. Along with their use for predicting patient survival, these factors may also guide clinicians in administering more appropriate treatments. Thus, clinicians can potentially extend patient survival by effectively managing these pre-TACE variables.
The ML model developed in this study and its corresponding web application provide quantitative and personalized survival probability predictions for patients with HCC and comorbid T2DM after TACE treatment. This model represents an advance over the traditional approach, which generalizes risk from heterogeneous population averages. The predictive model can also assist clinicians in determining a patient’s risk of adverse survival outcomes, prioritizing treatments, and developing treatment strategies. For example, the model can provide early predictions of survival probabilities at specified intervals, aiding future care planning. Therefore, this model can support the development of policies and procedures for enhancing survival outcomes, optimizing resource utilization, and improving prognosis for patients at varied risk levels.
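Reading survival probabilities at specified intervals amounts to evaluating a step survival curve at fixed horizons. The sketch below is illustrative only: a Kaplan-Meier curve built from hypothetical follow-up data stands in for the patient-specific curve that an RSF model would return, and the 12-, 24-, and 36-month horizons correspond to the 1-, 2-, and 3-year time points.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    times: follow-up in months; events: 1 = death, 0 = censored.
    """
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve


def survival_at(curve, horizon):
    """Step-function lookup: last S(t) with t <= horizon, else 1.0."""
    prob = 1.0
    for t, s in curve:
        if t > horizon:
            break
        prob = s
    return prob


# Hypothetical follow-up data (not from the study cohort).
months = [5, 8, 12, 14, 20, 26, 30, 36, 40, 44]
died = [1, 1, 0, 1, 1, 0, 1, 1, 0, 0]

curve = kaplan_meier(months, died)
for horizon in (12, 24, 36):
    print(f"{horizon} months: {survival_at(curve, horizon):.3f}")
```

The same lookup applies to any right-continuous step curve, so it could in principle be reused with the survival function predicted for an individual patient.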
However, our study has several limitations that should be considered. First, this was a retrospective investigation; thus, selection bias is inevitable. Second, because few enrolled patients underwent surgical resection, this study did not assess pathological examination results or genetic testing data. Third, machine learning models typically require large sample sizes to achieve robust performance. Therefore, we plan to expand the sample size in future studies to further validate the stability and generalizability of the model. Finally, future studies should incorporate pathological and genetic variables to extend survival prediction research in patients with HCC and T2DM receiving TACE.
Conclusion
In this study, we developed an interpretable ML model that predicts the 1-, 2-, and 3-year survival probabilities of patients with HCC and T2DM undergoing TACE based on their pre-TACE information. The final RSF model exhibited excellent predictive ability in both the internal and external validation cohorts. Furthermore, our results may provide clinicians with valuable guidance in selecting appropriate treatment modalities for patients with combined HCC and T2DM undergoing TACE. Larger prospective studies are essential to further validate the clinical applicability and robustness of the model across diverse patient populations.
Abbreviations
HCC, hepatocellular carcinoma; T2DM, type 2 diabetes mellitus; ML, machine learning; SHAP, SHapley Additive exPlanations; RSF, random survival forest; TACE, transarterial chemoembolization; OS, overall survival; C-index, concordance index; PVTT, portal vein tumor thrombosis; HbA1c, glycosylated hemoglobin.
Data Sharing Statement
The data used in the study are available from the corresponding author (Chuansheng Zheng, [email protected]).
Ethics Approval and Informed Consent
This study protocol was reviewed and approved by the Ethical Committee of Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, approval number UHCT-IEC-SOP-016-03-03. Patient consent was waived owing to the retrospective design of this study.
Consent for Publication
Written informed consent for publication was obtained from all participants.
Acknowledgments
We thank all the staff and authors for their contributions to this study.
Author Contributions
All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation, or in all these areas; took part in drafting, revising or critically reviewing the article; gave final approval of the version to be published; have agreed on the journal to which the article has been submitted; and agree to be accountable for all aspects of the work.
Funding
The study was supported by the National Key Research and Development Program of China (2023YFC2413500 to C.Z.), the National Natural Science Foundation of China (82402411 to L.C., U22A20352 to C.Z., and 82472070 to H.Z.), and the Fundamental Research Funds for the Central Universities (YCJJ20242419 to L.Z.).
Disclosure
The authors report no conflicts of interest in this work.
References
1. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74(3):229–263. doi:10.3322/caac.21834
2. Llovet JM, Kelley RK, Villanueva A, et al. Hepatocellular carcinoma. Nature Reviews Disease Primers. 2021;7(1):6. doi:10.1038/s41572-020-00240-3
3. Llovet JM, Real MI, Montaña X, et al. Arterial embolisation or chemoembolisation versus symptomatic treatment in patients with unresectable hepatocellular carcinoma: a randomised controlled trial. Lancet. 2002;359(9319):1734–1739. doi:10.1016/S0140-6736(02)08649-X
4. European Association for the Study of the Liver. EASL Clinical Practice Guidelines: management of hepatocellular carcinoma. Journal of Hepatology. 2018;69(1):182–236. doi:10.1016/j.jhep.2018.03.019
5. Zhou J, Sun H, Wang Z, et al. Guidelines for the diagnosis and treatment of Primary Liver Cancer (2022 Edition). Liver Cancer. 2023;12(5):405–444. doi:10.1159/000530495
6. Chang Y, Jeong SW, Young Jang J, Jae Kim Y. Recent updates of transarterial chemoembolilzation in hepatocellular carcinoma. International Journal of Molecular Sciences. 2020;21(21):8165. doi:10.3390/ijms21218165
7. Wang YG, Wang P, Wang B, Fu ZJ, Zhao WJ, Yan SL. Diabetes mellitus and poorer prognosis in hepatocellular carcinoma: a systematic review and meta-analysis. PLoS One. 2014;9(5):e95485. doi:10.1371/journal.pone.0095485
8. Nakatsuka T, Tateishi R. Development and prognosis of hepatocellular carcinoma in patients with diabetes. Clinical and Molecular Hepatology. 2023;29(1):51–64. doi:10.3350/cmh.2022.0095
9. Zheng Y, Ley SH, Hu FB. Global aetiology and epidemiology of type 2 diabetes mellitus and its complications. Nature Reviews Endocrinology. 2018;14(2):88–98. doi:10.1038/nrendo.2017.151
10. Simon TG, King LY, Chong DQ, et al. Diabetes, metabolic comorbidities, and risk of hepatocellular carcinoma: results from two prospective cohort studies. Hepatology. 2018;67(5):1797–1806. doi:10.1002/hep.29660
11. Chocarro-Calvo A, García-Martínez JM, Ardila-González S, De la Vieja A, García-Jiménez C. Glucose-induced β-catenin acetylation enhances Wnt signaling in cancer. Molecular Cell. 2013;49(3):474–486. doi:10.1016/j.molcel.2012.11.022
12. García-Jiménez C, García-Martínez JM, Chocarro-Calvo A, De la Vieja A. A new link between diabetes and cancer: enhanced WNT/β-catenin signaling by high glucose. Journal of Molecular Endocrinology. 2014;52(1):R51–66. doi:10.1530/JME-13-0152
13. Lee IC, Hung YW, Liu CA, et al. A new ALBI-based model to predict survival after transarterial chemoembolization for BCLC stage B hepatocellular carcinoma. Liver International. 2019;39(9):1704–1712. doi:10.1111/liv.14194
14. Mao X, Guo Y, Wen F, Liang H, Sun W, Lu Z. Applying arterial enhancement fraction (AEF) texture features to predict the tumor response in hepatocellular carcinoma (HCC) treated with Transarterial chemoembolization (TACE). Cancer Imaging. 2021;21(1):49. doi:10.1186/s40644-021-00418-2
15. Liu B, Gao S, Guo J, et al. A novel nomogram for predicting the overall survival in patients with unresectable HCC after TACE plus hepatic arterial infusion chemotherapy. Translational Oncology. 2023;34:101705. doi:10.1016/j.tranon.2023.101705
16. Liu K, Zheng X, Lu D, et al. A multi-institutional study to predict the benefits of DEB-TACE and molecular targeted agent sequential therapy in unresectable hepatocellular carcinoma using a radiological-clinical nomogram. La Radiologia medica. 2024;129(1):14–28. doi:10.1007/s11547-023-01736-0
17. Karabacak M, Margetis K. A machine learning-based online prediction tool for predicting short-term postoperative outcomes following spinal tumor resections. Cancers. 2023;15(3):812. doi:10.3390/cancers15030812
18. Karabacak M, Ozkara BB, Ozturk A, et al. Radiomics-based machine learning models for prediction of medulloblastoma subgroups: a systematic review and meta-analysis of the diagnostic test performance. Acta radiologica. 2023;64(5):1994–2003. doi:10.1177/02841851221143496
19. Li J, Zhang Y, Ye H, et al. Machine learning-based development of nomogram for hepatocellular carcinoma to predict acute liver function deterioration after drug-eluting beads transarterial chemoembolization. Acad Radiol. 2023;30(Suppl 1):S40–S52. doi:10.1016/j.acra.2023.05.014
20. Zhang M, Kuang B, Zhang J, et al. Enhancing prognostic prediction in hepatocellular carcinoma post-TACE: a machine learning approach integrating radiomics and clinical features. Front Med. 2024;11:1419058. doi:10.3389/fmed.2024.1419058
21. İnce O, Önder H, Gençtürk M, Cebeci H, Golzarian J, Young S. Machine learning models in prediction of treatment response after chemoembolization with MRI clinicoradiomics features. Cardiovasc Intervent Radiol. 2023;46(12):1732–1742. doi:10.1007/s00270-023-03574-z
22. Watson DS, Krutzinna J, Bruce IN, et al. Clinical applications of machine learning algorithms: beyond the black box. BMJ. 2019;364:l886. doi:10.1136/bmj.l886
23. Lundberg S, Lee SI. A unified approach to interpreting model predictions. arXiv preprint arXiv:1705.07874. 2017.
24. Ma J, Bo Z, Zhao Z, et al. Machine learning to predict the response to lenvatinib combined with transarterial chemoembolization for unresectable hepatocellular carcinoma. Cancers. 2023;15(3):625. doi:10.3390/cancers15030625
25. Zhang L, Jin Z, Li C, et al. An interpretable machine learning model based on contrast-enhanced CT parameters for predicting treatment response to conventional transarterial chemoembolization in patients with hepatocellular carcinoma. Radiol Med. 2024;129(3):353–367. doi:10.1007/s11547-024-01785-z
26. World Medical Association Declaration of Helsinki. Ethical principles for medical research involving human subjects. Nurs Ethics. 2002;9(1):105–109. doi:10.1191/0969733002ne486xx
27. Reig M, Forner A, Rimola J, et al. BCLC strategy for prognosis prediction and treatment recommendation: the 2022 update. J Hepatol. 2022;76(3):681–693. doi:10.1016/j.jhep.2021.11.018
28. American Diabetes Association Professional Practice Committee. Classification and diagnosis of diabetes: standards of medical care in diabetes-2022. Diabetes Care. 2022;45(Suppl 1):S17–S38. doi:10.2337/dc22-S002
29. Chen L, Zhang W, Sun T, et al. Effect of transarterial chemoembolization plus percutaneous ethanol injection or radiofrequency ablation for liver tumors. Journal of Hepatocellular Carcinoma. 2022;9:783–797. doi:10.2147/JHC.S370486
30. Kursa MB, Rudnicki WR. Feature selection with the Boruta Package. Journal of Statistical Software. 2010;36(11):1–13. doi:10.18637/jss.v036.i11
31. Yang X, Qiu H, Wang L, Wang X. Predicting colorectal cancer survival using time-to-event machine learning: retrospective cohort study. J Med Internet Res. 2023;25:e44417. doi:10.2196/44417
32. Ma M, Wan X, Chen Y, et al. A novel explainable online calculator for contrast-induced AKI in diabetics: a multi-centre validation and prospective evaluation study. J Transl Med. 2023;21(1):517. doi:10.1186/s12967-023-04387-x
33. Barone BB, Yeh HC, Snyder CF, et al. Postoperative mortality in cancer patients with preexisting diabetes: systematic review and meta-analysis. Diabetes Care. 2010;33(4):931–939. doi:10.2337/dc09-1721
34. Liu X, Ji J, Sundquist K, Sundquist J, Hemminki K. The impact of type 2 diabetes mellitus on cancer-specific survival: a follow-up study in Sweden. Cancer. 2012;118(5):1353–1361. doi:10.1002/cncr.26420
35. Zelenko Z, Gallagher EJ. Diabetes and cancer. Endocrinology and Metabolism Clinics of North America. 2014;43(1):167–185. doi:10.1016/j.ecl.2013.09.008
36. Liu G, Xia F, Fan G, et al. Type 2 diabetes mellitus worsens the prognosis of intermediate-stage hepatocellular carcinoma after transarterial chemoembolization. Diabetes Research and Clinical Practice. 2020;169:108375. doi:10.1016/j.diabres.2020.108375
37. Moon SJ, Ahn CH, Lee YB, Cho YM. Impact of hyperglycemia on complication and mortality after transarterial chemoembolization for hepatocellular carcinoma. Diabetes & Metabolism Journal. 2024;48(2):302–311. doi:10.4093/dmj.2022.0255
38. Han G, Berhane S, Toyoda H, et al. Prediction of survival among patients receiving transarterial chemoembolization for hepatocellular carcinoma: a response-based approach. Hepatology. 2020;72(1):198–212. doi:10.1002/hep.31022
39. Wang S, Zhang X, Chen Q, Jin ZC, Lu J, Guo J. A novel neutrophil-to-lymphocyte ratio and sarcopenia based TACE-predict model of hepatocellular carcinoma patients. Journal of Hepatocellular Carcinoma. 2023;10:659–671. doi:10.2147/JHC.S407646
40. Li X, Liang X, Li Z, et al. A novel stratification scheme combined with internal arteries in CT imaging for guiding postoperative adjuvant transarterial chemoembolization in hepatocellular carcinoma: a retrospective cohort study. International Journal of Surgery. 2024;110(5):2556–2567. doi:10.1097/JS9.0000000000001191
41. Goecks J, Jalili V, Heiser LM, Gray JW. How machine learning will transform biomedicine. Cell. 2020;181(1):92–101. doi:10.1016/j.cell.2020.03.022
42. Guo L, Wang Z, Du Y, et al. Random-forest algorithm based biomarkers in predicting prognosis in the patients with hepatocellular carcinoma. Cancer Cell International. 2020;20:251. doi:10.1186/s12935-020-01274-z
43. Yu Y, Tan Y, Xie C, et al. Development and validation of a preoperative magnetic resonance imaging radiomics-based signature to predict axillary lymph node metastasis and disease-free survival in patients with early-stage breast cancer. JAMA Network Open. 2020;3(12):e2028086. doi:10.1001/jamanetworkopen.2020.28086
44. Huang B, Sollee J, Luo YH, et al. Prediction of lung malignancy progression and survival with machine learning based on pre-treatment FDG-PET/CT. EBioMedicine. 2022;82:104127. doi:10.1016/j.ebiom.2022.104127
45. European Association for the Study of the Liver. EASL Clinical Practice Guidelines on the management of hepatocellular carcinoma. J Hepatol. 2024.
46. Zhang MJ. Cox proportional hazards regression models for survival data in cancer research. Cancer Treat Res. 2002;113:59–70.
47. Kaewdech A, Sripongpun P, Assawasuwannakit S, et al. FAIL-T (AFP, AST, tumor sIze, ALT, and Tumor number): a model to predict intermediate-stage HCC patients who are not good candidates for TACE. Frontiers in Medicine. 2023;10:1077842. doi:10.3389/fmed.2023.1077842
48. Masior Ł, Krasnodębski M, Kuncewicz M, et al. Alpha-fetoprotein response after first Transarterial Chemoembolization (TACE) and complete pathologic response in patients with hepatocellular cancer. Cancers. 2023;15(15):3962. doi:10.3390/cancers15153962
49. Chen ML, Wu CX, Zhang JB, et al. Transarterial chemoembolization combined with metformin improves the prognosis of hepatocellular carcinoma patients with type 2 diabetes. Frontiers in Endocrinology. 2022;13:996228. doi:10.3389/fendo.2022.996228
50. Jung WJ, Jang S, Choi WJ, et al. Metformin administration is associated with enhanced response to transarterial chemoembolization for hepatocellular carcinoma in type 2 diabetes patients. Scientific Reports. 2022;12(1):14482. doi:10.1038/s41598-022-18341-2
© 2025 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution - Non Commercial (unported, 3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.