Predicting Superaverage Length of Stay in COPD Patients with Hypercapnic Respiratory Failure Using Machine Learning

Bingqing Zuo; Lin Jin; Zhixiao Sun; Hang Hu; Yuan Yin; Shuanying Yang; Zhongxiang Liu

doi:10.2147/JIR.S511092

Journal of Inflammation Research, Volume 18

Original Research


Authors Zuo B, Jin L, Sun Z, Hu H, Yin Y, Yang S, Liu Z

Received 8 December 2024

Accepted for publication 19 April 2025

Published 8 May 2025 Volume 2025:18 Pages 5993—6008


Checked for plagiarism Yes

Review by Single anonymous peer review


Editor who approved publication: Dr Tara Strutt


Bingqing Zuo,¹,* Lin Jin,²,* Zhixiao Sun,¹ Hang Hu,¹ Yuan Yin,³ Shuanying Yang,⁴ Zhongxiang Liu¹,⁴

¹Department of Pulmonary and Critical Care Medicine, The Yancheng Clinical College of Xuzhou Medical University, The First People’s Hospital of Yancheng, Yancheng, Jiangsu, 224006, People’s Republic of China; ²Third Department of Respiratory and Critical Care Medicine, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan, 650000, People’s Republic of China; ³Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital of Nanjing Medical University, Nanjing, Jiangsu, 210029, People’s Republic of China; ⁴Department of Respiratory and Critical Care Medicine, The Second Affiliated Hospital of Xi’an Jiaotong University, Xi’an, Shaanxi, 710004, People’s Republic of China

*These authors contributed equally to this work

Correspondence: Shuanying Yang, Email [email protected]; Zhongxiang Liu, Email [email protected]

Objective: This study aimed to develop and validate machine learning models that predict superaverage length of stay in COPD patients with hypercapnic respiratory failure, to compare the performance of these models, and to select the optimal individualized risk assessment model. Such a model can predict in advance whether an inpatient’s length of stay will exceed the average duration, which enhances its clinical utility.
Methods: The study included 568 COPD patients with hypercapnic respiratory failure: 426 inpatients from the Department of Respiratory and Critical Care Medicine of Yancheng First People’s Hospital formed the modeling group, and 142 inpatients from the Department of Respiratory and Critical Care Medicine of Jiangsu Provincial People’s Hospital formed the external validation group. Ten machine learning algorithms were used to develop and validate models for predicting superaverage length of stay, and the best model was evaluated and selected.
Results: We screened 83 candidate variables using the Boruta algorithm and identified 9 potentially important variables, including: cerebrovascular disease, white blood cell count, hematocrit, D-dimer, activated partial thromboplastin time, fibrin degradation products, partial pressure of carbon dioxide, reduced hemoglobin, and oxyhemoglobin. Cerebrovascular disease, hematocrit, activated partial thromboplastin time, partial pressure of carbon dioxide, reduced hemoglobin and oxyhemoglobin were independent risk factors for superaverage length of stay in COPD patients with hypercapnic respiratory failure. The Catboost model is the optimal model on both the modeling dataset and the external validation set. The interactive web calculator was developed using the Shiny framework, leveraging a predictive model based on Catboost.
Conclusion: The Catboost model has the most advantages and can be used for clinical evaluation and patient monitoring.

Keywords: chronic obstructive pulmonary disease, COPD, hypercapnic respiratory failure, HRF, superaverage length of stay, machine learning, Catboost model

Introduction

Hypercapnic respiratory failure (HRF), usually defined as an arterial partial pressure of carbon dioxide (PaCO2) ≥45 mmHg and often accompanied by a decrease in arterial partial pressure of oxygen (PaO2), can arise from a variety of etiologies, primarily chronic respiratory diseases such as exacerbations of chronic obstructive pulmonary disease (COPD), cystic fibrosis, and thoracic deformities, as well as other conditions such as neuromuscular disease.¹,² End-stage COPD often leads to HRF, which is associated with debilitating symptoms and a low survival rate.³

The majority of patients with HRF require hospitalization, and many require ventilatory support in a dedicated intensive care unit, incurring considerable healthcare costs.⁴,⁵ In the United States, treatment of acute respiratory failure (ARF) costs more than $50 billion annually due to high mortality rates and long hospital stays, especially for older patients.⁶⁻⁸

As the global healthcare industry steadily shifts from fee-for-service to value-based care agreements, length of stay (LOS) has become a useful indicator of resource utilization and cost-effectiveness, and has been regarded as an indicator for reducing Medicare expenditures.⁹,¹⁰ Prolonged hospital stay therefore has an important impact on the resource consumption and revenue management of medical institutions and should not be overlooked.¹¹

A clearer understanding of the reasons for prolonged hospital stay could improve long-term care strategies for these patients. Better prediction of prolonged length of stay may also contribute to earlier and better patient treatment, optimal disease planning, and shorter length of stay, which will ultimately help hospitals assess costs.¹² No studies have investigated risk factors for superaverage length of stay in COPD patients with HRF, so it is important to identify its causes in these patients in order to reduce the length of hospital stay and mitigate high medical costs.¹³

Machine learning (ML) algorithms can screen for risk factors that affect the length of hospital stay in patients with HRF by analyzing large amounts of data from electronic health records. These data may include patient comorbidities, demographics, and routine examination findings. These cutting-edge analytical methods can process high-dimensional data and analyze complex relationships. They are more flexible than traditional modeling techniques.^14,15

This study aimed to develop and validate an ML model that can predict superaverage length of stay in COPD patients with HRF.

Materials and Methods

Data Source

Modeling dataset: COPD patients with HRF who were hospitalized in the Department of Respiratory and Critical Care Medicine of Yancheng First People’s Hospital from October 2020 to September 2021. External validation dataset: COPD patients with HRF who were hospitalized in the Department of Respiratory and Critical Care Medicine of Jiangsu Provincial People’s Hospital from October 2021 to December 2021. The diagnosis of COPD was determined according to the Global Initiative for Chronic Obstructive Pulmonary Disease (GOLD) strategy.¹⁶ Diagnostic criteria for respiratory failure were as follows: arterial partial pressure of oxygen (PaO2) <8.0 kPa (60 mmHg) and arterial partial pressure of carbon dioxide (PaCO2) >6.0 kPa (45 mmHg), as measured by blood gas analysis.¹⁷

Patients who had incomplete clinical data, were under the age of 60, experienced death during hospitalization or abandoned treatment, had hearing or speech impairment that hindered their ability to answer questions adequately, required tracheotomy or intubation, suffered from organ failure, or were unable to provide informed consent were excluded. Exclusions also applied to patients with trauma, malignancy (including hematological malignancy), and pregnancy.

Study Variables

The following clinical data were collected within 24 hours of admission: demographic information, clinical characteristics, comorbidities, various rating scales at the time of admission, laboratory test results (blood sample tests), etc. The length of stay in COPD patients combined with HRF was defined as the duration between admission and discharge.

The flowchart illustrating the procedure of this multicenter study is depicted in Figure 1. A total of 426 COPD patients with HRF from Yancheng First People’s Hospital were used as the modeling dataset, and 142 COPD patients with HRF from Jiangsu Provincial People’s Hospital were used for external validation. The project followed the principles of the Declaration of Helsinki. This work was approved by the First People’s Hospital of Yancheng City (No. 2020-K062) and the Ethics Committee of Jiangsu Provincial People’s Hospital (No. 2021-SR-346). In addition, participants at both hospitals provided written informed consent.

Figure 1 The flowchart of this study.

Model Building and Validation Methods

The study data comprised a modeling dataset and an external validation dataset. Based on the modeling dataset, the Boruta method was used to screen the variables. Random Forest (RF), Categorical Boosting (CatBoost), Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting (XGBoost), Gradient Boosting Machine (GBM), Neural Network (NNET), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Naive Bayes, and Logistic Regression were then each used to construct a model for predicting superaverage length of stay. The optimal hyperparameters of each model were determined through grid search and 10-fold cross-validation.
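The tuning step described above can be sketched in a few lines. The following is a minimal, standard-library-only illustration of grid search combined with 10-fold cross-validation; it is not the authors' R code. The function names (`k_fold_indices`, `grid_search_cv`, `fit_threshold`, `accuracy`), the toy threshold "model", and the toy data are all hypothetical and exist only to make the sketch runnable.

```python
import itertools
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k roughly equal, shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def grid_search_cv(fit, score, X, y, param_grid, k=10):
    """Return (best_params, best_mean_score) over every grid combination."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for train_idx, test_idx in k_fold_indices(len(X), k):
            model = fit([X[i] for i in train_idx],
                        [y[i] for i in train_idx], **params)
            fold_scores.append(score(model,
                                     [X[i] for i in test_idx],
                                     [y[i] for i in test_idx]))
        mean_score = sum(fold_scores) / len(fold_scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Hypothetical toy "model": it learns nothing and only stores its hyperparameter.
def fit_threshold(X, y, threshold):
    return threshold


def accuracy(model, X, y):
    preds = [1 if row[0] > model else 0 for row in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


# Hypothetical toy data: one feature, positive class when the value exceeds 19.
X = [[v] for v in range(40)]
y = [1 if v > 19 else 0 for v in range(40)]
best_params, best_score = grid_search_cv(
    fit_threshold, accuracy, X, y, {"threshold": [5, 19, 30]})
```

In practice the `fit` and `score` callables would wrap a real learner and a metric such as AUC; the same folds are reused for every hyperparameter combination so that the comparisons are made on identical splits.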

After the model was established, the performance of 10 models was evaluated on the modeling dataset and the external validation dataset. The performance of these models was evaluated by several metrics, including accuracy, precision (positive predictive value), recall, F1-score, sensitivity, specificity, negative predictive value, and Brier score. Subsequently, the area under the receiver operating characteristic curve (AUC) was used as the main criterion to measure the performance of the models, and the discrimination ability of the 10 models was evaluated. In addition, calibration curves were used to verify the consistency of the prediction effect of each model with the actual situation. At the same time, the decision curve analysis (DCA) was used to calculate the clinical net benefit of each model under different clinical decision thresholds. Finally, to evaluate model performance more comprehensively, a precision-recall (PR) curve was used, a tool that was particularly useful for class-imbalance datasets, to measure the performance of each model in terms of positive class predictions.
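As a rough illustration of the evaluation metrics listed above, the sketch below computes them from true labels and predicted probabilities using their standard definitions (under which recall and sensitivity are the same quantity). The function names, the 0.5 classification threshold, and the example values are assumptions for illustration, not taken from the study, and the sketch assumes both classes are present and that at least one sample is predicted in each class.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count TP, FP, TN, FN when probabilities >= threshold are called positive."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and t == 1:
            tp += 1
        elif pred == 1 and t == 0:
            fp += 1
        elif pred == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Threshold-based metrics plus the Brier score (mean squared error of probabilities)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    sensitivity = tp / (tp + fn)          # identical to recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)            # positive predictive value
    npv = tn / (tn + fn)                  # negative predictive value
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": sensitivity,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "npv": npv,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "brier": sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true),
    }


def roc_auc(y_true, y_prob):
    """AUC as the probability that a random positive outranks a random negative (ties count half)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = superaverage stay) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
metrics = classification_metrics(y_true, y_prob)
auc = roc_auc(y_true, y_prob)
```

The threshold-based metrics depend on the chosen cut-off, whereas the AUC and the Brier score are computed from the probabilities directly, which is one reason they are often reported alongside each other.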

SHAP plots were used to explain the machine learning models. In these plots, the extent of each variable along the horizontal axis represents its contribution to the outcome, and the color of the dots indicates the magnitude or category of the variable’s value. Through SHAP analysis, the contribution of each feature to the prediction for each sample, ie, the Shapley value, can be obtained. Based on the Shapley values, a SHAP summary plot and a single-sample SHAP waterfall plot were constructed. The SHAP summary plot ranks the risk factors by importance, while the single-sample SHAP waterfall plot explains the prediction for a single sample.
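To make the notion of a Shapley value concrete, the following sketch computes exact Shapley values for one sample of a small toy model by enumerating all feature coalitions. It is a simplification: features outside a coalition are replaced with a fixed baseline value, whereas SHAP implementations for tree models average over background data with more efficient algorithms. The function `shapley_values`, the toy model, and the input values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample; absent features take their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy model with one interaction term between features 0 and 2.
def toy_model(z):
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


x = [1, 1, 1]
baseline = [0, 0, 0]
phi = shapley_values(toy_model, x, baseline)
# Additivity: contributions sum to the gap between this prediction and the baseline.
gap = toy_model(x) - toy_model(baseline)
```

The additivity property checked at the end is what a waterfall plot visualizes: starting from a baseline prediction, each feature's contribution is added in turn until the prediction for the individual sample is reached.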

Statistical Methods

All variables were included in the comparison of the modeling dataset and the external validation dataset. For comparisons between the two groups, categorical variables were compared using the chi-square test and are presented as frequencies and percentages. Normally distributed continuous variables were compared using the t-test and are expressed as mean ± standard deviation; continuous variables that did not conform to the normal distribution were compared using the Mann–Whitney U-test and are expressed as median and interquartile range. All statistical analyses and machine learning algorithms were performed in R (version 4.1.3), and a two-sided p-value of less than 0.05 was considered statistically significant.
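For readers unfamiliar with the chi-square test used for categorical variables, the sketch below computes the Pearson statistic and its p-value for a 2×2 table. It assumes no continuity correction, whereas R's `chisq.test` applies one to 2×2 tables by default, so the numbers may differ slightly from those produced in the study's environment. The function names and the example counts are hypothetical.

```python
from math import erfc, sqrt


def chi2_p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return chi2, chi2_p_value_1df(chi2)


# Hypothetical counts: rows are the two datasets, columns are presence/absence
# of a binary characteristic (these are not values from Table 1).
chi2, p_value = chi_square_2x2(30, 70, 50, 50)
significant = p_value < 0.05
```

With these made-up counts the proportions differ (30% versus 50%), the statistic is about 8.33, and the difference would be called significant at the two-sided 0.05 level used in this study.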

Results

Clinical Features of COPD Patients Combined with Hypercapnic Respiratory Failure

A total of 568 COPD patients with HRF were included in the study, with 426 in the modeling dataset and 142 in the external validation set. A hospital stay exceeding 10 days was defined as superaverage length of stay, based on findings indicating that the average length of stay for hypercapnic respiratory failure is approximately 10 days.¹⁸,¹⁹ Overall, 60.56% of patients had a hospital stay of 10 days or less, while 39.44% had a hospital stay exceeding 10 days. There was no significant difference in the number of days of hospital stay between the two groups (p = 0.766). The median age of the patients was 74 years, and 66.55% were male. There were no significant differences between the two groups in terms of gender, age, BMI, fall score, pressure ulcer score, self-care ability score, pain score, and VTE score (p > 0.05). There were significant differences between the two groups in terms of smoking history, cardiovascular disease, mMRC score, pneumonia, and white blood cell count (p < 0.05). In general, the distribution of most variables between the two groups was similar and comparable, as detailed in Table 1.
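The outcome definition above amounts to a simple binary label. The sketch below shows that labeling rule and a consistency check of the reported class proportions. The patient counts of 224 and 344 are inferred here from the reported percentages and the total of 568; they are not stated in the text, and the example stays are hypothetical.

```python
def label_superaverage(los_days, threshold_days=10):
    """Return 1 for stays that strictly exceed the threshold, otherwise 0."""
    return [1 if days > threshold_days else 0 for days in los_days]


# Hypothetical lengths of stay in days; a stay of exactly 10 days is not superaverage.
example_los = [4, 10, 11, 23, 7]
labels = label_superaverage(example_los)

# Consistency check of the reported proportions (counts are inferred, not reported).
n_total = 568
n_long = round(0.3944 * n_total)      # patients with stays exceeding 10 days
n_short = n_total - n_long            # patients with stays of 10 days or less
pct_long = round(100 * n_long / n_total, 2)
pct_short = round(100 * n_short / n_total, 2)
```

Under this reading, roughly two in five patients fall in the positive class, so the outcome is moderately imbalanced, which is the setting in which the precision-recall curve described in the Methods is informative.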

Table 1 Clinical Characteristics of COPD Patients Combined with Hypercapnic Respiratory Failure

Variable Screening and Identification of Independent Influencing Factors

We used the Boruta algorithm to screen 83 candidate variables with the aim of identifying key indicators that are closely related to the dependent variable. Nine potentially important variables were selected from the modeling dataset: cerebrovascular disease, white blood cell count, hematocrit, D-dimer, activated partial thromboplastin time, fibrin degradation products, partial pressure of carbon dioxide, reduced hemoglobin, and oxyhemoglobin, as shown in Figure 2.

Figure 2 The process of filtering variables using the Boruta method.

To conduct an initial exploration of the factors influencing prolonged hospitalization in COPD patients with HRF, we performed multivariate logistic regression analysis on the 9 indicators selected by the Boruta algorithm. Cerebrovascular disease, hematocrit, activated partial thromboplastin time, partial pressure of carbon dioxide, reduced hemoglobin, and oxyhemoglobin were identified as independent factors significantly associated with superaverage length of stay among COPD patients with HRF, as presented in Table 2.

Table 2 Multivariate Logistic Regression Results for Predicting the Risk of Prolonged Hospital Stay in COPD Patients with Hypercapnic Respiratory Failure

Establishment and Evaluation of 10 Machine Learning Models

We used 10 algorithms (RF, Catboost, LightGBM, XGBoost, GBM, NNET, SVM, KNN, Naive Bayes, and Logistic regression) to train the models, and used 8 evaluation indicators to compare the performance of each model on the modeling dataset and the external validation set (Table 3). The results indicated that the Catboost model demonstrated superior performance in both the modeling dataset and the external validation set.

Table 3 Evaluation Results of 10 Models on Modeling Datasets and External Validation Sets

We then compared the performance of the 10 prediction models using the area under the receiver operating characteristic curve (AUC) as the evaluation metric. The Catboost model exhibited the highest AUC values on both the modeling dataset and the external validation set, as illustrated in Figure 3. The analysis of calibration curves further confirmed the superiority of the Catboost model, particularly in the external validation set. The calibration curves of the Catboost model demonstrated a higher level of prediction accuracy than those of the other models, as depicted in Figure 4. Decision curve analysis (DCA) demonstrated that each model exhibited excellent clinical applicability in both the modeling dataset and the external validation set, with the specific DCA results presented in Figure 5.

Figure 3 The ROC curve was used to evaluate the 10 models in modeling data set (A) and external validation set (B).

Figure 4 The Calibration curve was used to evaluate the 10 models in modeling data set (A) and external validation set (B).

Figure 5 The DCA curve was used to evaluate the 10 models in modeling data set (A) and external validation set (B).
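The quantity plotted in a decision curve is the net benefit at each threshold probability. The sketch below uses the standard definition, net benefit = TP/n − FP/n × pt/(1 − pt), together with the "treat all" reference strategy. The function names and the example labels and probabilities are assumptions for illustration and are not the study's data.

```python
def net_benefit(y_true, y_prob, pt):
    """Net benefit of acting on patients whose predicted risk is at least pt."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= pt and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= pt and t == 0)
    return tp / n - fp / n * pt / (1 - pt)


def net_benefit_treat_all(y_true, pt):
    """Net benefit of the reference strategy that treats every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Hypothetical labels (1 = superaverage stay) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

# One (threshold, model, treat-all) triple per threshold traces out the decision curve.
curve = [(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
         for pt in (0.2, 0.5)]
```

A model is considered clinically useful over the range of thresholds where its net benefit lies above both the "treat all" line and the "treat none" line, which is fixed at zero.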

The Catboost model also showed strong performance on positive-class predictions in the precision-recall (PR) curve evaluation, as depicted in Figure 6. Taken together, the Catboost model was considered the best model because of its performance on several key indicators. To further demonstrate this, we also provide the Catboost model’s confusion matrix on the modeling dataset and the external validation set, as shown in Figure 7. This matrix breaks down the model’s correct predictions and misclassifications by category, offering a direct assessment of its predictive capability.

Figure 6 The PR curve was used to evaluate the 10 models in modeling data set (A) and external validation set (B).

Figure 7 The Confusion matrix was used to evaluate the Catboost model in modeling data set (A) and external validation set (B).
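A precision-recall curve such as the one evaluated above is obtained by sweeping the classification threshold over the predicted probabilities. The following minimal sketch computes the points of such a curve; the function name and the example data are hypothetical and are not the study's results.

```python
def precision_recall_points(y_true, y_prob):
    """Return (threshold, precision, recall) at each distinct probability, highest first."""
    n_pos = sum(y_true)
    points = []
    for thr in sorted(set(y_prob), reverse=True):
        tp = sum(1 for t, p in zip(y_true, y_prob) if p >= thr and t == 1)
        fp = sum(1 for t, p in zip(y_true, y_prob) if p >= thr and t == 0)
        points.append((thr, tp / (tp + fp), tp / n_pos))
    return points


# Hypothetical labels (1 = superaverage stay) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
points = precision_recall_points(y_true, y_prob)
```

At the lowest threshold every patient is predicted positive, so recall reaches 1 and precision falls to the prevalence of the positive class; this is the baseline against which a PR curve is usually judged on imbalanced data.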

SHAP Interpretation of the Catboost Model

To better understand how each variable influences the Catboost model’s prediction of superaverage length of stay, we used SHAP (SHapley Additive exPlanations) values to explain the prediction results of the model.

First, we generated a SHAP summary plot that summarizes the impact of all variables on the model’s predictions. As can be observed from Figure 8, the variables affecting superaverage length of stay, in order of importance, were oxyhemoglobin (O2Hb), activated partial thromboplastin time (APTT), D-dimer, reduced hemoglobin (deoxy-Hb), fibrin degradation products (FDP), white blood cell count (WBC), hematocrit (HCT), cerebrovascular disease (CeVD), and partial pressure of carbon dioxide (PCO2). These results pointed to the level of oxyhemoglobin as the most important factor in predicting superaverage length of stay, followed by coagulation system-related indicators such as APTT and D-dimer.

Figure 8 The SHAP summary diagram of Catboost model.

Further, we used SHAP waterfall plots to explain in detail the predictions of the model at the individual patient level. The waterfall plot shows a risk probability of 0.259 for the first patient (Figure 9), which means that this patient had a lower risk of superaverage length of stay. The figure shows that a lower APTT was a factor that increased the probability of superaverage length of stay in this patient. In contrast, D-dimer, O2Hb, and deoxy-Hb were protective factors that reduced the probability of superaverage length of stay in this patient, indicating that higher values of these indicators may help mitigate the risk of prolonged hospitalization.

Figure 9 The SHAP waterfall diagram of the first patient in the Catboost model.

Webpage Calculator for Catboost Model

To provide a practical tool for predicting the risk of a patient’s superaverage length of stay, we built an interactive web-based calculator using the Shiny framework, based on the Catboost prediction model. In the Shiny app, users can enter the patient’s medical parameters, such as oxyhemoglobin level and D-dimer value, which are key variables in the model’s decision-making process, through a simple and intuitive interface. The application reads these inputs in real time, uses the pre-trained Catboost model to calculate the patient’s risk of superaverage length of stay, and displays the risk probability on a web page at: https://prolonged.shinyapps.io/Catboost-model/.

Discussion

As far as we know, this is the first study to use ML techniques to generate multiple models, evaluate their performance, and select the highest-performing model to predict superaverage length of stay in COPD patients with HRF. This study showed that the Catboost model exhibited superior performance and clinical utility compared with the other machine learning models, including RF, LightGBM, XGBoost, GBM, NNET, SVM, KNN, Naive Bayes, and Logistic regression.

CatBoost is a novel machine learning algorithm that has the advantage of automatically processing categorical features and missing values during training rather than during preprocessing.²⁰,²¹ As a result, in some previous studies, CatBoost has significantly outperformed other machine learning models in various data analyses.²²

CatBoost can reduce overfitting by using unbiased gradient estimation.²³ In this study, the CatBoost algorithm was chosen because our dataset contains many categorical variables (eg, gender, smoking status, comorbidities) and because minimizing overfitting helps ensure generalization of the model.²³

In addition, machine learning-based AI methods, such as CatBoost, tend to be black-box models, which offer significant advantages over traditional models in obtaining accurate predictions but cannot be readily explained.²⁴ Therefore, the SHAP method was used to calculate the SHAP value of each feature in the prediction model, providing consistent and accurate attribution for each feature and enabling an interpretable analysis of the machine learning model.²⁵,²⁶ The SHAP analysis of a single sample enables the identification of high-risk patients, thereby enhancing physicians’ comprehension of the decision-making process employed by the CatBoost model and facilitating the use of its predictions.

According to the results of the study, we found that cerebrovascular disease, white blood cell count, hematocrit, activated partial thromboplastin time, partial pressure of carbon dioxide, reduced hemoglobin and oxyhemoglobin were risk factors for the superaverage length of stay in COPD patients with HRF. Some of these factors have been shown to affect the length of stay in other types of respiratory failure²⁷⁻³⁰ or in patients with COPD,³¹⁻³³ but reduced hemoglobin and oxyhemoglobin have not been studied.

Based on the CatBoost algorithm, we found that oxyhemoglobin, APTT, and D-dimer were the 3 strongest predictors of superaverage length of stay in HRF patients. There are no studies on the effect of these measures on the superaverage length of stay in COPD patients with HRF, but one study found that COPD was often associated with prolonged hospital stay and poor prognosis. The main reason is that high concentrations of blood carbon dioxide and acidosis can lead to malfunction of coagulation factors³⁴ and damage to vascular endothelial cells.³⁵ Specifically, APTT may be significantly prolonged³⁶,³⁷ and D-dimer levels may increase.³⁸ Prolongation of APTT may be caused by the depletion of coagulation factors,³⁹,⁴⁰ while D-dimer is an indicator of thrombosis and has been suggested as a prognostic biomarker of mortality in AECOPD.⁴¹ These results suggest a correlation between coagulopathy and HRF, and that this relationship may lead to a longer hospital stay.⁴²,⁴³ This is consistent with our findings.

Severe impairment of respiratory function is associated with oxyhemoglobin affinity (P50).⁴⁴ In particular, when hypoxia or hypercapnia is present, P50 has some effect.⁴⁵ Hypercapnia has been shown to inhibit arterial oxygen-carrying capacity.⁴⁶ Moreover, some studies have substantiated the correlation between oxyhemoglobin concentration and the long-term mortality risk associated with COPD.^47,48 Perhaps this is the reason for the extraordinarily long hospital stay in HRF patients. These findings can identify the risk of long hospital stay in HRF patients at an early stage, and enable targeted clinical care through timely intervention.

Our study has some limitations. First, although some interesting results were found, the patient cohort is relatively small, and the model still needs to be further validated in larger cohorts from multiple clinical centers. Second, future studies could incorporate a wider range of clinical features, such as medical images, ECG signals, and lung function, to provide better results.

In summary, models for predicting the superaverage length of stay in COPD patients with HRF were developed and validated from clinical variables using 10 machine learning algorithms. Among them, the Catboost model has the most advantages and can be used for clinical evaluation and patient monitoring. This model could predict whether the hospital stay of HRF patients will exceed the average length, assist clinicians in selecting appropriate treatment plans, and prevent the unnecessary expenditure of clinical resources.

Data Sharing Statement

All data generated or analysed during this study are included in this published article.

Ethics Approval and Consent to Participate

This study was approved by the ethics committees of the First People’s Hospital of Yancheng (No.2020-K062) and the People’s Hospital of Jiangsu Province (No. 2021-SR-346).

Consent for Publication

All authors approved the final manuscript and the submission to this journal.

Acknowledgments

Bingqing Zuo and Lin Jin are co-first authors for this work.

Author Contributions

All authors made a significant contribution to the work reported, whether that is in the conception, study design, execution, acquisition of data, analysis and interpretation. During the submission process, all authors were actively involved in drafting, revising, and providing final approval of the publication version of the article. All authors have unanimously agreed on the selection of the journal to which the article was submitted and have consented to assume responsibility for the content of the article.

Funding

This study was supported by the Yancheng key research and development plan guiding project (YCBE202344).

Disclosure

The authors declare that they have no competing interests in this work.

References

1. Comellini V, Pacilli AMG, Nava S. Benefits of non-invasive ventilation in acute hypercapnic respiratory failure. Respirology. 2019;24:308–317. doi:10.1111/resp.13469

2. Roussos C, Koutsoukou A. Respiratory failure. Europ Resp J. 2003;22(47 suppl):3s–14s. doi:10.1183/09031936.03.00038503

3. van der Leest S, Duiverman ML. High-intensity non-invasive ventilation in stable hypercapnic COPD: evidence of efficacy and practical advice. Respirology. 2019;24:318–328. doi:10.1111/resp.13450

4. Chung Y, Garden FL, Marks GB, Vedam H. Causes of hypercapnic respiratory failure and associated in-hospital mortality. Respirology. 2023;28:176–182. doi:10.1111/resp.14388

5. Luthi-Corridori G, Boesing M, Ottensarendt N, et al. Predictors of length of stay, mortality and rehospitalization in COPD patients: a Retrospective Cohort Study. J Clin Med. 2023;12. doi:10.3390/jcm12165322

6. Stefan MS, Shieh M-S, Pekow PS, et al. Epidemiology and outcomes of acute respiratory failure in the United States, 2001 to 2009: a national survey. J Hospital Med. 2013;8:76–82. doi:10.1002/jhm.2004

7. Chen WL, Chen C-M, Kung S-C, et al. The outcomes and prognostic factors of acute respiratory failure in the patients 90 years old and over. Oncotarget. 2018;9:7197–7203. doi:10.18632/oncotarget.24051

8. Hope AA, Adeoye O, Chuang EH, et al. Pre-hospital frailty and hospital outcomes in adults with acute respiratory failure requiring mechanical ventilation. J Crit Care. 2018;44:212–216. doi:10.1016/j.jcrc.2017.11.017

9. Roberts RR, Frutos PW, Ciavarella GG, et al. Distribution of variable vs fixed costs of hospital care. JAMA. 1999;281:644–649. doi:10.1001/jama.281.7.644

10. Faddy M, Graves N, Pettitt A. Modeling length of stay in hospital and other right skewed data: comparison of phase-type, gamma and log-normal distributions. Value Health. 2009;12:309–314. doi:10.1111/j.1524-4733.2008.00421.x

11. Dehouche N, Viravan S, Santawat U, et al. Hospital length of stay: a cross-specialty analysis and Beta-geometric model. PLoS One. 2023;18:e0288239. doi:10.1371/journal.pone.0288239

12. Naurzvai N, Mammadova A, Gursel G. Risk factors for prolonged intensive care unit stay in patients with hypercapnic respiratory failure. Rev Recent Clinical Trials. 2023;18:129–139. doi:10.2174/1574887118666230320163229

13. Tyler K, Stevenson D. Respiratory emergencies in geriatric patients. Emerg Med Clin North Am. 2016;34:39–49. doi:10.1016/j.emc.2015.08.003

14. Hou N, Li M, He L, et al. Predicting 30-days mortality for MIMIC-III patients with sepsis-3: a machine learning approach using XGboost. J Transl Med. 2020;18:462. doi:10.1186/s12967-020-02620-5

15. Du M, Haag DG, Lynch JW, Mittinty MN. Comparison of the tree-based machine learning algorithms to cox regression in predicting the survival of oral and pharyngeal cancers: analyses based on SEER database. Cancers. 2020;12:2802. doi:10.3390/cancers12102802

16. Vogelmeier CF, Criner GJ, Martínez FJ, et al. Global strategy for the diagnosis, management, and prevention of chronic obstructive lung disease 2017 report: GOLD executive summary. Arch Bronconeumol. 2017;53:128–149. doi:10.1016/j.arbres.2017.02.001

17. Zakynthinos S, Roussos C. Hypercapnic respiratory failure. Respir Med. 1993;87:409–411. doi:10.1016/0954-6111(93)90065-8

18. Kempker JA, Abril MK, Chen Y, et al. The epidemiology of respiratory failure in the United States 2002–2017: a serial cross-sectional study. Crit Care Explor. 2020;2:e0128. doi:10.1097/CCE.0000000000000128

19. Kwizera A, Nakibuuka J, Nakiyingi L, et al. Acute hypoxaemic respiratory failure in a low-income country: a prospective observational study of hospital prevalence and mortality. BMJ Open Respir Res. 2020;7. doi:10.1136/bmjresp-2020-000719

20. Hancock JT, Khoshgoftaar TM. CatBoost for big data: an interdisciplinary review. J Big Data. 2020;7:94. doi:10.1186/s40537-020-00369-8

21. Lyu Y, Wu H-M, Yan H-X, et al. Classification of coronary artery disease using radial artery pulse wave analysis via machine learning. BMC Med Inf Decis Making. 2024;24:256. doi:10.1186/s12911-024-02666-1

22. Zhao QY, Liu L-P, Luo J-C, et al. A machine-learning approach for dynamic prediction of sepsis-induced coagulopathy in critically ill patients with sepsis. Front Med. 2020;7:637434. doi:10.3389/fmed.2020.637434

23. Kong SH, Ahn D, Kim B, et al. A novel fracture prediction model using machine learning in a community-based cohort. JBMR Plus. 2020;4:e10337. doi:10.1002/jbm4.10337

24. Yang Y, Yuan Y, Han Z, Liu G. Interpretability analysis for thermal sensation machine learning models: an exploration based on the SHAP approach. Indoor Air. 2022;32:e12984. doi:10.1111/ina.12984

25. Luo H, Xiang C, Zeng L, et al. SHAP based predictive modeling for 1 year all-cause readmission risk in elderly heart failure patients: feature selection and model interpretation. Sci Rep. 2024;14:17728. doi:10.1038/s41598-024-67844-7

26. Scavuzzo CM, Scavuzzo JM, Campero MN, et al. Feature importance: opening a soil-transmitted helminth machine learning model via SHAP. Infect Dis Model. 2022;7:262–276. doi:10.1016/j.idm.2022.01.004

27. Hadaya J, Verma A, Sanaiha Y, et al. Machine learning-based modeling of acute respiratory failure following emergency general surgery operations. PLoS One. 2022;17:e0267733. doi:10.1371/journal.pone.0267733

28. Fimognari FL, De Vincentis A, Arone A, et al. Clinical outcomes and phenotypes of respiratory failure in older subjects admitted to an acute care geriatric hospital ward. Intern Emerg Med. 2024;19:1359–1367. doi:10.1007/s11739-024-03625-4

29. Nowiński A, Kamiński D, Korzybski D, Stokłosa A, Górecka D. The impact of comorbidities on the length of hospital treatment in patients with chronic obstructive pulmonary disease. Pneumonol Alergol Pol. 2011;79:388–396.

30. Madani Y, Saigal A, Sunny J, Morris L, Johns RH. Characterization of chronic obstructive pulmonary disease patients with a long length of stay: a retrospective observational cohort study. Turk Thorac J. 2017;18:119–124. doi:10.5152/TurkThoracJ.2017.17026

31. Wang Y, Stavem K, Dahl FA, Humerfelt S, Haugen T. Factors associated with a prolonged length of stay after acute exacerbation of chronic obstructive pulmonary disease (AECOPD). Int J Chron Obstruct Pulmon Dis. 2014;9:99–105. doi:10.2147/COPD.S51467

32. Hu G, Wu Y, Zhou Y, et al. Prognostic role of D-dimer for in-hospital and 1-year mortality in exacerbations of COPD. Int J Chron Obstruct Pulmon Dis. 2016;11:2729–2736. doi:10.2147/COPD.S112882

33. Ramaraju K, Kaza AM, Balasubramanian N, Chandrasekaran S. Predicting healthcare utilization by patients admitted for COPD exacerbation. J Clin Diagn Res. 2016;10:OC13–17. doi:10.7860/JCDR/2016/17721.7216

34. White H, Bird R, Sosnowski K, Jones M. An in vitro analysis of the effect of acidosis on coagulation in chronic disease states - a thromboelastograph study. Clin Med. 2016;16:230–234. doi:10.7861/clinmedicine.16-3-230

35. Malerba M, Nardin M, Radaeli A, et al. The potential role of endothelial dysfunction and platelet activation in the development of thrombotic risk in COPD patients. Expert Rev Hematol. 2017;10:821–832. doi:10.1080/17474086.2017.1353416

36. Husebø GR, Gabazza EC, D’Alessandro-Gabazza C, et al. Coagulation markers as predictors for clinical events in COPD. Respirology. 2021;26:342–351. doi:10.1111/resp.13971

37. Ashoor TM, Hasseb AM, Esmat IM. Nebulized heparin and salbutamol versus salbutamol alone in acute exacerbations of chronic obstructive pulmonary disease requiring mechanical ventilation: a double-blind randomized controlled trial. Korean J Anesthesiol. 2020;73:509–517. doi:10.4097/kja.19418

38. Fruchter O, Yigla M, Kramer MR. D-dimer as a prognostic biomarker for mortality in chronic obstructive pulmonary disease exacerbation. Am J Med Sci. 2015;349:29–35. doi:10.1097/maj.0000000000000332

39. Tang N, Li D, Wang X, Sun Z. Abnormal coagulation parameters are associated with poor prognosis in patients with novel coronavirus pneumonia. J Thromb Haemost. 2020;18:844–847. doi:10.1111/jth.14768

40. Zhu J, Pang J, Ji P, et al. Coagulation dysfunction is associated with severity of COVID-19: a meta-analysis. J Med Virol. 2021;93:962–972. doi:10.1002/jmv.26336

41. Pang H, Wang L, Liu J, et al. The prevalence and risk factors of venous thromboembolism in hospitalized patients with acute exacerbation of chronic obstructive pulmonary disease. Clin Respir J. 2018;12:2573–2580. doi:10.1111/crj.12959

42. Liu M, Hu R, Jiang X, Mei X. Coagulation dysfunction in patients with AECOPD and its relation to infection and hypercapnia. J Clin Lab Anal. 2021;35:e23733. doi:10.1002/jcla.23733

43. Zheng LL, Wang S, Li Z-G, et al. Correlation of coagulation dysfunction with infection and hypercapnia in acute exacerbation of COPD patients. Infect Drug Resist. 2023;16:5387–5394. doi:10.2147/idr.s421925

44. Bergamaschi G, Barteselli C, Del Rio V, et al. Impaired respiratory function reduces haemoglobin oxygen affinity in COVID-19. Br J Haematol. 2023;200:e44–e47. doi:10.1111/bjh.18620

45. Neukirch F, Verdier F, Chastang C, et al. Changes of P50 in hypoxaemia, hypercapnia and polycythaemia: multivariate analysis. Biomedicine. 1977;26:409–415.

46. Torbati D, Totapally BR, Camacho MT, Wolfsdorf J. Experimental critical care in ventilated rats: effect of hypercapnia on arterial oxygen-carrying capacity. J Crit Care. 1999;14:191–197. doi:10.1016/s0883-9441(99)90034-5

47. Trudzinski FC, Jörres RA, Alter P, et al. Associations of oxygenated hemoglobin with disease burden and prognosis in stable COPD: results from COSYCONET. Sci Rep. 2020;10:10544. doi:10.1038/s41598-020-67197-x

48. Hinke CF, Jörres RA, Alter P, et al. Prognostic value of oxygenated hemoglobin assessed during acute exacerbations of chronic pulmonary disease. Respiration. 2021;100:387–394. doi:10.1159/000513440

Creative Commons License © 2025 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution - Non Commercial (unported, 4.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.
