Development and Validation of a Predictive Model for Anxiety Trajectories in Patients with Breast Cancer: A Retrospective Study

Xia Li; Ben-Kai Wei; Fan Li; Huan-Huan Yan; Jun Shen

doi:10.2147/PRBM.S501127

Psychology Research and Behavior Management, Volume 18

Original Research


Authors Li X, Wei BK, Li F, Yan HH, Shen J

Received 15 October 2024

Accepted for publication 4 February 2025

Published 15 February 2025, Volume 2025:18, Pages 315–329



Review by Single anonymous peer review


Editor who approved publication: Dr Igor Elman


Xia Li,¹,* Ben-Kai Wei,²,* Fan Li,¹ Huan-Huan Yan,¹ Jun Shen¹

¹Department of Breast Surgery, the First People’s Hospital of Lianyungang, The Affiliated Hospital of Xuzhou Medical University, Lianyungang, 222000, Jiangsu Province, People’s Republic of China; ²Department of General Surgery, the First People’s Hospital of Lianyungang, The Affiliated Hospital of Xuzhou Medical University, Lianyungang, 222000, Jiangsu Province, People’s Republic of China

*These authors contributed equally to this work

Correspondence: Jun Shen, Department of Breast Surgery, the First People’s Hospital of Lianyungang, The Affiliated Hospital of Xuzhou Medical University, Lianyungang, 222002, Jiangsu Province, People’s Republic of China, Tel +8618961325323, Email [email protected]

Objective: This study aims to develop and validate a predictive model for short-term post-treatment anxiety trajectories in patients with breast cancer, utilizing baseline patient characteristics and initial anxiety scores to inform precise clinical interventions.
Methods: Baseline characteristics were collected from 424 patients diagnosed with breast cancer who underwent surgical treatment at our hospital between January 1, 2021, and December 30, 2022. Anxiety levels were assessed using the Self-Rating Anxiety Scale (SAS) scores at admission and at 3-, 6-, 9-, and 12-months post-treatment. Distinct trajectories of SAS score changes were identified and categorized. Variables were screened, and multiple models were developed. The optimal model was identified through comparative analysis, and a nomogram was generated following model simplification.
Results: Three distinct anxiety trajectories were identified and grouped into two broad categories: gradual reduction of anxiety and persistent anxiety. The Lm model was established by logistic regression, and Model 1 and Model 2 were established using variables screened by Random Forest (RF) and eXtreme Gradient Boosting (Xgboost), respectively. The areas under the ROC curve in the validation set were 0.822 (0.757–0.887), 0.757 (0.680–0.834), and 0.781 (0.710–0.851), respectively. Model comparison using Net Reclassification Improvement (NRI) and Integrated Discrimination Improvement (IDI) identified the Lm model as optimal, and it underwent further simplification and value assignment. Decision Curve Analysis (DCA) and Clinical Impact Curve (CIC) analyses confirmed the superiority of model-based interventions over general interventions.
Conclusion: Distinct anxiety trajectories are observed in patients diagnosed with breast cancer during the first 12 months post-treatment. Predictive modeling based on baseline characteristics is feasible, although further research is warranted.

Keywords: anxiety, breast cancer, prediction model, self-rating anxiety scale score, trajectory analysis

Introduction

The 2022 global cancer statistics report that female breast cancer is among the most prevalent malignant tumors, ranking second in global cancer incidence and first among cancers affecting women. In 2022, there were 2,308,897 new cases, accounting for 11.6% of all cancer cases, and 665,684 deaths, representing 6.9% of cancer-related mortality, making it a leading cause of cancer-related deaths in women.¹⁻³ Individuals diagnosed with breast cancer face significant challenges due to the prolonged nature of treatment, including fatigue, reduced physical function, and treatment-related side effects. Additionally, concerns about disease recurrence, loss of familial roles, and financial burdens contribute to varying levels of psychological distress.⁴,⁵

Approximately 41.9% of patients with breast cancer experience anxiety, which is linked to increased recurrence rates and overall mortality.⁶ Women with anxiety tend to exhibit poorer prognoses, with anxiety serving as an independent predictor of breast cancer recurrence and survival outcomes.⁷ As the biopsychosocial model of healthcare continues to gain prominence, the impact of anxiety on individuals with breast cancer has become a key area of focus for healthcare professionals.⁸,⁹ Previous studies have shown that anxiety in breast cancer mainly stems from the expectation of negative results, uncertainty about the future, worry about recurrence, and worry about treatment side effects during and after treatment. The status and causes of anxiety in breast cancer patients of different ages have also been analyzed.¹⁰ Patients’ perception of the disease is another important factor in their anxiety.¹¹

Numerous interventions have been shown to alleviate anxiety in this population effectively.¹²⁻¹⁴ However, most previous studies on anxiety are cross-sectional, with limited investigation of how anxiety changes over time. Addressing anxiety based on a single postoperative time point fails to cover all patients at risk and does not allow for proactive intervention. In addition, the majority of studies have concentrated on anxiety during the treatment phase,¹⁵ with fewer addressing anxiety during the follow-up period. Some research indicates that a subset of patients may be at risk for long-term anxiety.¹⁶ Identifying individuals at risk for chronic anxiety, analyzing the trajectory of anxiety over time, and guiding appropriate clinical interventions are therefore critical areas for further investigation. To this end, we conducted a retrospective analysis of the anxiety trends among breast cancer patients treated at our center. Our analysis revealed that anxiety changes follow a predictable pattern. This insight provides evidence to support early prediction of, and targeted intervention for, anxiety in clinical practice, helping to avoid both the waste of medical resources and the risk of over-intervention.

Materials and Methods

Research Subjects

This study included 424 individuals diagnosed with breast cancer who visited the hospital and underwent surgical treatment between January 1, 2021, and December 30, 2022. The inclusion criteria were as follows: ① patients who underwent surgery with a pathologically confirmed diagnosis of breast cancer; ② no prior treatment received before surgery; ③ age over 18 years; ④ Karnofsky Performance Status (KPS) Score greater than 70; ⑤ cognitive ability sufficient to understand the study; ⑥ informed consent obtained; ⑦ complete medical records and follow-up data available; ⑧ no evidence of disease progression events during the follow-up period; and ⑨ no major stressful events occurring during follow-up. Exclusion criteria were: ① individuals with metastatic tumors or an expected survival time of less than 6 months; ② comorbidities with other tumors or a history of previous tumors; ③ severe comorbid heart, liver, or kidney diseases; ④ women who were pregnant or breastfeeding; ⑤ patients who did not attend follow-up on time; and ⑥ patients who did not complete the scale carefully during follow-up.

Observation Indicators

Baseline characteristics and follow-up data were extracted from the hospital’s medical record data system. The baseline characteristics collected included hospitalization number (ID), age (in years), income, surgical procedures, Tumor-Node-Metastasis (TNM) stage, marital status, family support, education level, religious belief, tumor type, type of medical insurance, and baseline Self-Rating Anxiety Scale (SAS) score. Income was categorized as high (greater than 6000 Yuan/month), medium (4000 to 6000 Yuan/month), and low (less than 4000 Yuan/month). Surgical procedures included modified radical mastectomy (MRM) and breast-conserving surgery (BCS) for breast cancer. TNM staging was conducted based on the National Comprehensive Cancer Network (NCCN) guidelines. Marital status was classified as married or without a spouse (including those divorced or widowed prior to treatment). Family support was assessed through patient self-report and categorized as good, average, or poor. Education level was categorized into three categories: high (high school education or above), medium (middle school to high school education), and low (middle school education or below). Religious beliefs were recorded as either present or absent. Tumor types included Luminal A (Hormone receptor [HR] positive, Human epidermal growth factor receptor 2 [HER-2] negative, Ki-67 < 20%), Luminal B (HR positive, HER-2 negative/positive, Ki-67 ≥ 20%), triple-negative breast cancer (TNBC) (HR negative, HER-2 negative), and HER-2 positive (HR negative, HER-2 positive). Medical insurance types included employee medical insurance, medical insurance of urban and rural residents, and no medical insurance. Follow-up assessments were conducted via telephone at the start of the follow-up period post-treatment (surgery, chemotherapy, radiotherapy) and subsequently at 3, 6, 9, and 12 months. Each follow-up was scheduled within a 7-day window of the designated time point.
The SAS was administered at each follow-up to assess anxiety levels. Patients completed the SAS through the Questionnaire Star (https://www.wjx.cn/) app, and the results were downloaded and saved as Excel files for long-term storage. Patients completed the questionnaire independently. When a patient was unable to use the mobile app proficiently, medical staff explained how to use it but did not take part in filling in the questionnaire. The SAS score data were matched with the patient’s medical record data by patient name.

Statistical Methods

Statistical analyses and visualizations were conducted using R version 4.0.2 software. Comparisons of clinical characteristics between groups were conducted using the Student’s t-test for continuous variables and the chi-square or Fisher’s exact test for categorical variables, as appropriate. Trajectory analysis of longitudinal data was carried out using the gbmt package (Version: 0.1.3). Correlation analysis and visualization of baseline data were conducted with the corrplot package (Version: 0.92). Logistic regression analysis was conducted using the rms package (Version: 6.5-0). Receiver operating characteristic (ROC) curves were generated and analyzed using the pROC package (Version: 1.18.0). Random forest modeling was performed using the randomForest package (Version: 4.7-1.1), with variable importance used for screening. Gradient boosting was conducted using the eXtreme Gradient Boosting (Xgboost) package (Version: 1.7.5.1), and the SHAPforxgboost package (Version: 0.1.3) was used for global and local interpretation of the model with SHAP (SHapley Additive exPlanation). Calibration curves were plotted using the val.prob function from the rms package. The Net Reclassification Improvement (NRI) was calculated and visualized using the nricens package (Version: 1.6), while the Integrated Discrimination Improvement (IDI) was calculated using the PredictABEL package (Version: 1.2-4). A nomogram was generated using the nomogram function from the rms package. Decision Curve Analysis (DCA) and Clinical Impact Curve (CIC) were plotted using the rmda package (Version: 1.6). The sample size of the study was assessed using the pmsampsize package (Version: 1.1.3). A p value < 0.05 was considered statistically significant.

Results

Trajectory Analysis of the Longitudinal Data

A study flowchart is shown in Supplementary Figure 1. A total of 424 patients were included in this study. The SAS scores of these patients were assessed at five discrete time points: upon initiation of the follow-up period post-treatment (surgery, chemotherapy, radiotherapy), and at 3, 6, 9, and 12 months thereafter. Trajectory analysis was conducted to calculate key indicators, including the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Consistent Akaike Information Criterion (CAIC), Sample Size Adjusted BIC (SSBIC), and Hannan-Quinn Information Criterion (HQIC) under different predefined trajectory scenarios.

The results indicated that the predefined three-trajectory model demonstrated optimal performance across all criteria: AIC: 11815.4149, BIC: 11900.3025, CAIC: 11915.3025, SSBIC: 11852.6458, HQIC: 11846.4920 (see Table 1). This suggests that a three-group classification best represents the trajectory trends in these data. Further analysis of trajectory patterns under different classification scenarios revealed three predominant characteristics: the first was an initial short-term decline in SAS scores followed by an increase, the second involved a gradual increase in SAS scores post-treatment, and the third reflected a gradual decline in SAS scores over time, followed by stabilization.
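As a rough illustration of how these criteria relate to one another, the sketch below computes all five from a single −2 log-likelihood using their textbook definitions. The inputs are assumptions back-calculated from the reported three-trajectory values (15 parameters from CAIC − BIC, and 2120 observations, that is, 424 patients × 5 time points); they are not quantities stated by the authors, and the gbmt package may define some criteria differently.

```python
import math

def information_criteria(neg2_loglik, k, n):
    """Return penalized-likelihood criteria for one fitted model.

    neg2_loglik: -2 * log-likelihood of the fitted model
    k: number of estimated parameters
    n: number of observations
    """
    log_n = math.log(n)
    return {
        "AIC": neg2_loglik + 2 * k,
        "BIC": neg2_loglik + k * log_n,
        "CAIC": neg2_loglik + k * (log_n + 1),
        "SSBIC": neg2_loglik + k * math.log((n + 2) / 24),
        "HQIC": neg2_loglik + 2 * k * math.log(log_n),
    }

# ASSUMED inputs, back-calculated from the reported values (not given in the text):
# k = CAIC - BIC = 15, -2logL = AIC - 2k, n = 424 patients x 5 time points.
ic = information_criteria(neg2_loglik=11785.4149, k=15, n=424 * 5)
for name, value in ic.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the five values agree with those reported for the three-trajectory model to within rounding. Lower values indicate a better trade-off between fit and complexity.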

Table 1 Table of Parameters of Different Trajectory Grouping Models

An analysis of predefined two-trajectory models highlighted two main scenarios: one where SAS scores gradually decrease and then stabilize, and another where SAS scores gradually increase (see Figure 1A–D). Although the predefined three-trajectory model was supported by the key indicators and provided a better fit for the longitudinal data, it is important to consider that the population experiencing an initial short-term decline followed by an increase in SAS scores also requires attention and intervention. In both two- and three-trajectory models, the group with gradually decreasing SAS scores exhibited relative stability. Therefore, the dual-trajectory model was selected for subsequent investigations.

Figure 1 (A–D) illustrate the trajectory analysis of SAS scores across different time points, with varying predefined trends. (A) depicts a trajectory following a single predefined trend, while (B) reveals two, (C) presents three, and (D) demonstrates four predefined trends. (E) displays the correlation analysis among variables, and (F) provides the variance inflation factor (VIF) values for each variable. (G and H) represent the ROC curves for the Lm model in both the training and validation sets, with AUC of 0.837 (0.790–0.884) and 0.822 (0.757–0.887), respectively, indicating strong predictive performance. (I and J) present the calibration curves for the Lm model in the training and validation sets, respectively, demonstrating a Dxy of 0.674, R² of 0.417, a Brier score of 0.161, and a C-index of 0.837 in the training set. In the validation set, the Dxy was 0.644, R² was 0.360, Brier score was 0.166, and the C-index was 0.822, reflecting the robust calibration and stability of the model.

Abbreviations: SAS, Self-Rating Anxiety Scale; VIF, variance inflation factor; ROC, receiver operating characteristic curve; AUC, area under the ROC curve.

Basic Data and Grouping of the Study Group

This study included a total of 424 individuals diagnosed with breast cancer who underwent surgical treatment at the hospital between January 1, 2021, and December 30, 2022. The patients were randomly assigned to either the training or validation group in a 3:2 ratio (see Table S1 for baseline characteristics). Based on the trajectory analysis results, patients were categorized into two groups: class=1 (SAS score increase group) and class=0 (SAS score decrease group). Intergroup analysis of the indicators within the different datasets demonstrated that the overall data characteristics remained consistent across the datasets, whether considering the entire group or focusing on the training and validation sets. However, variables such as income, TNM stage, family support, education level, tumor type, medical insurance type, and baseline SAS score exhibited statistically significant differences between the two groups (see Table S2 for baseline characteristics by class). The main sources of anxiety were relapse, finances, and family responsibilities, accounting for more than 50%, and about 30% of patients reported two or more sources of anxiety at the same time.
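A random 3:2 assignment of this kind can be sketched as follows. The patient identifiers, the seed, and the rounding rule are hypothetical choices made for illustration; the text does not state how the split was drawn or the exact group sizes, so the counts produced here are not necessarily those of the study.

```python
import random

def split_train_validation(ids, train_fraction=0.6, seed=2024):
    """Shuffle the IDs reproducibly and cut them into training and validation parts."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # local RNG, so global state is untouched
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers standing in for the 424 patients.
patient_ids = range(1, 425)
train_ids, validation_ids = split_train_validation(patient_ids)
print(len(train_ids), len(validation_ids))
```

A fixed seed keeps the partition reproducible, which matters when several models are later compared on the same validation set.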

Variable Correlation Analysis, Univariate and Multivariate Logistic Regression, and Logistic Regression Modeling (Lm)

Correlation analysis indicated no significant correlation among the variables examined in the study, which included age, income, surgical procedures, TNM stage, marital status, family support, education level, religious belief, tumor type, type of medical insurance, and baseline SAS score. Additionally, the Variance Inflation Factor (VIF) analysis indicated acceptable collinearity among the variables, with all VIF values being less than 5 (Figure 1E and F). Univariate logistic regression analysis was subsequently conducted, and the area under the curve (AUC) was calculated. Variables with a p <0.05 in the univariate analysis and an AUC greater than 0.6 were included in the multivariate analysis, leading to the establishment of model Lm (Table 2, Univariate and Multivariate Analysis; Supplementary Figure 2). Five variables were selected for the final model: income, family support, tumor type, medical insurance type, and baseline SAS score. In the training set, the AUC of the ROC for model Lm was 0.837 (95% CI: 0.790–0.884), while in the testing set, it was 0.822 (95% CI: 0.757–0.887) (Figure 1G and H). The calibration curve also indicated good predictive performance and stability (Figure 1I and J). Performance metrics for the training set were as follows: Dxy = 0.674, R² = 0.417, Brier score = 0.161, and C-index = 0.837. In the testing set, the values were: Dxy = 0.644, R² = 0.360, Brier score = 0.166, and C-index = 0.822.
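The discrimination and calibration metrics quoted above are related in a simple way, which the following minimal sketch illustrates on toy data. The labels and probabilities are invented for illustration, and the functions are plain re-implementations of the standard definitions, not the rms or pROC code used in the study.

```python
def c_index(y_true, y_prob):
    """C-index (AUC): share of event/non-event pairs ranked correctly; ties count 0.5."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

def somers_dxy(c):
    """Somers' Dxy is a rescaling of the C-index to the range -1..1."""
    return 2 * (c - 0.5)

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

# Toy data (hypothetical): 1 = SAS score increase group, 0 = decrease group.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
c = c_index(y, p)
print(c, somers_dxy(c), brier_score(y, p))
```

With this rescaling, a C-index of 0.837 corresponds to a Dxy of 0.674 and a C-index of 0.822 to a Dxy of 0.644, matching the reported pairs.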

Table 2 Univariate and Multivariate Analysis

Random Forest Modeling and Variable Screening (Model 1)

The Random Forest (RF) model was trained using the training set, with a preset of 500 trees. It was observed that the model’s error rate stabilized once the number of trees exceeded 150 (Figure 2A). The RF model achieved an AUC of 0.846 (95% CI: 0.887–0.927) in the training set and 0.875 (95% CI: 0.818–0.931) in the testing set (Figure 2B and C). While the RF model demonstrated slightly higher predictive performance, as reflected by the AUC, compared to the Lm model in both the training and testing sets, this difference was not statistically significant. Given the complexity associated with the clinical application and interpretability of the RF model, further investigation was conducted to identify important variables by calculating their importance within the RF model (Figure 2D). The top five variables, ranked by importance, included baseline SAS score, tumor type, age, education level, and family support. A new logistic regression model (model 1) was established using these variables. Model 1 achieved an AUC of 0.772 (95% CI: 0.715–0.828) in the training set and 0.757 (95% CI: 0.680–0.834) in the testing set (Figure 2E and F). The calibration curve for Model 1 demonstrated the following metrics: in the training set, Dxy = 0.543, R² = 0.291, Brier score = 0.185, and C-index = 0.772; in the testing set, Dxy = 0.514, R² = 0.255, Brier score = 0.186, and C-index = 0.757 (Figure 2G and H).

Figure 2 (A) illustrates the reduction in error as the number of decision trees in the RF model increases. (B and C) depict the ROC curves of the RF model, with the AUC values of 0.846 (0.887–0.927) in the training set and 0.875 (0.818–0.931) in the testing set, demonstrating strong predictive accuracy. (D) presents the variable importance ranking within the random forest model, highlighting the most influential variables contributing to the model’s performance. (E and F) display the ROC curves for Model 1, with an AUC of 0.772 (0.715–0.828) in the training set and 0.757 (0.680–0.834) in the testing set. Finally, (G and H) reveal the calibration curves for Model 1. In the training set, the model demonstrated a Dxy of 0.543, R² of 0.291, Brier score of 0.185, and C-index of 0.772. In the testing set, it achieved a Dxy of 0.514, R² of 0.255, Brier score of 0.186, and C-index of 0.757.

Abbreviation: ROC, receiver operating characteristic curve; AUC, area under the ROC curve.

Xgboost-Based Modeling and Variable Screening for Model 2

The Xgboost method was used to train the model on the training set, achieving an AUC of 0.957 (95% CI: 0.936–0.979). However, when validated on the testing set, the performance of the model dropped significantly, with an AUC of 0.723 (95% CI: 0.643–0.804) (Figure 3A and B). This indicates that although the Xgboost model performed well on the training data, it exhibited poor generalization in the validation set. To better understand the model and identify key variables, SHAP (SHapley Additive exPlanation) was used to interpret the predictions of the model and assess variable importance (Figure 3C–E, and Supplementary Figure 3). Based on this analysis, the top five variables (tumor type, baseline SAS score, education level, and family support) were selected to develop a new logistic regression model (Model 2). Model 2 demonstrated an AUC of 0.813 (95% CI: 0.762–0.864) in the training set and 0.781 (95% CI: 0.710–0.851) in the testing set (Figure 3F and G). Calibration curves for Model 2 revealed strong predictive accuracy: in the training set, Dxy = 0.626, R² = 0.366, Brier score = 0.172, and C-index = 0.813; in the testing set, Dxy = 0.561, R² = 0.300, Brier score = 0.181, and C-index = 0.781 (Figure 3H and I). These results indicate that Model 2 offers more stable performance across both the training and testing sets compared to the initial Xgboost model.

Figure 3 (A and B) illustrate that the Xgboost-based model achieves an AUC of 0.957 (0.936–0.979) in the training set, indicating excellent predictive performance during training. However, its AUC in the validation set is significantly lower at 0.723 (0.643–0.804), indicating potential overfitting. (C–E) reveal the interpretation and variable importance of the model using SHAP, providing both global and local insights into how individual variables contribute to the predictions of the model. (F and G) depict the ROC curves for Model 2, with an AUC of 0.813 (0.762–0.864) in the training set and 0.781 (0.710–0.851) in the validation set, reflecting stable performance. (H and I) present the calibration curves for Model 2. In the training set, the model achieved a Dxy of 0.626, R² of 0.366, Brier score of 0.172, and C-index of 0.813. In the validation set, the model had a Dxy of 0.561, R² of 0.300, Brier score of 0.181, and C-index of 0.781.

Abbreviation: AUC, area under the ROC curve; SHAP, SHapley Additive explanation; Xgboost, eXtreme Gradient Boosting.

Model Validation and Comparison, Nomogram Visualization, and DCA CIC Curve

The comparison of the three models (Lm, Model 1, and Model 2) revealed no significant differences in the AUC of ROC, but the included variables varied slightly across the models. To further assess the differences among them, the NRI and IDI metrics were used. In the training set, the comparisons were as follows. Lm vs Model 1: NRI: −0.2266 [95% CI: −0.3782, −0.0749]; p = 0.00341; IDI: −0.1044 [95% CI: −0.1483, −0.0605]; p = 0.0001. These results indicate that the Lm model significantly outperformed Model 1. Lm vs Model 2: NRI: −0.1458 [95% CI: −0.2626, −0.0289]; p = 0.0145; IDI: −0.0443 [95% CI: −0.0733, −0.0154]; p = 0.00266. Similarly, the Lm model was superior to Model 2. Model 1 vs Model 2: NRI: 0.0869 [95% CI: −0.0513, 0.2252]; p = 0.21783; IDI: 0.0601 [95% CI: 0.0256, 0.0945]; p = 0.00064. No statistically significant difference in NRI was observed between Model 1 and Model 2 (Figure 4A–C), though the IDI indicated some improvement for Model 2. In the validation set, the comparisons were as follows:

Figure 4 (A–C) present the comparison results between the Lm model and Model 1 in the training set. The negative values of the NRI: −0.2266 [−0.3782, −0.0749] and IDI: −0.1044 [−0.1483, −0.0605] indicate that the Lm model outperforms Model 1, with statistically significant p-values (NRI p-value: 0.00341; IDI p-value: 0.0001). When comparing the Lm model to Model 2, the NRI (−0.1458 [−0.2626, −0.0289], p-value: 0.0145) and IDI (−0.0443 [−0.0733, −0.0154], p-value: 0.00266) also demonstrate that the Lm model is superior. However, the comparison between Model 1 and Model 2 reveals no statistically significant difference, with an NRI of 0.0869 [−0.0513, 0.2252] and a p-value of 0.21783, and an IDI of 0.0601 [0.0256, 0.0945] and a p-value of 0.00064. (D–F) display the comparison results for the validation set. The NRI (−0.186 [−0.3471, −0.0249], p-value: 0.02365) and IDI (−0.0676 [−0.1091, −0.0262], p-value: 0.00137) reveal that the Lm model is superior to Model 1 in the validation set as well. Similarly, comparisons between the Lm model and Model 2 reveal that Lm performs better, with an NRI of −0.186 [−0.3216, −0.0504], p-value: 0.00719, and an IDI of −0.0444 [−0.0761, −0.0127], p-value: 0.00599. The comparison between Model 1 and Model 2 in the validation set reveals no significant difference, with an NRI of 0.0076 [−0.1373, 0.1526], p-value: 0.91788, and an IDI of 0.0233 [−0.0041, 0.0506], p-value: 0.09513. (G and H) present the DCA and CIC of the Lm model in the full dataset, indicating the clinical use and impact of the model. Finally, (I) illustrates the nomogram for the Lm model in the full dataset, which provides a scoring system for predicting outcomes based on the variables of the model.

Abbreviation: NRI, Net Reclassification Improvement; IDI, Integrated Discrimination Improvement; DCA, Decision Curve Analysis; CIC, Clinical impact curve.

Lm vs Model 1: NRI: −0.186 [95% CI: −0.3471, −0.0249]; p = 0.02365; IDI: −0.0676 [95% CI: −0.1091, −0.0262]; p = 0.00137. The Lm model demonstrated superior predictive performance compared to Model 1.

Lm vs Model 2: NRI: −0.186 [95% CI: −0.3216, −0.0504]; p = 0.00719; IDI: −0.0444 [95% CI: −0.0761, −0.0127]; p = 0.00599. The Lm model was superior to Model 2.

Model 1 vs Model 2: NRI: 0.0076 [95% CI: −0.1373, 0.1526]; p = 0.91788; IDI: 0.0233 [95% CI: −0.0041, 0.0506]; p = 0.09513.

No statistically significant difference was observed between Model 1 and Model 2 in the validation set. Overall, the results from NRI and IDI metrics demonstrated that the Lm model was consistently the best-performing model across both the training and validation sets (Figure 4D–F). Based on this, Lm was selected as the optimal model for this study. Finally, the DCA and CIC validated the clinical use of the Lm model, and a nomogram was developed for practical application (Figure 4G–I).
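To make the sign convention of these comparisons concrete, the sketch below computes a category-free NRI and the IDI for a "new" model against a "reference" model on toy data. It is an illustrative re-implementation of the standard definitions; whether the study used the category-free or a cutoff-based NRI in nricens is not stated, so that choice is an assumption, and the probabilities are invented.

```python
def category_free_nri(y_true, p_ref, p_new):
    """Net share of events moved up plus net share of non-events moved down."""
    up_e = down_e = n_e = 0
    up_n = down_n = n_n = 0
    for y, ref, new in zip(y_true, p_ref, p_new):
        if y == 1:
            n_e += 1
            up_e += new > ref
            down_e += new < ref
        else:
            n_n += 1
            up_n += new > ref
            down_n += new < ref
    return (up_e - down_e) / n_e + (down_n - up_n) / n_n

def idi(y_true, p_ref, p_new):
    """Change in the gap between mean predicted risk of events and non-events."""
    def mean(values):
        return sum(values) / len(values)
    ref_e = [r for y, r in zip(y_true, p_ref) if y == 1]
    new_e = [n for y, n in zip(y_true, p_new) if y == 1]
    ref_n = [r for y, r in zip(y_true, p_ref) if y == 0]
    new_n = [n for y, n in zip(y_true, p_new) if y == 0]
    return (mean(new_e) - mean(ref_e)) - (mean(new_n) - mean(ref_n))

# Toy data (hypothetical): outcomes and predicted risks from two models.
y = [1, 1, 0, 0]
p_ref = [0.6, 0.4, 0.3, 0.5]
p_new = [0.7, 0.5, 0.2, 0.6]
nri_value = category_free_nri(y, p_ref, p_new)
idi_value = idi(y, p_ref, p_new)
print(nri_value, idi_value)
```

Swapping the two models flips the sign of both metrics, which is consistent with reading the negative values reported above as the comparator model reclassifying patients worse than Lm.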

Model Variable Score Conversion, Modeling and Validation, Nomogram Visualization, and DCA/CIC Curves

The Lm model demonstrated effective predictive capacity for SAS score trajectories, making it suitable for practical clinical use. Consequently, efforts were made to simplify the model into a scoring system. The baseline SAS scores were categorized into four ranges, [34,40], (40,47], (47,54], and (54,61], reflecting trends in continuous variable changes. Other variables were converted into categorical factors for analysis, and logistic regression coefficients were used to calculate the corresponding scores for different stages (see Table 3: Scores of Different Variables). After summarizing the scores, the dataset was reanalyzed, and ROC curves were generated for both the testing set and the full dataset. The results indicated that the model achieved an AUC of 0.854 (95% CI: 0.794–0.913) in the testing set and an AUC of 0.847 (95% CI: 0.811–0.884) in the full dataset (see Figure 5A and B). Additional performance metrics included Dxy = 0.707, R² = 0.433, Brier score = 0.147, and C-index = 0.854 in the testing set, and Dxy = 0.695, R² = 0.433, Brier score = 0.153, and C-index = 0.847 in the full dataset (see Figure 5C and D).
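The conversion to a scoring system can be pictured with the sketch below, which bins a baseline SAS score into the four ranges given above and turns coefficients into integer points. Every coefficient and point value here is hypothetical and chosen only for illustration; the actual scores are those in Table 3, and the rule of dividing by a unit coefficient and rounding is one common convention, not necessarily the authors' method. Only two of the five model variables are shown; the others would be added in the same way.

```python
# Upper bounds of the four baseline SAS ranges: [34,40], (40,47], (47,54], (54,61]
SAS_UPPER_BOUNDS = [40, 47, 54, 61]

def sas_bin(score):
    """Return the 0-based index of the range containing the baseline SAS score."""
    if not 34 <= score <= 61:
        raise ValueError("baseline SAS score outside the ranges used by the model")
    for index, upper in enumerate(SAS_UPPER_BOUNDS):
        if score <= upper:
            return index

def coefficients_to_points(coefficients, unit):
    """Scale log-odds coefficients by a unit coefficient and round to integer points."""
    return {level: round(beta / unit) for level, beta in coefficients.items()}

# HYPOTHETICAL coefficients and points; not taken from Table 3.
hypothetical_income_betas = {"high": 0.0, "medium": 0.4, "low": 0.9}
income_points = coefficients_to_points(hypothetical_income_betas, unit=0.4)
hypothetical_sas_points = [0, 1, 2, 3]  # one entry per SAS range

def total_score(patient):
    """Sum the points for the variables included in this reduced illustration."""
    return (hypothetical_sas_points[sas_bin(patient["baseline_sas"])]
            + income_points[patient["income"]])

example_patient = {"baseline_sas": 52, "income": "low"}
print(total_score(example_patient))
```

Integer points trade a small loss of precision for a score that can be summed at the bedside, which is the motivation the text gives for simplifying the model.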

Table 3 Scores of Different Variables

Figure 5 (A and B) illustrate the ROC curves of the Score Model, revealing its performance in both the testing set and the full dataset. The AUC for the testing set is 0.854 (95% CI: 0.794–0.913), while in the full dataset, the AUC is 0.847 (95% CI: 0.811–0.884), indicating strong discriminatory ability in both datasets. (C and D) reveal the calibration curves of the Score Model for the testing set and full dataset, respectively. In the testing set, the model achieves a Dxy of 0.707, R2 of 0.433, a Brier score of 0.147, and a C-index of 0.854, demonstrating good calibration. Similarly, in the full dataset, the Dxy is 0.695, R2 remains 0.433, with a Brier score of 0.153 and a C-index of 0.847, confirming the stability and consistency of the model’s predictions. (E) presents the results from applying the Score Model to 10 randomly generated data groups using a 10-fold resampling method. By merging the results via META-analysis, the panel demonstrates that the Score Model consistently achieves high predictive performance across different groups, further confirming its stability. (F) reveals the distribution plot, which indicates a linear relationship between the predicted scores and the probability of a higher SAS score. As the model score increases, the likelihood of a higher SAS score also rises. (G and H) depict the DCA and CIC of the Score Model, both of which indicate that the model has strong clinical use, effectively helping in decision-making based on predicted probabilities. (I) displays the nomogram for the Score Model, which offers a visual tool to assess individual patient risks, supporting clinical application and facilitating personalized decision-making.

Abbreviations: DCA, Decision Curve Analysis; CIC, Clinical impact curve; ROC, receiver operating characteristic curve; AUC, area under the ROC curve; SAS, Self-Rating Anxiety Scale.

The stability of the scoring model was further assessed using a 10-fold resampling method. This involved generating 10 random groups from the full dataset and applying the scoring model to each. A meta-analysis approach was used to merge the results, minimizing potential biases from individual training and validation sets. The analysis confirmed that the scoring model maintained high predictive accuracy and stability across different groups (Figure 5E). To validate the performance of the model, a distribution plot was created, which demonstrated that the likelihood of an increase in SAS scores correlated positively with the model score, displaying an overall linear relationship (Figure 5F). This finding confirmed the model’s capability to consistently predict score increases. DCA and CIC were also plotted, further highlighting the favorable clinical performance of the model (see Figure 5G and H). Finally, a nomogram was constructed to provide a visual representation of the scoring model, enhancing its accessibility for clinical application (see Figure 5I).
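The quantity plotted in a decision curve is the net benefit at a given risk threshold, which weighs true positives against false positives according to the odds of that threshold. The sketch below implements the standard definition on invented data, with "intervene for everyone" as the comparison strategy; it is an illustration of the concept rather than the rmda implementation, and the threshold of 0.25 is an arbitrary example.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening when the predicted risk is at least the threshold."""
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return true_pos / n - false_pos / n * threshold / (1 - threshold)

def net_benefit_treat_all(y_true, threshold):
    """Net benefit of intervening for every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy data (hypothetical): 1 = persistent anxiety, with model-predicted risks.
y = [1, 1, 0, 0]
p = [0.8, 0.3, 0.6, 0.1]
nb_model = net_benefit(y, p, threshold=0.25)
nb_all = net_benefit_treat_all(y, threshold=0.25)
print(nb_model, nb_all)
```

A model-based strategy is preferred at a threshold when its net benefit exceeds both the treat-all value and zero (the treat-none strategy), which is the sense in which model-based intervention is compared with general intervention.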

Discussion

Breast cancer is among the most prevalent malignancies affecting women. Research indicates that patients with breast cancer face an elevated risk of anxiety,¹⁷ with approximately 45% experiencing severe anxiety at the time of diagnosis.¹⁸ This anxiety is not limited to the period of diagnosis or treatment; many patients continue to suffer from chronic anxiety. Approximately 50% of patients report anxiety within the first year post-diagnosis, with a gradual decline over time. However, about 25% still experience anxiety five years later.¹⁹ Anxiety adversely affects the prognosis of breast cancer patients and warrants close attention.²⁰ Assessing anxiety at a single point in time is often insufficient. Only by accurately tracking changes in patients’ anxiety can interventions be targeted precisely, improving patients’ anxiety while avoiding the waste of medical resources.

In the present study, the trajectory of SAS scores was assessed every three months from the conclusion of treatment over a 12-month follow-up period. The findings revealed that approximately one-third of patients remained in a persistent state of anxiety, consistent with the findings of Boyes et al.²¹ Boyes et al examined the prevalence and short-term trajectories of anxiety and depression during the first year following diagnosis, reporting that 22% and 21% of cancer patients exhibited anxiety at six and 12 months post-diagnosis, respectively. It is important to highlight that the time frame of the present study differs from that of Boyes et al, as it commenced from the conclusion of treatment and focused on trends in SAS score changes while excluding any influences associated with the treatment period.

A notable observation in the current study was that some patients initially experienced a decrease in SAS scores after treatment, followed by a gradual increase over time, which contrasts with prior findings. Saboonchi et al²² investigated the anxiety trajectories of 725 breast cancer survivors over a two-year period, identifying four distinct trajectory classes: High Stable, High Decrease, Mild Decrease, and Low Decrease. The discrepancy may be attributable to our exclusion of patients who experienced major stressful life events during follow-up, as previous research has demonstrated a strong association between such events and heightened anxiety.²³ Patients who encounter significant stressful life events during follow-up, as well as those who experience prolonged anxiety despite the absence of such events, warrant focused attention. Effectively identifying patients at risk for sustained anxiety remains a key priority in optimizing post-treatment care.

In a previous study, a correlation was identified between negative emotions and prognosis in patients with breast cancer.²⁴ Baseline characteristics, including negative emotion scores, were effective predictors of emotional states following treatment.²⁵ Additionally, efforts were made to identify high-risk populations and develop intervention models through predictive modeling.²⁶ The current study further examines this relationship.

Trajectory analysis confirmed that the predefined tripartite trajectory trends most closely matched the observed changes in the enrolled population. Notably, patients whose SAS scores initially decreased but later increased were identified as a group of concern. This group, along with those whose SAS scores increased gradually, was classified as “class=1” and designated as the SAS-increasing group. For the sample size calculation, 11 variables were included, assuming an event incidence of 0.6, a binary logistic regression model, and an anticipated R² of 0.417 for the new model. This calculation required 222 events in a total enrollment of 369 patients. In this study, 424 patients and 267 events were collected, and the events per predictor parameter (EPP) calculated for the current dataset was 20.13, which met the requirements of the guidelines.²⁷
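The arithmetic linking these sample size figures can be sketched as follows. This is an illustration under a stated assumption: EPP is taken here as the number of events divided by the number of candidate predictor parameters, which is the usual definition but is not spelled out in the text. The function names are hypothetical; the inputs (369, 0.6, 11, 267) are the figures quoted above.

```python
import math


def required_events(n_total, event_rate):
    """Whole number of events implied by a total sample size and event rate."""
    return math.ceil(n_total * event_rate)


def events_per_parameter(n_events, n_parameters):
    """Events per predictor parameter (assumed definition: events / parameters)."""
    return n_events / n_parameters


n_required = 369     # minimum total enrollment quoted in the text
event_rate = 0.6     # assumed event incidence quoted in the text
n_parameters = 11    # candidate variables quoted in the text
observed_events = 267

expected_events = n_required * event_rate
print(required_events(n_required, event_rate))                         # 222
print(round(events_per_parameter(expected_events, n_parameters), 2))   # 20.13
print(round(events_per_parameter(observed_events, n_parameters), 2))   # 24.27
```

Under these assumptions, 369 × 0.6 / 11 gives about 20.13, and 267 events over 11 parameters gives about 24.27.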

To examine potential predictors of anxiety trajectory, several baseline indices were included in the analysis: age, income, surgical procedure, TNM stage, marital status, family support, educational level, religious belief, tumor type, type of medical insurance, and baseline SAS score. The AUC of the ROC for model Lm was 0.837 (95% CI: 0.790–0.884) in the training set and 0.822 (95% CI: 0.757–0.887) in the testing set, demonstrating acceptable predictive performance. The calibration curve also revealed that model Lm exhibited stability and provided an accurate interpretation of the linear relationship between variables and outcomes.
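For readers less familiar with the AUC, the following minimal sketch shows what the statistic measures: the probability that a randomly chosen patient with the event receives a higher predicted risk than a randomly chosen patient without it, with ties counted as one half. The function name and the toy data are hypothetical, and this pairwise implementation is chosen for clarity rather than speed; it is not the software used in the study.

```python
def roc_auc(y_true, y_score):
    """AUC as the share of (event, non-event) pairs ranked correctly."""
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("both outcome classes must be present")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # event ranked above non-event
            elif e == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data: 1 = SAS-increasing trajectory, 0 = otherwise.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(roc_auc(y_true, y_score), 3))
```

Applying the same function separately to a training set and a testing set gives the pair of values reported for each model; a large drop from training to testing suggests overfitting.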

The RF model, which is composed of multiple decision trees, mitigates overfitting by averaging the results of individual trees. This method maintains strong predictive power while also enabling the ranking of variable importance. In this study, the RF model achieved an AUC of 0.846 (95% CI: 0.887–0.927) in the training set and 0.866 (95% CI: 0.806–0.925) in the testing set, demonstrating enhanced performance over model Lm. In contrast to random forests, the Xgboost method constructs decision trees sequentially, with each tree designed to correct errors made by the preceding tree. In this study, the AUC of the Xgboost model was 0.957 (95% CI: 0.936–0.979) in the training set, but it decreased to 0.723 (95% CI: 0.643–0.804) in the testing set. This gap indicates that the Xgboost model overfitted the training set, which reduced its accuracy in the testing set.

In contrast, the RF model demonstrated better predictive performance compared to the Lm in the validation set, though the difference was not statistically significant. However, the RF model’s clinical interpretation was more complex and less intuitive. To address these challenges, we implemented variable screening methods and developed Model 1, which yielded an AUC of 0.772 (95% CI: 0.715–0.828) in the training set and 0.757 (95% CI: 0.680–0.834) in the testing set. Subsequently, Model 2 was developed, yielding an AUC of 0.813 (95% CI: 0.762–0.864) in the training set and 0.781 (95% CI: 0.710–0.851) in the testing set.

Upon further analysis, it was found that the Lm model outperformed both Model 1 and Model 2 in terms of ROC and calibration curve performance. To further compare the predictive performance of these models, we used NRI and IDI metrics.²⁸ The results indicated that the Lm model demonstrated superior predictive capability compared to Model 1 and Model 2 in both the training and testing sets. As a result, the Lm model was identified as the optimal model for this study. The DCA and CIC further supported the clinical use of the Lm model, and a nomogram was constructed to visually represent the scoring system. However, the application of such models in clinical settings can be cumbersome.
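The two reclassification metrics can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function names, the single risk threshold of 0.5 used for the categorical NRI, and the toy predictions are all hypothetical. IDI is computed as the change in mean predicted risk among events minus the change among non-events; NRI is the net proportion of events moved above the threshold plus the net proportion of non-events moved below it.

```python
from statistics import mean


def idi(y_true, p_old, p_new):
    """Integrated discrimination improvement of p_new over p_old."""
    ev = [i for i, y in enumerate(y_true) if y == 1]
    ne = [i for i, y in enumerate(y_true) if y == 0]
    gain_events = mean(p_new[i] for i in ev) - mean(p_old[i] for i in ev)
    gain_non_events = mean(p_new[i] for i in ne) - mean(p_old[i] for i in ne)
    return gain_events - gain_non_events


def nri(y_true, p_old, p_new, threshold=0.5):
    """Two-category net reclassification improvement at one risk threshold."""
    def net_upward(indices):
        up = sum(1 for i in indices if p_old[i] < threshold <= p_new[i])
        down = sum(1 for i in indices if p_new[i] < threshold <= p_old[i])
        return (up - down) / len(indices)

    ev = [i for i, y in enumerate(y_true) if y == 1]
    ne = [i for i, y in enumerate(y_true) if y == 0]
    # Events should move up; non-events should move down.
    return net_upward(ev) - net_upward(ne)


# Hypothetical toy predictions from a reference model and a comparison model.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
p_old = [0.6, 0.4, 0.7, 0.3, 0.6, 0.2, 0.4, 0.3]
p_new = [0.8, 0.6, 0.7, 0.4, 0.4, 0.1, 0.6, 0.2]

print(round(nri(y_true, p_old, p_new), 3))
print(round(idi(y_true, p_old, p_new), 3))
```

Positive values favor the comparison model and negative values favor the reference model.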

Further investigation revealed a linear relationship between baseline SAS scores and clinical outcomes. Consequently, baseline SAS scores were categorized into intervals of [34,40], (40,47], (47,54], and (54,61], and all other variables were transformed into categorical factors for analysis. Through logistic regression, coefficients were converted into scores for different stages, thereby simplifying the model into a score-based model.
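The conversion described above can be sketched as follows. Only the four SAS intervals come from the text; everything else is an assumption made for illustration. The coefficient values are invented, `low_family_support` is a hypothetical coding of the family support variable, and scaling each coefficient by the smallest non-zero coefficient before rounding is one common convention, not necessarily the one the authors used.

```python
# Upper bounds of the intervals [34,40], (40,47], (47,54], (54,61] from the text.
SAS_UPPER_BOUNDS = [40, 47, 54, 61]
SAS_LABELS = ["sas_34_40", "sas_40_47", "sas_47_54", "sas_54_61"]

# Hypothetical logistic regression coefficients (reference category = 0.0).
HYPOTHETICAL_COEFS = {
    "sas_34_40": 0.0,
    "sas_40_47": 0.5,
    "sas_47_54": 1.1,
    "sas_54_61": 1.8,
    "low_family_support": 0.7,
}


def sas_interval(score):
    """Index of the interval that contains a baseline SAS score."""
    if score < 34 or score > 61:
        raise ValueError("baseline SAS score outside the modeled range 34-61")
    for index, upper in enumerate(SAS_UPPER_BOUNDS):
        if score <= upper:
            return index


def coefficients_to_points(coefs):
    """Scale coefficients by the smallest non-zero one and round to integers."""
    unit = min(abs(b) for b in coefs.values() if b != 0)
    return {name: round(b / unit) for name, b in coefs.items()}


POINTS = coefficients_to_points(HYPOTHETICAL_COEFS)


def total_score(baseline_sas, low_family_support):
    """Sum the points for one patient under the hypothetical scoring table."""
    score = POINTS[SAS_LABELS[sas_interval(baseline_sas)]]
    if low_family_support:
        score += POINTS["low_family_support"]
    return score


print(POINTS)
print(total_score(50, True))
```

The total score is then read against the nomogram or a lookup table to obtain the predicted probability of an SAS-increasing trajectory.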

ROC and calibration curves were plotted for the training set and the complete dataset. These results show that the score-based model has strong predictive performance and stability. The resulting scoring system can be applied to each patient, enabling an initial assessment of the likelihood of a particular SAS trajectory trend based on the total score, which can serve as a guide for clinical practice.

Previous predictive models for cancer-related behavioral symptoms have been successful, providing evidence for clinical guidance.²⁹ Despite the progress made in constructing and validating the model, there remains room for enhancement in its overall predictive accuracy. This is likely because changes in SAS scores are influenced by multiple factors rather than a single determinant. Moreover, negative emotions encompass not only anxiety but also depression, fear, and other related conditions, which may have undiscovered interactions that further complicate prediction models.³⁰

Several limitations of the study should be acknowledged. First, this study is a retrospective analysis, and subsequent data analysis cannot make up for the defects inherent in that design. Although test-retest reliability testing can effectively increase the credibility of data, this study did not adopt such a method. Meanwhile, when screening patient data, we sought to retain patients with complete records who cooperated with the assessments, thereby avoiding bias from patients who did not take the assessments seriously; however, this also narrows the scope of our findings. The inclusion of patients from a single center may introduce selection bias, affecting the generalizability of the findings. Second, patients who agreed to participate and completed follow-up on time may differ psychologically from those who declined, did not accept, or could not complete follow-up regularly. Excluded patients may therefore have had a greater risk of anxiety, which is a further limitation of the study. Furthermore, while the exclusion of life stress events and disease progression during follow-up helps maintain model stability, it restricts the broader applicability of the model. Additionally, the predictive outcomes are time-point specific and do not comprehensively reflect the entire trajectory of anxiety changes over time. Patient-reported outcomes (PRO) are receiving increasing attention, and many large studies have used PRO as an outcome.³¹,³² This study did not incorporate PRO data, which represents an area for future exploration that could aid in the prevention and early intervention of anxiety in clinical settings. Despite certain limitations, our research methods and approaches can be replicated across various centers, providing valuable insights for addressing such issues in clinical practice.
Given the limitations of the current research, follow-up research is necessary. Such research could adopt mixed methods, for example by incorporating qualitative data, to further improve the quality of the research.

Conclusion

Patients diagnosed with breast cancer often exhibit distinct trajectories of anxiety within the first 12 months following treatment, and it is feasible to predict these trajectories based on baseline variables. This study leveraged predictive models to identify high-risk groups prone to persistent anxiety, thereby informing clinical decisions to mitigate the adverse effects of this emotional state. However, the retrospective nature of the study and the relatively small, single-center sample may limit the generalizability of the developed model. Further verification through large-scale, multicenter clinical studies is warranted. Despite these limitations, the proposed approach has undergone preliminary validation and may provide a valuable framework for addressing similar issues in clinical settings.

Data Sharing Statement

Data will be made available by the corresponding author upon reasonable request.

Ethics Approval and Consent to Participate

This study was conducted in accordance with the Declaration of Helsinki. This study was conducted with approval from the Ethics Committee of the First People’s Hospital of Lianyungang, The Affiliated Hospital of XuZhou Medical University. Written informed consent was obtained from all participants.

Consent for Publication

Consent for publication was obtained from every individual whose data are included in this manuscript.

Funding

No funding was received.

Disclosure

None of the authors have any financial disclosure or conflict of interest.

References

1. Bray F, Laversanne M, Sung H, et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2024;74(3):229–263. doi:10.3322/caac.21834

2. Nierengarten MB. Global cancer statistics 2022: the report offers a view on disparities in the incidence and mortality of cancer by sex and region worldwide and on the areas needing attention. Cancer. 2024;130(15):2568. doi:10.1002/cncr.35444

3. Li Q, Xia C, Li H, et al. Disparities in 36 cancers across 185 countries: secondary analysis of global cancer statistics. Front Med. 2024;18(5):911–920.

4. Getu MA, Chen C, Panpan W, et al. The effect of cognitive behavioral therapy on the quality of life of breast cancer patients: a systematic review and meta-analysis of randomized controlled trials. Qual Life Res. 2021;30(2):367–384. doi:10.1007/s11136-020-02665-5

5. An Y, Fu G, Yuan G. Quality of life in patients with breast cancer: the influence of family caregiver’s burden and the mediation of patient’s anxiety and depression. J Nerv Ment Dis. 2019;207(11):921–926. doi:10.1097/NMD.0000000000001040

6. Hashemi SM, Rafiemanesh H, Aghamohammadi T, et al. Prevalence of anxiety among breast cancer patients: a systematic review and meta-analysis. Breast Cancer. 2020;27(2):166–178. doi:10.1007/s12282-019-01031-9

7. Wang X, Wang N, Zhong L, et al. Prognostic value of depression and anxiety on breast cancer recurrence and mortality: a systematic review and meta-analysis of 282,203 patients. Mol Psychiatry. 2020;25(12):3186–3197. doi:10.1038/s41380-020-00865-6

8. Zhang J, Zhou Y, Feng Z, et al. Longitudinal trends in anxiety, depression, and quality of life during different intermittent periods of adjuvant breast cancer chemotherapy. Cancer Nurs. 2018;41(1):62–68. doi:10.1097/NCC.0000000000000451

9. Gokce Ceylan G, Gok Metin Z. Symptom status, body perception, and risk of anxiety and depression in breast cancer patients receiving paclitaxel: a prospective longitudinal study. Support Care Cancer. 2022;30(3):2069–2079. doi:10.1007/s00520-021-06619-6

10. Dinapoli L, Colloca G, Di Capua B, et al. Psychological aspects to consider in breast cancer diagnosis and treatment. Curr Oncol Rep. 2021;23(3):38. doi:10.1007/s11912-021-01049-3

11. Kus T, Aktas G, Ekici H, et al. Illness perception is a strong parameter on anxiety and depression scores in early-stage breast cancer survivors: a single-center cross-sectional study of Turkish patients. Support Care Cancer. 2017;25(11):3347–3355. doi:10.1007/s00520-017-3753-1

12. Kong LX, Zhao YH, Feng ZL, et al. Personalized and continuous care intervention affects rehabilitation, living quality, and negative emotions of patients with breast cancer. World J Psychiatry. 2024;14(6):876–883. doi:10.5498/wjp.v14.i6.876

13. Zhou K, Li J, Li X. Effects of cyclic adjustment training delivered via a mobile device on psychological resilience, depression, and anxiety in Chinese post-surgical breast cancer patients. Breast Cancer Res Treat. 2019;178(1):95–103. doi:10.1007/s10549-019-05368-9

14. He X, Wang X, Fu X. The effects of the quality nursing mode intervention on the psychological moods, postoperative complications, and nursing satisfaction of breast cancer surgery patients. Am J Transl Res. 2021;13(10):11540–11547.

15. Tu C, He Y, Ma X. Factors influencing psychological distress and effects of stepwise psychological care on quality of life in patients undergoing chemotherapy after breast cancer surgery. Am J Transl Res. 2022;14(3):1923–1933.

16. Li W, Zhang Q, Xu Y, et al. Group-based trajectory and predictors of anxiety and depression among Chinese breast cancer patients. Front Public Health. 2022;10:1002341. doi:10.3389/fpubh.2022.1002341

17. Carreira H, Williams R, Müller M, et al. Associations between breast cancer survivorship and adverse mental health outcomes: a systematic review. J Natl Cancer Inst. 2018;110(12):1311–1327. doi:10.1093/jnci/djy177

18. Villar RR, Fernández SP, Garea CC, et al. Quality of life and anxiety in women with breast cancer before and after treatment. Rev Lat Am Enfermagem. 2017;25:e2958. doi:10.1590/1518-8345.2258.2958

19. Burgess C, Cornelius V, Love S, et al. Depression and anxiety in women with early breast cancer: five year observational cohort study. BMJ. 2005;330(7493):702. doi:10.1136/bmj.38343.670868.D3

20. Wang YH, Li JQ, Shi JF, et al. Depression and anxiety in relation to cancer incidence and mortality: a systematic review and meta-analysis of cohort studies. Mol Psychiatry. 2020;25(7):1487–1499. doi:10.1038/s41380-019-0595-x

21. Boyes AW, Girgis A, D’Este CA, et al. Prevalence and predictors of the short-term trajectory of anxiety and depression in the first year after a cancer diagnosis: a population-based longitudinal study. J Clin Oncol. 2013;31(21):2724–2729. doi:10.1200/JCO.2012.44.7540

22. Saboonchi F, Petersson LM, Wennman-Larsen A, et al. Trajectories of anxiety among women with breast cancer: a proxy for adjustment from acute to transitional survivorship. J Psychosoc Oncol. 2015;33(6):603–619. doi:10.1080/07347332.2015.1082165

23. Fassett-Carman A, Hankin BL, Snyder HR. Appraisals of dependent stressor controllability and severity are associated with depression and anxiety symptoms in youth. Anxiety Stress Coping. 2019;32(1):32–49. doi:10.1080/10615806.2018.1532504

24. Shen J, Zhou D, Wang M, et al. Development and validation of a nomogram model of depression and sleep disorders and the risk of disease progression in patients with breast cancer. BMC Womens Health. 2024;24(1):385. doi:10.1186/s12905-024-03222-9

25. Sun W, Shen J, Sun R, et al. Establishment and validation of a predictive model for post-treatment anxiety based on patient attributes and pre-treatment anxiety scores. Psychol Res Behav Manag. 2023;16:3883–3894. doi:10.2147/PRBM.S425055

26. Shen J, Wang M, Li F, et al. Non-inferiority study on the precise implementation of multidisciplinary continuous nursing intervention in patients with breast cancer experiencing negative emotions. Cancer Manag Res. 2022;14:1759–1770. doi:10.2147/CMAR.S354214

27. Collins GS, Reitsma JB, Altman DG, et al. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 2015;350(4):g7594. doi:10.1136/bmj.g7594

28. Pepe MS, Fan J, Feng Z, et al. The net reclassification index (NRI): a misleading measure of prediction improvement even with independent test data sets. Stat Biosci. 2015;7(2):282–295. doi:10.1007/s12561-014-9118-0

29. Pagliuca M, Havas J, Thomas E, et al. Long-term behavioral symptom clusters among survivors of early-stage breast cancer: development and validation of a predictive model. J Natl Cancer Inst. 2025;117(1):89–102.

30. Soqia J, Al-Shafie M, Agha LY, et al. Depression, anxiety and related factors among Syrian breast cancer patients: a cross-sectional study. BMC Psychiatry. 2022;22(1):796. doi:10.1186/s12888-022-04469-y

31. Curigliano G, Dunton K, Rosenlund M, et al. Patient-reported outcomes and hospitalization data in patients with HER2-positive metastatic breast cancer receiving trastuzumab deruxtecan or trastuzumab emtansine in the Phase III DESTINY-Breast03 study. Ann Oncol. 2023;34(7):569–577. doi:10.1016/j.annonc.2023.04.516

32. Rugo HS, O’Shaughnessy J, Boyle F, et al. Adjuvant abemaciclib combined with endocrine therapy for high-risk early breast cancer: safety and patient-reported outcomes from the monarchE study. Ann Oncol. 2022;33(6):616–627.

Creative Commons License © 2025 The Author(s). This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution - Non Commercial (unported, 3.0) License. By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms.
