Risk Factors for Gout in Taiwan Biobank: A Machine Learning Approach

Yu-Ruey Liu; Oswald Ndi Nfor; Ji-Han Zhong; Chun-Yuan Lin; Yung-Po Liaw

doi:10.2147/JIR.S490821

Journal of Inflammation Research, Volume 17

Original Research

Authors Liu YR, Nfor ON, Zhong JH, Lin CY, Liaw YP

Received 9 August 2024

Accepted for publication 19 November 2024

Published 26 November 2024 Volume 2024:17 Pages 9847–9856

DOI https://doi.org/10.2147/JIR.S490821

Checked for plagiarism Yes

Review by Single anonymous peer review

Editor who approved publication: Dr Tara Strutt

Yu-Ruey Liu,^{1–3} Oswald Ndi Nfor,⁴ Ji-Han Zhong,⁴ Chun-Yuan Lin,³ Yung-Po Liaw^{4–6}

¹College of Information and Electrical Engineering, Asia University, Taichung, 413, Taiwan; ²Department of Emergency Medicine, Cheng Ching General Hospital, Taichung, Taiwan; ³Department of Computer Science and Information Engineering, Asia University, Taichung, 413, Taiwan; ⁴Department of Public Health and Institute of Public Health, Chung Shan Medical University, Taichung, Taiwan; ⁵Institute of Medicine, Chung Shan Medical University, Taichung, Taiwan; ⁶Department of Medical Imaging, Chung Shan Medical University Hospital, Taichung, Taiwan

Correspondence: Yung-Po Liaw, Department of Public Health and Institute of Public Health, Chung Shan Medical University, No. 110, Sec. 1 Jianguo N. Road, Taichung, 40201, Taiwan, Tel +886-4-36097501, Email [email protected]; Chun-Yuan Lin, Department of Computer Science and Information Engineering, Asia University, No. 500, Lioufeng Road, Wufeng, Taichung, 413, Taiwan, Tel +886-4-2332-3456 # 1814, Email [email protected]

Purpose: We assessed the risk of gout in the Taiwan Biobank population by applying various machine learning algorithms. The study aimed to identify crucial risk factors and evaluate the performance of different models in gout prediction.
Patients and Methods: This study analyzed data from 88,210 individuals in the Taiwan Biobank, identifying 19,338 cases of gout and 68,872 controls. After data cleaning and propensity score matching for gender and age, the final analytical sample comprised 38,676 individuals (19,338 gout cases and 19,338 controls). Five machine learning models were used: Bayesian Network (BN), Random Forest (RF), Gradient Boosting (GB), Logistic Regression (LR), and Neural Network (NN). The predictive performance was evaluated using a split dataset (80% training set and 20% test set).
Results: Variable importance analysis was performed to identify key variables, with uric acid and gender emerging as the most influential risk factors. Descriptive data highlighted significant differences between the control group and gout patients, with a higher prevalence of gout in men (51.36% vs 48.64%). Both the RF and GB demonstrated high performance across multiple metrics, with RF consistently achieving a high area under the curve (AUC) of 0.986 to 0.987, alongside excellent sensitivity (0.945–0.947) and specificity (0.998–0.999). GB also performed robustly, with AUC values around 0.987–0.988 and maintaining high sensitivity (0.944–0.950) and specificity (0.995–0.999) across different model variations. The F1 scores for both models (GB and RF) indicate strong predictive capabilities, with values around 0.971–0.972.
Conclusion: The RF and GB demonstrated exceptional accuracy in predicting gout status, particularly when incorporating genetic data alongside clinical factors. These findings underscore the potential for integrating machine learning models with genetic information to enhance gout prediction accuracy in clinical practice.

Keywords: risk prediction, gout, machine learning, artificial intelligence

Introduction

Gout is a complex inflammatory condition primarily caused by hyperuricemia leading to monosodium urate (MSU) crystal deposition in joints and other tissues. Its clinical presentation includes acute painful flares and chronic complications.¹ The disease etiology is multifactorial, involving genetic predispositions, lifestyle factors such as diet and alcohol consumption, and certain medical conditions that affect uric acid metabolism.^2,3 Gout is associated with various comorbidities, including cardiovascular diseases and renal impairment, which can complicate its management and exacerbate patient morbidity.^1,4

Gout is a prevalent and debilitating rheumatic disease with an increasing global incidence, especially in Pacific and developed countries.⁵ The Taiwanese population, like many others, faces the growing burden of gout, with a reported prevalence of approximately 6.24%.⁶ The high recurrence rate linked to this condition leads to diminished health-related quality of life^7,8 and heightened financial strain, particularly for individuals unable to manage it effectively.⁹

On a global scale, there were 10,016,336 reported cases in 2023, which is anticipated to rise to approximately 12,082,807 by 2035.⁷ Understanding the risk factors and developing effective prediction models for gout is essential for proactive management and prevention. In this context, the use of data-driven methods, particularly machine learning (ML), has gained prominence as a powerful tool for disease risk assessment.¹⁰

The conventional approach to assessing gout risk (that is, traditional statistical techniques, including logistic regression models) often relies on well-established clinical risk factors such as age, gender, genetics, lifestyle, and serum urate levels.¹¹ While these factors provide valuable insights, they may not fully capture the complexity of the disease, and there is an increasing need for more accurate, individualized risk assessments. According to previous research findings,^10,12 conventional models, which rely heavily on multivariate regression, oversimplify the links between disease risk factors and outcomes, reducing prediction accuracy. This is where machine learning techniques come into play.

Machine learning, with its capacity to analyze vast datasets, uncover intricate nonlinear relationships between variables, and generate predictive models, offers a promising alternative to conventional methods. Through the application of ML algorithms, researchers and clinicians can explore a broader spectrum of risk factors, potentially identifying subtle associations and interactions that might be missed by traditional statistical approaches. The goal of this scholarly article is to provide a comprehensive comparative analysis of machine learning and conventional approaches for assessing gout risk among the Taiwanese population. By evaluating the strengths and limitations of each approach, we intend to shed light on their respective roles in enhancing the precision and efficacy of gout risk assessment.

The emergence of machine learning in disease risk assessment marks a pivotal moment in the evolution of medical diagnostics and preventive medicine. Nevertheless, we recognize the importance of weighing its potential benefits against the established strengths of conventional approaches, which have long been the cornerstone of clinical practice. Machine learning classifiers have been successfully used to predict the risk of diseases including, but not limited to, heart disease,¹³ diabetes^14–16 and its complications.¹⁷

An earlier study¹⁸ explored gout staging through machine learning methodologies and suggested further research to enhance the precision of diagnostic models. In light of this, we explored machine learning approaches to determine their performance in gout risk assessment.

Materials and Methods

Data Source and Disease Identification

The study data were obtained from the Taiwan Biobank (TWB) dataset, which contains phenotypic and genetic data collected from Han Chinese individuals between the ages of 30 and 70. These individuals had completed a standardized questionnaire covering physical and sociodemographic characteristics and past medical history. The database is a large-scale population-based biobank managed by the National Health Research Institutes (NHRI) of Taiwan. Established in 2010, the biobank aims to collect comprehensive health and genetic data from the Taiwanese population to support research on various diseases and public health issues. It includes biological samples, health records, and lifestyle information from over 200,000 participants. Its primary purpose is to facilitate research into the genetic, environmental, and lifestyle factors contributing to health and disease in Taiwan. By providing a rich resource for researchers, the Taiwan Biobank plays a crucial role in advancing precision medicine and improving public health strategies in the region.

Genotyping in the biobank was conducted using the C2-58 Axiom Genome-Wide TWB 2.0 Array, developed specifically for people of Taiwanese descent. The dataset included information on gout diagnosis and related variables. We extracted data for 88,347 individuals assessed at the biobank centers from 2008 to 2019.

In Taiwan, the diagnosis of gout typically involves a combination of clinical examination and specific diagnostic criteria. While clinical examination of monoarticular arthritis is indeed a significant aspect of the diagnosis, it is not the sole method. The diagnosis is often supported by the presence of MSU crystals in synovial fluid or tophi, which provides definitive evidence of gout.¹⁹ In our research, gout was primarily identified through self-reporting, as indicated by the question, “Have you or your family members (biological parents or blood-related siblings) ever been diagnosed with gout?” A response of “yes” signified the presence of the condition, while a response of “no” indicated its absence. Additionally, disease diagnosis was supported by the measurement of uric acid levels; specifically, a uric acid level of 7 mg/dL or higher in men and 6 mg/dL or higher in women. An individual was considered to have gout if either the self-report response was “yes” or if the uric acid levels were elevated. Comprehensive descriptions of the lifestyle factors and other variables considered in our study have been documented in previous reports.^20,21
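The case definition above is an "either/or" rule, which can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name, argument names, and the `"male"`/`"female"` labels are assumptions, while the cutoffs (7 mg/dL for men, 6 mg/dL for women) are the ones quoted in the text.

```python
# Sex-specific uric acid cutoffs (mg/dL) quoted in the text.
# The dictionary keys and all names below are illustrative assumptions.
URIC_ACID_CUTOFF = {"male": 7.0, "female": 6.0}


def is_gout_case(self_report_yes: bool, sex: str, uric_acid_mg_dl: float) -> bool:
    """Return True if the self-report is 'yes' OR uric acid is at/above the cutoff."""
    elevated = uric_acid_mg_dl >= URIC_ACID_CUTOFF[sex]
    return self_report_yes or elevated


# A man with no self-reported diagnosis but a uric acid level of 7.2 mg/dL
# is still labeled a case under this rule.
print(is_gout_case(False, "male", 7.2))    # True
print(is_gout_case(False, "female", 5.9))  # False
```

Because elevated uric acid alone is sufficient for case status under this rule, the outcome label and the uric acid predictor are closely linked by construction, which is worth keeping in mind when reading the variable importance results.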

Data Processing and Model Development

Our data were cleaned as follows. Of the 88,347 individuals extracted, those with missing values (n = 137) were excluded, leaving 88,210 subjects with and without gout. Figure 1 illustrates the data processing pipeline. The algorithms evaluated using metrics such as the Youden index, AUC, sensitivity, and specificity were BN, RF, GB, LR, and NN. Details of these algorithms have been described elsewhere.²² These algorithms were selected based on their established efficacy in previous research.²³ The traditional model we employed is logistic regression, a widely recognized statistical technique for binary classification tasks, which serves as a benchmark against which we compared the performance of the machine learning algorithms. In terms of performance metrics, we focused on the following key indicators: accuracy (the overall proportion of correct predictions, both true positives and true negatives, among all observations), sensitivity or recall (the model’s ability to correctly identify positive cases, that is, true positives out of all actual positive cases), specificity (the model’s ability to correctly identify negative cases, that is, true negatives out of all actual negative cases), and F1 score (the harmonic mean of precision and recall, providing a balance between the two).
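The four indicators defined above all follow from the counts in a 2×2 confusion matrix. The sketch below shows the arithmetic on made-up counts; the function name and the toy numbers are illustrative and are not taken from the study.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, sensitivity, specificity, precision, and F1 from counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall: true positives among actual positives
    specificity = tn / (tn + fp)   # true negatives among actual negatives
    precision = tp / (tp + fp)     # true positives among predicted positives
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Toy counts (hypothetical): 100 actual cases and 100 actual controls.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```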

Figure 1 Data processing pipeline.

Statistical Analysis

We utilized SAS Software version 9.4 (SAS Institute, Cary, NC, USA) and PLINK 1.90 beta software²⁴ for data management. Continuous and categorical variable distributions were compared using t-tests and chi-square tests, respectively. Categorical variables were presented as counts and percentages (%), and continuous variables as means and standard deviations (SD). We estimated propensity scores using PROC LOGISTIC, and PROC SQL was used to match cases and controls in a 1:1 ratio, ensuring precise alignment on gender and age to minimize bias. We checked balance between the case and control groups by calculating the Standardized Mean Differences (SMD). We employed logistic regression to estimate the AUC for both internal and external factors in relation to gout.
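The matching and balance check can be illustrated outside SAS. The sketch below is a simplification under stated assumptions: it performs exact 1:1 matching on gender and age rather than matching on a propensity score estimated by logistic regression, and every function name, field name, and toy record is hypothetical. The SMD formula used (difference in means divided by the square root of the average of the two group variances) is the common definition for continuous variables.

```python
import random
import statistics
from collections import defaultdict


def match_one_to_one(cases, controls, keys=("gender", "age"), seed=0):
    """Pair each case with one unused control sharing the same key values.

    Exact matching is used here as a stand-in for propensity score matching.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for control in controls:
        pool[tuple(control[k] for k in keys)].append(control)
    for bucket in pool.values():
        rng.shuffle(bucket)  # choose among eligible controls at random
    pairs = []
    for case in cases:
        bucket = pool.get(tuple(case[k] for k in keys))
        if bucket:  # skip cases with no remaining eligible control
            pairs.append((case, bucket.pop()))
    return pairs


def standardized_mean_difference(x_cases, x_controls):
    """SMD for a continuous variable: mean difference over the pooled SD."""
    pooled_sd = ((statistics.variance(x_cases) + statistics.variance(x_controls)) / 2) ** 0.5
    return (statistics.mean(x_cases) - statistics.mean(x_controls)) / pooled_sd


# Toy records (hypothetical).
cases = [{"gender": "M", "age": 50}, {"gender": "F", "age": 60}, {"gender": "M", "age": 70}]
controls = [{"gender": "M", "age": 50}, {"gender": "M", "age": 50},
            {"gender": "F", "age": 60}, {"gender": "F", "age": 40}]

pairs = match_one_to_one(cases, controls)
smd_age = standardized_mean_difference([c["age"] for c, _ in pairs],
                                       [k["age"] for _, k in pairs])
print(len(pairs), smd_age)
```

In the toy data, the 70-year-old man has no eligible control and is dropped, and the matched pairs have identical ages, so the SMD for age is zero. An SMD near zero after matching is what a balance check is looking for.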

For the development of AI models, we used SAS® Viya® version 3.5 (SAS Institute Inc., Cary, NC, USA). In this study, we considered the supervised learning models described above. We divided the dataset into training (80%) and test (20%) partitions before constructing machine learning models. The AUC, which signifies the area under the Receiver Operating Characteristic (ROC) curve, was used to assess model performance. An AUC value close to 1 indicates a near-perfect model. To select the best model when comparing machine learning models, we calculated Youden’s Index (also known as Youden’s J statistic).
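To make the evaluation steps concrete, the following sketch shows an 80/20 split, an AUC computed as the probability that a randomly chosen case receives a higher score than a randomly chosen control, and Youden's J (sensitivity + specificity − 1) maximized over candidate thresholds. It is a plain-Python illustration, not the SAS Viya procedure; the function names, the seed, and the toy scores are assumptions.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows and return (train, test) partitions."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def auc(labels, scores):
    """Rank-based AUC: P(score of a case > score of a control), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_youden(labels, scores):
    """Return (J, threshold) maximizing J = sensitivity + specificity - 1.

    A sample is predicted positive when its score is >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = (-1.0, None)
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        j = tp / n_pos + tn / n_neg - 1
        if j > best[0]:
            best = (j, t)
    return best


# Toy predictions (hypothetical): 1 = gout case, 0 = control.
labels = [0, 0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.7, 0.8, 0.9]

train, test = train_test_split(range(10))
print(len(train), len(test))        # 8 2
print(auc(labels, scores))          # 0.9375
print(best_youden(labels, scores))  # (0.75, 0.35)
```

The pairwise AUC above is quadratic in the sample size, which is fine for illustration but would be replaced by a sort-based computation on a dataset of this study's scale.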

In both basic statistics and machine learning models, we designated gout as the outcome and target variable. Adjusted covariates included age, gender, body mass index, smoking status, alcohol consumption, exercise habits, uric acid, creatinine, high-density lipoprotein (HDL)-cholesterol, low-density lipoprotein (LDL)-cholesterol, triglycerides (TG), and ABCG2 rs2231142. Different covariates were considered in various models. Moreover, we incorporated a variety of genetic models to enhance the analysis and interpretation of gout risk, potentially yielding more meaningful insights into the role of genetics in health outcomes.
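A genetic model determines how a genotype at a locus such as ABCG2 rs2231142 enters the model as a number. The sketch below shows three standard codings based on the count of risk alleles a person carries. Only the additive model is named explicitly in the Results; the dominant and recessive codings are included as an assumption about what "a variety of genetic models" may cover, and the function name is hypothetical.

```python
def encode_genotype(risk_allele_count: int, model: str = "additive") -> int:
    """Encode a biallelic genotype given the number of risk alleles (0, 1, or 2).

    additive  : 0, 1, 2  (each risk allele adds the same increment)
    dominant  : 1 if at least one risk allele is present, else 0
    recessive : 1 only if both alleles are the risk allele, else 0
    """
    if risk_allele_count not in (0, 1, 2):
        raise ValueError("risk_allele_count must be 0, 1, or 2")
    if model == "additive":
        return risk_allele_count
    if model == "dominant":
        return int(risk_allele_count >= 1)
    if model == "recessive":
        return int(risk_allele_count == 2)
    raise ValueError(f"unknown genetic model: {model}")


# A heterozygous carrier (one risk allele) under each coding.
for model in ("additive", "dominant", "recessive"):
    print(model, encode_genotype(1, model))
```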

Results

As shown in Figure 1, this study analyzed data from 88,210 individuals in the TWB, identifying 19,338 cases of gout and 68,872 controls. After data cleaning and propensity score matching for gender and age, the final analytical sample comprised 38,676 individuals (19,338 gout cases and 19,338 controls). At baseline and prior to propensity score matching, the mean age (SD) was 49.85 (10.78) years for the controls and 51.56 (11.01) years for the cases with gout (p<0.001) (Table 1). There were more men with gout compared to women (51.36% vs 48.64%). Before matching, there were significant differences in all variables between cases and controls (p<0.001). The variable importance (determined using tree-based methods, including the Mean Decrease Impurity [Gini Importance]) for models with and without the genetic marker is shown in Figure 2A and B. Twelve variables were ranked by importance, with uric acid and gender emerging as the most important risk factors for gout.
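Mean Decrease Impurity credits a variable with the reduction in Gini impurity achieved each time a tree splits on it, accumulated over the splits and trees in the ensemble. The sketch below computes that reduction for a single split on toy data; it is not the SAS Viya implementation, and the function names, values, and thresholds are all hypothetical.

```python
def gini(labels):
    """Gini impurity of a binary label list: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def impurity_decrease(values, labels, threshold):
    """Parent impurity minus the size-weighted impurity of the two children."""
    left = [y for x, y in zip(values, labels) if x < threshold]
    right = [y for x, y in zip(values, labels) if x >= threshold]
    n = len(labels)
    return gini(labels) - len(left) / n * gini(left) - len(right) / n * gini(right)


# Toy data (hypothetical): 1 = gout case, 0 = control.
labels = [0, 0, 0, 1, 1, 1]
uric_acid = [5.0, 5.5, 6.2, 7.1, 7.8, 8.4]
age = [40, 65, 50, 45, 60, 55]

print(impurity_decrease(uric_acid, labels, threshold=7.0))  # 0.5 (perfect split)
print(impurity_decrease(age, labels, threshold=52))
```

In the toy data, the uric acid split separates the classes completely and removes all of the parent impurity (0.5), whereas the age split removes only about 0.056. A variable that repeatedly produces large reductions ends up with a high importance score.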

Table 1 Basic Characteristics of Case and Control Populations Before and After Propensity Score Matching

Figure 2 (A and B) show the variables’ importance in the additive model before and after propensity score matching. The champion model was Random Forest before propensity score matching and Gradient Boosting after matching.

Table 2 shows the performance metrics for the test data after propensity score matching.

Table 2 Performance Metrics for the Test Data After Propensity Score Matching

Overall, our dataset was split into a 20% test set and an 80% training set. Across various models displayed in the table, the Random Forest and Gradient Boosting demonstrated superior accuracy (Figure 3A). The RF achieved an AUC of 0.986, sensitivity of 0.947, specificity of 0.998, and F1 score of 0.972. Similarly, GB performed comparably with an AUC of 0.987, sensitivity of 0.944, specificity of 0.999, and F1 score of 0.971, highlighting its robust predictive capability even without genetic information.
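The reported F1 scores can be cross-checked from the reported sensitivity and specificity. The sketch below assumes that cases and controls are equally frequent in the test set (prevalence 0.5), which is plausible after 1:1 matching but is not stated for the test partition itself; the function name is illustrative.

```python
def f1_from_rates(sensitivity: float, specificity: float, prevalence: float = 0.5) -> float:
    """F1 implied by sensitivity and specificity at a given case prevalence.

    Uses F1 = 2*TP / (2*TP + FP + FN) with counts expressed as proportions.
    """
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    return 2 * tp / (2 * tp + fp + fn)


print(round(f1_from_rates(0.947, 0.998), 3))  # RF: 0.972
print(round(f1_from_rates(0.944, 0.999), 3))  # GB: 0.971
```

Under the balanced-classes assumption, both values agree with the F1 scores reported above to three decimal places.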

Figure 3 (A and B) show ROC curves for Model 1 (which adjusted for 11 core baseline variables: gender, age, BMI, smoking, drinking, exercise, uric acid, creatinine, HDL, LDL, and TG) and Model 2 (which adjusted for the baseline variables plus a well-established gout risk locus, ABCG2 rs2231142). The champion model was Random Forest for 3A and Gradient Boosting for 3B.

Incorporating the ABCG2 rs2231142 risk locus improved prediction metrics marginally across both models (Figure 3B). For instance, in Model 1b (additive adjustment), RF achieved an AUC of 0.987, with a sensitivity of 0.947, specificity of 0.998, and an F1 score of 0.972. GB, in this configuration, also reached an AUC of 0.987 with a sensitivity of 0.950, specificity of 0.995, and F1 score of 0.972.

Across all adjustments (Models 1a to 1d), both RF and GB maintained AUCs above 0.986 and high sensitivity and specificity scores, confirming these models as highly effective predictors of gout. The F1 scores for both models consistently exceeded 0.97, indicating balanced precision and recall.

Discussion

Our study applied machine learning models to predict gout using clinical and genetic data from a large Taiwanese cohort, with notable findings supporting the efficacy of machine learning in identifying individuals at risk of gout. Key insights from our analyses reveal that RF and GB were the top-performing models, achieving high predictive metrics, even when accounting solely for baseline clinical variables. Moreover, we observed a gender disparity among gout patients, with a higher prevalence in men (51.36%) than women (48.64%). Earlier research⁷ demonstrated a rising trend in disease prevalence from 1990 to 2019, and its predictive model based on age-period-cohort (APC) analysis suggested that this upward trajectory is likely to persist.

Our results indicate that gout risk can be robustly predicted using clinical data alone, with the RF model achieving an AUC of 0.986 and GB reaching an AUC of 0.987. These metrics underscore the models’ capability to differentiate gout cases from non-cases with high sensitivity and specificity, supporting their potential use as diagnostic or screening tools in clinical settings. Notably, integrating genetic information (ie, the ABCG2 rs2231142 risk allele) provided a marginal improvement across all model performance indicators. The consistency of high F1 scores (≥0.97) across models highlights a balanced capacity for both sensitivity and precision. This balance is particularly valuable in a clinical environment, where overdiagnosis and missed cases both pose risks. The ability to maintain such predictive power even after adjustments for gender and age through propensity score matching further strengthens the reliability of these models across diverse demographic groups.

In a previous investigation,²⁵ the accuracy of the machine learning method for identifying gout flares was validated, demonstrating enhanced sensitivity and specificity compared to earlier studies. Our study builds upon the growing body of literature utilizing machine learning for disease prediction, specifically in the context of gout. However, to our knowledge, no predictive model has been established for gout, especially in Taiwan. Several studies investigating machine learning models have concentrated on hyperuricemia, with differing observations. For instance, one of them²⁶ found that the decision tree performed better than other models. Notably, XGBoost demonstrated superior performance in some of the studies.^27,28 Conversely, another study reported similar predictive efficacy among the decision tree, random forest, and logistic regression approaches.²⁹

The utilization of machine learning technologies offers the potential to analyze vast datasets, identify intricate patterns, and tailor interventions, thereby advancing our ability to proactively address the evolving landscape of gout and optimize patient outcomes. In this research, we found that the Random Forest and Gradient Boosting algorithms demonstrated exceptional accuracy in predicting gout status. The robust performance of these algorithms across different models underscores their versatility and reliability in gout risk prediction. Nevertheless, a recent investigation comparing predictive models for tophi in individuals with gout revealed that the logistic regression model outperformed various machine learning models.³⁰ This conclusion was drawn after a comprehensive analysis involving the assessment of AUC, decision curve analysis (DCA), calibration curves, and precision-recall (PR) curves.

The identification of influential factors, including sex, age, BMI, lifestyle factors (smoking, drinking, exercise), and biochemical markers (uric acid, creatinine, HDL-cholesterol, LDL-cholesterol, TG), substantiates the multifactorial nature of gout development. Notably, the integration of the ABCG2 rs2231142 risk locus, a well-established genetic factor in gout pathogenesis, aligns with studies showing its role in elevating serum uric acid levels, thereby exacerbating gout risk. However, while previous studies have focused on traditional statistical methods, our study highlights the added benefit of machine learning models in handling complex, high-dimensional data with higher predictive accuracy.

The predictive accuracy of the RF and GB models in particular suggests practical applications in clinical decision-making. These models could serve as non-invasive, early screening tools, especially in populations with high baseline risks (for instance, individuals with elevated BMI or uric acid levels). Furthermore, given the incremental benefit of including genetic data, targeted genetic testing may be beneficial in specific high-risk populations to further refine predictions. Implementing these models in clinical settings could help identify at-risk individuals earlier, potentially prompting lifestyle or pharmacological interventions to prevent or manage gout more effectively.

While this study demonstrates strong predictive performance, several limitations should be considered. First, the reliance on data from a specific population, in this case, Taiwanese biobank participants, may limit the generalizability of our findings. Second, the diagnosis of gout was based on self-report and uric acid levels: we did not utilize crystal confirmation for diagnosis. Additionally, although genetic data provided incremental benefits, further exploration of additional genetic markers beyond ABCG2 rs2231142 might enhance prediction capabilities. Finally, the study’s reliance on propensity score matching, though helpful in controlling confounding, may not fully account for other unmeasured variables that influence gout risk. Future studies could explore machine learning models that incorporate additional environmental or lifestyle factors to capture a more comprehensive risk profile. Despite these limitations, and given the escalating prevalence of gout, it is imperative to embrace innovative approaches, such as machine learning models. Integrating multidisciplinary input and establishing medical record networks,⁷ supported by advanced machine learning algorithms, can significantly enhance the precision and efficiency of gout management. Future studies incorporating diverse cohorts would enhance the external validity of our predictive model.

Conclusions

In conclusion, the application of Random Forest and Gradient Boosting models demonstrates high accuracy in gout prediction, with or without the addition of genetic data. This study underscores the potential of machine learning in clinical prediction, particularly for conditions with complex metabolic and genetic underpinnings like gout. These findings pave the way for personalized risk assessment and preventive interventions, thereby advancing our ability to mitigate the burden of gout on public health.

Abbreviation

RF, Random Forest; LR, Logistic Regression; AUC, area under the curve; DCA, decision curve analysis; PR, precision-recall; HDL, high-density lipoprotein; LDL, low-density lipoprotein; TG, triglycerides; APC, age-period-cohort; BMI, body mass index.

Data Sharing Statement

The data that support the findings of this study are available from Taiwan Biobank, but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. Data are, however, available from the corresponding author, Prof. Yung-Po Liaw, upon reasonable request and with permission of Taiwan Biobank.

Ethical Statement

The Institutional Review Board of the Chung Shan Medical University Hospital granted ethical approval for this study (IRB:CS1-23101). Participants signed an informed consent letter before enrolling in the Taiwan Biobank project. All methods were carried out following relevant guidelines and regulations.

Acknowledgments

The authors thank the National Science and Technology Council for providing financial support.

Author Contributions

All authors made significant contributions to the work reported, whether in the conception, study design, execution, data acquisition, analysis, interpretation, or in all these areas. They participated in drafting, revising, and critically reviewing the article, provided final approval for the version to be published, agreed on the journal to which the article has been submitted, and accept accountability for all aspects of the work.

Funding

The National Science and Technology Council (NSTC), Taiwan, funded this study (NSTC 112-2121-M-040-002).

Disclosure

The authors report no conflicts of interest in this work.

References

1. Helget LN, England BR, Roul P, et al. Incidence, prevalence, and burden of gout in the veterans health administration. Arthritis Care Res. 2021;73(9):1363–1371. doi:10.1002/acr.24339

2. Kleinstäuber M, Wolf L, Jones AS, Dalbeth N, Petrie KJ. Internalized and anticipated stigmatization in patients with gout. ACR Open Rheumatol. 2020;2(1):11–17. doi:10.1002/acr2.11095

3. Santosa A. An update on the diagnosis and management of gout. Singapore Family Physi. 2017;43(2):11–18. doi:10.33591/sfp.43.2.u2

4. Kiadaliri A, Moreno-Betancur M, Turkiewicz A, Englund M. Educational inequalities in all-cause and cause-specific mortality among people with gout: a register-based matched cohort study in southern Sweden. Int J Equity Health. 2019;18(1):1–10. doi:10.1186/s12939-019-1076-1

5. Kuo C-F, Grainge MJ, Zhang W, Doherty M. Global epidemiology of gout: prevalence, incidence and risk factors. Nat Rev Rheumatol. 2015;11(11):649–662. doi:10.1038/nrrheum.2015.91

6. Kuo CF, Grainge MJ, See LC, et al. Epidemiology and management of gout in Taiwan: a nationwide population study. Arthritis Res Ther. 2015;17(1):13. doi:10.1186/s13075-015-0522-8

7. He Q, Mok T-N, Sin T-H, et al. Global, regional, and national prevalence of gout from 1990 to 2019: age-period-cohort analysis with future burden prediction. JMIR Public Health Surveill. 2023;9(1):e45943. doi:10.2196/45943

8. Chandratre P, Roddy E, Clarson L, Richardson J, Hider SL, Mallen CD. Health-related quality of life in gout: a systematic review. Rheumatology. 2013;52(11):2031–2040. doi:10.1093/rheumatology/ket265

9. Mody GM. Rheumatology in Africa—challenges and opportunities. Arthritis Res Ther. 2017;19(1):1–3. doi:10.1186/s13075-017-1259-3

10. Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N. Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLoS One. 2017;12(4):e0174944. doi:10.1371/journal.pone.0174944

11. Ragab G, Elshahaly M, Bardin T. Gout: an old disease in new perspective–A review. J Adv Res. 2017;8(5):495–511. doi:10.1016/j.jare.2017.04.008

12. Sajeev S, Champion S, Beleigoli A, et al. Predicting Australian adults at high risk of cardiovascular disease mortality using standard risk factors and machine learning. Int J Environ Res Public Health. 2021;18(6):3187. doi:10.3390/ijerph18063187

13. Reddy KVV, Elamvazuthi I, Aziz AA, Paramasivam S, Chua HN, Pranavanand S. Heart disease risk prediction using machine learning classifiers with attribute evaluators. Appl Sci. 2021;11(18):8352. doi:10.3390/app11188352

14. Kavakiotis I, Tsave O, Salifoglou A, Maglaveras N, Vlahavas I, Chouvarda I. Machine learning and data mining methods in diabetes research. Comput Struct Biotechnol J. 2017;15:104–116. doi:10.1016/j.csbj.2016.12.005

15. Jaiswal V, Negi A, Pal T. A review on current advances in machine learning based diabetes prediction. Prim Care Diabetes. 2021;15(3):435–443. doi:10.1016/j.pcd.2021.02.005

16. Lai H, Huang H, Keshavjee K, Guergachi A, Gao X. Predictive models for diabetes mellitus using machine learning techniques. BMC Endocr Disord. 2019;19(1):1–9. doi:10.1186/s12902-019-0436-6

17. Zou Q, Qu K, Luo Y, Yin D, Ju Y, Tang H. Predicting diabetes mellitus with machine learning techniques. Front Genetics. 2018;9:515. doi:10.3389/fgene.2018.00515

18. Ma C, Pan C, Ye Z, Ren H, Huang H, Qu J. Gout staging diagnosis method based on deep reinforcement learning. Processes. 2023;11(8):2450. doi:10.3390/pr11082450

19. Yu KH, Chen DY, Chen JH, et al. Management of gout and hyperuricemia: multidisciplinary consensus in Taiwan. Int J Rheum Dis. 2018;21(4):772–787. doi:10.1111/1756-185X.13266

20. Tsai HH, Tantoh DM, Hsiao CH, Zhong JH, Chen CY, Liaw YP. Risk of gout in Taiwan biobank participants pertaining to their sex and family history of gout among first-degree relatives. Clin Exp Med. 2023;23(8):5315–5325. doi:10.1007/s10238-023-01167-1

21. Fan CT, Lin JC, Lee CH. Taiwan biobank: a project aiming to aid Taiwan’s transition into a biomedical island. Pharmacogenomics. 2008;9(2):235–246. doi:10.2217/14622416.9.2.235

22. Sarker IH. Machine learning: algorithms, real-world applications and research directions. SN Comput Sci. 2021;2(3):160. doi:10.1007/s42979-021-00592-x

23. Shen X, Wang C, Liang N, et al. Serum metabolomics identifies dysregulated pathways and potential metabolic biomarkers for hyperuricemia and gout. Arthritis Rheumatol. 2021;73(9):1738–1748. doi:10.1002/art.41733

24. Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–575. doi:10.1086/519795

25. Zheng C, Rashid N, Wu YL, et al. Using natural language processing and machine learning to identify gout flares from electronic clinical notes. Arthritis Care Res. 2014;66(11):1740–1748. doi:10.1002/acr.22324

26. Sampa MB, Hossain MN, Hoque MR, et al. Blood uric acid prediction with machine learning: model development and performance comparison. JMIR Med Inform. 2020;8(10):e18331. doi:10.2196/18331

27. Zheng Z, Si Z, Wang X, et al. Risk prediction for the development of hyperuricemia: model development using an occupational health examination dataset. Int J Environ Res Public Health. 2023;20(4):3411. doi:10.3390/ijerph20043411

28. Chen S, Han W, Kong L, et al. The development and validation of a non-invasive prediction model of hyperuricemia based on modifiable risk factors: baseline findings of a health examination population cohort. Food Funct. 2023;14.

29. Ichikawa D, Saito T, Ujita W, Oyama H. How can machine-learning methods assist in virtual screening for hyperuricemia? A healthcare machine-learning approach. J Biomed Inform. 2016;64:20–24. doi:10.1016/j.jbi.2016.09.012

30. Lei T, Guo J, Wang P, et al. Establishment and validation of predictive model of tophus in gout patients. J Clin Med. 2023;12(5):1755. doi:10.3390/jcm12051755

Creative Commons License © 2024 The Author(s). This work is published by Dove Medical Press Limited, and licensed under a Creative Commons Attribution License. The full terms of the License are available at http://creativecommons.org/licenses/by/4.0/. The license permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
