Cardiovascular diseases (CVDs) are one of the leading causes of death, and predictors such as lipoprotein(a) [Lp(a)] and standard risk scores are often insufficient on their own. A recent study explored how automated machine learning (AutoML) can improve CVD risk prediction by using large clinical datasets to develop customized models without requiring advanced data science expertise.
The researchers utilized two major datasets, Ludwigshafen Risk and Cardiovascular Health (LURIC) (n=3,316) and University Medical Center Mannheim (UMC/M) (n=423), to develop and validate AutoML models.
In the first phase, key predictors of CVD, such as age, Lp(a), troponin T, body mass index (BMI), and cholesterol, were identified, with model performance ranging from an area under the curve (AUC) of 0.62 to 0.91. In the second phase, the models were tested on the UMC/M dataset and showed strong performance (AUC ranging from 0.72 to 0.84). SHapley Additive exPlanations (SHAP) analysis identified key contributing factors, including statin use, age, and N-terminal pro-B-type natriuretic peptide (NT-proBNP) levels.
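For readers less familiar with the AUC values quoted above, the following is a minimal sketch of how the metric can be computed, assuming binary outcome labels (1 = event, 0 = no event) and a predicted risk score per patient. The function name `roc_auc` and the toy numbers are illustrative assumptions, not taken from the study, and this is not the study's AutoML pipeline.

```python
def roc_auc(labels, scores):
    """Return the AUC as the probability that a randomly chosen positive
    case receives a higher score than a randomly chosen negative case
    (ties count as half). Equivalent to the Mann-Whitney U formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical values, not study data)
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the reported ranges should be read.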
In the third phase, models predicting cardiovascular death performed well (AUC 0.74–0.85) but also revealed evidence of data drift, underscoring the importance of continuous model recalibration.
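Data drift means that the distribution of an input variable differs between the cohort a model was developed on and the cohort it is later applied to. One common way to flag it, sketched here as an assumption rather than as the method used in the study, is the two-sample Kolmogorov-Smirnov statistic: the largest gap between the two empirical cumulative distributions of a feature. The function name `ks_statistic` and the sample values are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Return the two-sample Kolmogorov-Smirnov statistic: the maximum
    absolute difference between the empirical CDFs of the two samples.
    0.0 means identical empirical distributions, 1.0 means no overlap."""
    ref = sorted(reference)
    cur = sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")

    d = 0.0
    # The empirical CDFs only change at observed values, so it is
    # enough to evaluate the gap at every distinct observation.
    for x in set(ref) | set(cur):
        f_ref = bisect_right(ref, x) / len(ref)  # share of ref <= x
        f_cur = bisect_right(cur, x) / len(cur)  # share of cur <= x
        d = max(d, abs(f_ref - f_cur))
    return d


# Toy example (hypothetical feature values, not study data)
development_cohort = [1, 2, 3, 4]
validation_cohort = [3, 4, 5, 6]
drift = ks_statistic(development_cohort, validation_cohort)
```

A larger statistic suggests a stronger shift in that feature; what threshold should trigger recalibration is a judgment call that depends on sample size and clinical context.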
Reference:
Bibi, I., Schaffert, D., Blanke, P., Illian, L., Lenzing, F., Martin, N., Leipe, J., März, W., Stach, K., & Olsavszky, V. (2025). Cardiovascular risk assessment enhanced by automated machine learning in a multi-phase study. Scientific Reports, 15(1), 1-18. https://doi.org/10.1038/s41598-025-24189-z
https://www.nature.com/articles/s41598-025-24189-z#citeas