Machine learning-based utilization of lipid panels in predicting kidney dysfunction
Abstract
Background
Dyslipidemia is a risk factor for atherosclerotic cardiovascular disease, but its relationship with kidney dysfunction varies depending on the patient and kidney status. Herein, we used a machine learning model to select lipid panels to improve the predictability of kidney dysfunction.
Methods
A total of 9,389 patients whose lipid panels and low-density lipoprotein subfraction scores were measured were enrolled. The primary outcome was the kidney outcome, defined as the development of end-stage kidney disease or a 50% decline in kidney function. The secondary outcome was a composite outcome, defined as the occurrence of either the kidney outcome or all-cause death. Five machine learning models were utilized to predict 1-, 2-, and 3-year outcomes. Feature ranking was employed to identify key factors contributing to model performance.
Results
At 3 years, 117 patients (1.2%) experienced kidney outcomes, and 691 (7.4%) experienced composite outcomes. All of the models that used all the features demonstrated high predictive power, with the area under the receiver operating characteristic curve (AUROC) exceeding 0.85. Although the overall performance of the model that selectively uses lipid panels and creatinine was lower than that of models using all features, its predictive ability for kidney outcomes remained comparable, with AUROC values exceeding 0.78. Feature ranking analysis indicated that apolipoprotein A1, apolipoprotein B, and low-density lipoprotein cholesterol were significant contributors to model performance.
Conclusion
Machine learning models incorporating lipid panels successfully predict the risk of kidney dysfunction, which can help clinicians in precisely identifying patients who are at risk of kidney dysfunction through lipid panel assessments.
Introduction
Dyslipidemia is a well-established risk factor for atherosclerotic cardiovascular disease (ASCVD) [1,2], with several studies demonstrating an association between altered levels of serum lipid panels, such as total cholesterol, low-density lipoprotein (LDL) cholesterol, triglyceride (TG), and high-density lipoprotein (HDL) cholesterol, and the risk of ASCVD [3–5]. Among the various forms of LDL cholesterol, small dense particles pose a greater atherogenic risk than large buoyant particles do [6–8]. Patients with chronic kidney disease (CKD) exhibit hypertriglyceridemia and low HDL cholesterol levels [9,10]. However, total and LDL cholesterol levels can vary depending on the underlying cause of CKD. Patients with nephrotic syndrome and diabetic kidney disease commonly experience elevated total and LDL cholesterol levels, whereas those with non-nephrotic CKD may have lower cholesterol levels [11]. Despite fluctuations in total LDL cholesterol levels, these patients generally have a greater proportion of small, dense LDL cholesterol [9,11].
Because ASCVD and CKD share common risk factors, previous studies have explored the associations between various lipid panels and kidney dysfunction. While some studies have demonstrated a significant association [12,13], others have not confirmed this link [14,15]. Establishing the association between lipid panels and kidney dysfunction via conventional analysis is challenging, particularly in patients with advanced CKD, because comorbid conditions in these patients may confound the results [16,17]. The continuous spectrum of lipoproteins, varying in size, composition, and apolipoprotein content, and the intercorrelations between lipid panels further complicate simultaneous analysis. Therefore, a comprehensive approach that includes lipid panels and subfractions, comorbidities, other laboratory findings, and medications is necessary to better understand these complex associations.
Machine learning models have recently been used to predict clinical outcomes by comprehensively considering a large number of features without the issue of intercorrelation [18]. Feature ranking in these models can provide clinical insights by indicating the extent to which a feature contributes to model performance [19]. There is a growing trend of utilizing machine learning models in nephrology, particularly for predicting kidney dysfunction, including conditions such as acute kidney injury and diabetic kidney disease [20–24]. Herein, we addressed whether machine learning models incorporating lipid panels, including both conventional and subfraction parameters, could predict kidney dysfunction. We also applied a machine learning-based feature ranking method to identify lipid panels that were primarily associated with kidney dysfunction.
Methods
Patients and study design
A cohort from a single tertiary referral center (Seoul National University Hospital) was retrospectively evaluated. A total of 14,123 adult patients (aged ≥18 years) who underwent serum LDL subfraction testing along with conventional lipid panels between 2009 and 2016 were reviewed. Patients with end-stage kidney disease (ESKD; e.g., hemodialysis, peritoneal dialysis, and kidney transplantation; n = 591) or those who had received organ transplants other than kidneys (n = 15) at the time of enrollment were excluded. Patients with missing values for serum creatinine and lipid parameters were also excluded (n = 4,128). Finally, 9,389 patients were included in the analysis (Supplementary Fig. 1, available online). All the patients were randomly assigned to training, validation, or test groups at a ratio of 7:1:2.
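The 7:1:2 random assignment can be illustrated with a minimal Python sketch. The function name `split_cohort`, the seed, and the use of integer patient indices are assumptions made for illustration; the authors' actual procedure may differ (for example, it may stratify by outcome).

```python
import random

def split_cohort(n_patients, ratios=(7, 1, 2), seed=0):
    """Randomly assign patient indices to training, validation, and test sets."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)  # reproducible shuffle; the seed is hypothetical
    total = sum(ratios)
    n_train = n_patients * ratios[0] // total
    n_val = n_patients * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

train_idx, val_idx, test_idx = split_cohort(9389)
print(len(train_idx), len(val_idx), len(test_idx))
```

With rare outcomes, a purely random split can leave very few events in the smaller sets, so checking the event counts per set after splitting is a reasonable precaution.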
Features for model development
Demographic information, such as age, sex, body mass index, and smoking status, was collected at the time of enrollment. Comorbidities, such as diabetes mellitus, hypertension, CKD, and coronary artery disease, were defined on the basis of the International Classification of Diseases, 10th Revision (ICD-10) codes (Supplementary Table 1, available online). The serum LDL subfraction was measured once at enrollment, while other lipid panels (total cholesterol, LDL cholesterol, HDL cholesterol, TG, apolipoprotein A1 [ApoA1], and apolipoprotein B [ApoB]) were repeatedly tested. Additional blood parameters, such as creatinine, white blood cell count, hemoglobin, glucose, and cystatin C, were also collected. The estimated glomerular filtration rate (eGFR) was calculated via the CKD-EPI (Chronic Kidney Disease Epidemiology Collaboration) equation [25]. Prescription information, such as antidyslipidemia drugs (statin, ezetimibe, fibrate, omega-3), antiplatelets (aspirin, clopidogrel, prasugrel, ticagrelor), anticoagulants (warfarin, direct oral anticoagulant, low-molecular-weight heparin), nitrates, oral antidiabetic drugs, insulin, proton pump inhibitors, H2 receptor antagonists, renin-angiotensin system blockers, beta blockers, calcium channel blockers, minoxidil, loop diuretics, thiazide, spironolactone, xanthine oxidase inhibitors, and erythropoietin-stimulating agents, was summarized at the time of enrollment.
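As an illustration of how eGFR is derived from serum creatinine, the sketch below implements the 2009 CKD-EPI creatinine equation. This section does not state which version of the CKD-EPI equation was used, so the choice of the 2009 form is an assumption, as are the function name and the mg/dL input unit.

```python
def egfr_ckd_epi_2009(scr_mg_dl, age_years, female, black=False):
    """eGFR in mL/min/1.73 m^2 from the 2009 CKD-EPI creatinine equation.

    Assumption: the 2009 form is shown for illustration only; the study's
    exact equation is given in its reference [25].
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (141.0
            * min(ratio, 1.0) ** alpha
            * max(ratio, 1.0) ** -1.209
            * 0.993 ** age_years)
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr

# Hypothetical patient: 50-year-old man with serum creatinine 0.9 mg/dL
example_egfr = egfr_ckd_epi_2009(0.9, 50, female=False)
print(round(example_egfr, 1))
```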
Study outcomes
Kidney dysfunction, as the primary outcome, was defined by a >50% decline in eGFR, a doubling of serum creatinine from baseline, or the occurrence of ESKD (i.e., the need for dialysis or transplantation). The secondary outcome was a composite outcome, defined as the occurrence of either kidney dysfunction or all-cause death. Outcome data were collected as of March 31, 2020. Information on ESKD was obtained from the Korean Society of Nephrology database, and mortality data were retrieved from the Korean Statistical Information Service. Considering that the latest enrollment occurred in 2016, a minimum follow-up period of 3 years was available. Therefore, outcomes were assessed at discrete time points of 1, 2, and 3 years after enrollment, with censoring applied at each time point. Both primary and secondary outcomes were predicted based on these intervals.
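The outcome definitions above can be sketched as a labeling routine. All function names, the tuple layout of follow-up measurements, and the 365-day year are assumptions for illustration. The sketch also does not handle patients censored before a horizon, which the study addressed separately.

```python
def first_kidney_event_day(baseline_egfr, baseline_scr, followups, eskd_day=None):
    """Return the earliest day on which the kidney outcome is met, or None.

    followups: iterable of (day, egfr, scr) tuples measured after enrollment
    (a hypothetical layout).
    """
    candidates = []
    if eskd_day is not None:
        candidates.append(eskd_day)
    for day, egfr, scr in followups:
        declined = egfr < 0.5 * baseline_egfr  # >50% decline in eGFR
        doubled = scr >= 2.0 * baseline_scr    # doubling of serum creatinine
        if declined or doubled:
            candidates.append(day)
    return min(candidates) if candidates else None

def composite_event_day(kidney_day, death_day):
    """Earliest of kidney outcome and all-cause death, or None if neither occurred."""
    days = [d for d in (kidney_day, death_day) if d is not None]
    return min(days) if days else None

def label_by_horizon(event_day, horizons_years=(1, 2, 3)):
    """Binary label per horizon: 1 if the event occurred on or before that horizon."""
    return {y: int(event_day is not None and event_day <= 365 * y)
            for y in horizons_years}

# Hypothetical patient: eGFR falls from 80 to 38 on day 500
kidney_day = first_kidney_event_day(80.0, 1.0, [(200, 70.0, 1.1), (500, 38.0, 1.9)])
print(kidney_day, label_by_horizon(kidney_day))
```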
Statistical analysis and model development
Statistical analyses were performed via R (version 3.6.2, R Foundation for Statistical Computing) and Python (version 3.9.16, Python Software Foundation). The features are expressed as frequencies and percentages for categorical parameters, means ± standard deviations for normally distributed continuous parameters, and medians and interquartile ranges (IQRs) for nonnormally distributed continuous parameters. Baseline characteristics were compared using the chi-square test for categorical parameters, the Student t test for normally distributed continuous parameters, and the Mann-Whitney U test for nonnormally distributed continuous parameters.
Prediction models were developed using machine learning algorithms, such as logistic regression (LR), random forest (RF), and light gradient boosting machine (LGBM), as well as deep learning algorithms, such as multilayer perceptron (MLP) and recurrent neural network (RNN). The models were trained using the training set, and the optimal hyperparameters for LR, RF, and LGBM, along with the training epochs for early stopping in MLP and RNN, were determined based on validation set results. A list of hyperparameters for LR, RF, and LGBM, as well as detailed methods for the MLP and RNN, are provided in Supplementary Table 2 (available online) and the Supplementary Methods (available online). For LR, we performed a grid search over two regularization penalties (L1 and L2) and four inverse regularization strengths (C = 0.01, 0.1, 1, 10); elastic-net regularization was deliberately excluded to avoid an additional tuning parameter and preserve model interpretability. The Python code used for the machine learning analyses has been made publicly available at the following repository (https://github.com/dactylogram/lipid_panel).
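The LR search space described above contains 2 × 4 = 8 configurations. A minimal sketch of enumerating the grid and choosing the configuration with the best validation score follows. The names `lr_grid`, `select_best`, and `toy_score` are hypothetical, and the toy scoring function merely stands in for fitting a model on the training set and scoring it on the validation set.

```python
from itertools import product

PENALTIES = ("l1", "l2")
C_VALUES = (0.01, 0.1, 1, 10)  # inverse regularization strengths

def lr_grid():
    """Enumerate every (penalty, C) combination in the search space."""
    return [{"penalty": p, "C": c} for p, c in product(PENALTIES, C_VALUES)]

def select_best(configs, validation_score):
    """Return the configuration with the highest validation score.

    validation_score is a hypothetical callable mapping a configuration
    to a float (e.g., validation AUROC after fitting on the training set).
    """
    return max(configs, key=validation_score)

def toy_score(cfg):
    # Placeholder only: not a real model fit and not study data.
    return -abs(cfg["C"] - 1) + (0.1 if cfg["penalty"] == "l2" else 0.0)

configs = lr_grid()
best = select_best(configs, toy_score)
print(len(configs), best)
```

In practice, each configuration would be passed to a logistic regression implementation (for instance, the one in scikit-learn) rather than to a toy function.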
Model performance was assessed using the area under the receiver operating characteristic curve (AUROC) and area under the precision–recall curve (AUPRC) in the testing dataset. Feature importance was assessed in two ways: individually for each feature and grouped by category (demographics, comorbidities, creatinine, lipid panels, other laboratory findings, and medications). Lipid panels that ranked highly in feature importance were further validated using a multivariable Cox regression model, and hazard ratios (HRs) and 95% confidence intervals (CIs) were estimated. A two-sided p-value of less than 0.05 was considered statistically significant.
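To make the two headline metrics concrete, the sketch below computes AUROC as the probability that a randomly chosen event receives a higher score than a randomly chosen non-event, and estimates AUPRC by average precision. Average precision is one common estimator of AUPRC; which estimator the authors used is not stated here, so this choice is an assumption. The example labels and scores are toy values, not study data.

```python
def auroc(y_true, y_score):
    """AUROC via pairwise comparison of event and non-event scores (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(y_true, y_score):
    """Average precision: mean of precision at the rank of each event.

    Assumes no tied scores, for simplicity.
    """
    ranked = sorted(zip(y_score, y_true), key=lambda t: t[0], reverse=True)
    n_pos = sum(y_true)
    tp = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank
    return total / n_pos

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auroc(y_true, y_score), average_precision(y_true, y_score))
```

For rare outcomes, an uninformative classifier has an expected AUROC of 0.5 but an expected AUPRC close to the event prevalence, so AUPRC values are best read relative to the outcome rate.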
Ethics statement
The study was approved by the Institutional Review Board (IRB) of Seoul National University Hospital (No. H-1807-013-955) and was conducted in accordance with the principles of the Declaration of Helsinki. The requirement for informed consent was waived by the IRB.
Results
Cohort characteristics
The average age of the patients was 64.8 ± 10.9 years, and 64.7% were male (Table 1). The patients who developed kidney or composite outcomes tended to be older and had a higher prevalence of diabetes mellitus and CKD. The proportion of patients diagnosed with coronary artery disease and the prescription rates of statin, aspirin, and nitrate were notably high (Table 1). This was largely because lipid subfraction scoring was primarily performed in patients with coronary artery disease to assess their prognosis. Serum cystatin C results were available for the majority of patients (92.8%). Diseases with a prevalence rate of less than 10% and drugs with a prescription rate of less than 5% are presented in Supplementary Table 3 (available online). Baseline characteristics of the study subjects by study groups (training, validation, and test sets) are presented in Supplementary Table 4 (available online).
Model performance
The median follow-up duration was 6 years (IQR, 4.3–8.6 years). During this period, kidney dysfunction occurred in 187 patients (2.0%), and the composite outcome was observed in 1,497 patients (15.9%). Cumulatively, kidney dysfunction had occurred in 54 patients (0.6%) by 1 year, 87 patients (0.9%) by 2 years, and 117 patients (1.2%) by 3 years after enrollment. The composite outcome had occurred in 281 patients (3.0%) by 1 year, 489 patients (5.2%) by 2 years, and 691 patients (7.4%) by 3 years.
When the models were compared in terms of AUROC, short-term (1-year) predictive power tended to be greater than long-term (3-year) predictive power for kidney outcomes (Fig. 1, Table 2; Supplementary Table 5, available online). However, for the composite outcome, the short-term and long-term predictive powers were relatively similar (Fig. 2, Table 2; Supplementary Table 5, available online). Additionally, the AUPRC for the composite outcome tended to be higher for long-term than for short-term predictions. With all features, the AUPRC ranged from 0.182 to 0.250 at 1 year and from 0.285 to 0.327 at 3 years; with only lipid panels and creatinine, it ranged from 0.158 to 0.243 at 1 year and from 0.256 to 0.294 at 3 years (Table 2; Supplementary Table 5, available online; Supplementary Figs. 2 and 3, available online).
Figure 1. Area under the receiver operating characteristic curve for predicting kidney dysfunction.
(A–C) Model with all features for 1-year (A), 2-year (B), and 3-year (C) prediction. (D–F) Model with lipid plus serum creatinine for 1-year (D), 2-year (E), and 3-year (F) prediction. The x-axis represents the false positive rate (1 − specificity), and the y-axis represents the true positive rate (sensitivity).
LR, logistic regression; LGBM, light gradient boosting machine; RF, random forest; MLP, multilayer perceptron; RNN, recurrent neural network.
Table 2. Model performance for predicting kidney and composite outcomes based on 10-fold cross-validation by mean AUROCs and mean AUPRCs
Figure 2. Area under the receiver operating characteristic curve for predicting composite outcomes.
(A–C) Model with all features for 1-year (A), 2-year (B), and 3-year (C) prediction. (D–F) Model with lipid plus serum creatinine for 1-year (D), 2-year (E), and 3-year (F) prediction. The x-axis represents the false positive rate (1 − specificity), and the y-axis represents the true positive rate (sensitivity).
LR, logistic regression; LGBM, light gradient boosting machine; RF, random forest; MLP, multilayer perceptron; RNN, recurrent neural network.
Overall, model performance was better when all features were used than when only lipid panels and creatinine were used. For the 1-year prediction of kidney outcomes, the AUROC ranged from 0.870 to 0.918, and the AUPRC ranged from 0.182 to 0.250 with all features, whereas the use of only lipid panels and creatinine resulted in AUROCs ranging from 0.786 to 0.871 and AUPRCs ranging from 0.158 to 0.243. For the 1-year prediction of composite outcome, the AUROCs ranged from 0.876 to 0.892, and the AUPRCs ranged from 0.283 to 0.322 with all features, whereas using lipid panels and creatinine yielded AUROCs of 0.703–0.748 and AUPRCs of 0.136–0.185 (Table 2). However, when predicting kidney outcomes using the LR model, the performance was comparable when all features were used and when only lipid panels and creatinine were used. Detailed performance metrics of the models, including accuracy, sensitivity, specificity, precision, and F1 score, are presented in Supplementary Table 6 (available online) for kidney outcomes and Supplementary Table 7 (available online) for composite outcomes.
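The threshold-based metrics reported in the supplementary tables (accuracy, sensitivity, specificity, precision, and F1 score) all follow from the confusion matrix. A minimal sketch is shown below; the function name and the toy labels are assumptions, and the classification threshold the authors applied to the predicted probabilities is not stated in this section.

```python
def threshold_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels and binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Toy example with 2 events among 10 patients (not study data)
toy_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
metrics = threshold_metrics(toy_true, toy_pred)
print(metrics)
```

When events are rare, accuracy and specificity can remain high even if half of the events are missed, which is why precision and F1 score are informative alongside them.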
Identifying important lipid panels
To identify features that significantly contributed to predicting kidney and composite outcomes, with a particular focus on lipid panels, feature ranking was performed. Among the five algorithms, the MLP demonstrated the most favorable blend of AUPRC, precision, and F1 score and was therefore selected for a detailed feature-importance analysis (Supplementary Tables 6 and 7, available online). When examined in a grouped manner, the lipid panel emerged as the most important feature group in the MLP model for predicting 3-year kidney and composite outcomes (Figs. 3 and 4). In the LGBM model, it consistently ranked as the second most important feature group across all prediction tasks (Supplementary Figs. 4 and 5, available online). Similarly, in the RF model, the lipid panel ranked second to third in importance (Supplementary Figs. 6 and 7, available online). Although the lipid panel ranked relatively lower (third to fourth) in the RNN model for composite outcome prediction, its actual SHAP (SHapley Additive exPlanations) values were comparable to those of the top-ranked features (Supplementary Figs. 8 and 9, available online).
Figure 3. Feature importance for predicting kidney dysfunction in the multilayer perceptron model.
(A–C) Importance of each feature for 1-year (A), 2-year (B), and 3-year (C) prediction. (D–F) Importance of grouped features for 1-year (D), 2-year (E), and 3-year (F) prediction. The x-axis indicates the mean absolute SHAP (SHapley Additive exPlanations) value, representing the average impact of each feature on the model output, and the y-axis shows the input features ranked by importance.
AFL, atrial flutter; Afib, atrial fibrillation; ApoB, apolipoprotein B; AST, aspartate aminotransferase; BMI, body mass index; CAD, coronary artery disease; CCB, calcium channel blocker; CKD, chronic kidney disease; CLD, chronic liver disease; DM, diabetes mellitus; eGFR, estimated glomerular filtration rate; IDL, intermediate-density lipoprotein; LDL, low-density lipoprotein; ODA, oral anti-diabetic agent; PT, prothrombin time; PVD, peripheral vascular disease; RAS, renin-angiotensin system; VLDL, very low-density lipoprotein; WBC, white blood cell.
Figure 4. Feature importance for predicting composite outcomes in the multilayer perceptron model.
(A–C) Importance of each feature for 1-year (A), 2-year (B), and 3-year (C) prediction. (D–F) Importance of grouped features for 1-year (D), 2-year (E), and 3-year (F) prediction. The x-axis indicates the mean absolute SHAP (SHapley Additive exPlanations) value, representing the average impact of each feature on the model output, and the y-axis shows the input features ranked by importance.
Apo A, apolipoprotein A; AST, aspartate aminotransferase; BMI, body mass index; CKD, chronic kidney disease; dz, disease; eGFR, estimated glomerular filtration rate; HTN, hypertension; IDL, intermediate-density lipoprotein; LDL, low-density lipoprotein; WBC, white blood cell.
When the influence of individual features was separately estimated to identify the top 20 features, more than one lipid parameter appeared in all of the models. Notably, in the MLP model, where the lipid parameter group had the greatest influence, LDL and ApoB were frequently identified. For kidney outcomes, the top lipid parameters included LDL1 and ApoB at 1 year; intermediate-density lipoprotein (IDL)-B, LDL, and LDL7 at 2 years; and IDL cholesterol, ApoB, IDL-B, and LDL at 3 years. For composite outcomes, the top lipid parameters were TG, IDL-A, and LDL1 at 1 year; LDL and LDL2 at 2 years; and ApoA1 and ApoB at 3 years (Figs. 3 and 4). ApoA1, ApoB, and LDL cholesterol were consistently identified as important lipid parameters, as they frequently appeared in the feature rankings across all machine learning and deep learning algorithms (Figs. 3 and 4; Supplementary Figs. 4–9, available online).
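The individual and grouped rankings rest on the mean absolute SHAP value per feature. The sketch below shows one way to compute it and to aggregate it by category. The SHAP values are toy numbers rather than study data, the function names are hypothetical, and summing per-feature values within a group is an assumption; the authors may have aggregated differently.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Mean absolute SHAP value per feature across patients."""
    n = len(shap_rows)
    return {name: sum(abs(row[j]) for row in shap_rows) / n
            for j, name in enumerate(feature_names)}

def grouped_importance(per_feature, groups):
    """Sum per-feature importance within each group, sorted in descending order."""
    totals = {}
    for feature, value in per_feature.items():
        group = groups[feature]
        totals[group] = totals.get(group, 0.0) + value
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

features = ["ApoB", "LDL", "creatinine", "age"]
toy_shap = [                      # one row per patient, one column per feature
    [0.2, -0.1, 0.3, 0.05],
    [-0.4, 0.1, -0.1, 0.05],
]
groups = {"ApoB": "lipid panels", "LDL": "lipid panels",
          "creatinine": "creatinine", "age": "demographics"}

per_feature = mean_abs_shap(toy_shap, features)
by_group = grouped_importance(per_feature, groups)
print(per_feature)
print(by_group)
```

Under summation, a group with many moderately important features can outrank a single strong feature, which is one reason grouped and individual rankings can differ.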
Validation of feature importance
A multivariable Cox analysis was conducted to validate three lipid parameters identified by the algorithms (ApoA1, ApoB, and LDL cholesterol). When analyzed as continuous variables in the fully adjusted model, higher ApoB levels were associated with an elevated risk of kidney outcomes (HR, 1.01; 95% CI, 1.00–1.02; p = 0.004) (Table 3), whereas higher ApoA1 levels were associated with a lower risk of composite outcomes (HR, 1.00; 95% CI, 0.99–1.00; p = 0.02) (Table 4). However, LDL cholesterol did not show a significant linear relationship with either kidney or composite outcomes.
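Hazard ratios for continuous variables are expressed per one-unit increase, which is why statistically significant estimates can sit very close to 1.00 after rounding. In a Cox model the HR per unit is exp(β), so the HR for a k-unit increase is exp(kβ), that is, the per-unit HR raised to the power k. The sketch below applies this to the rounded ApoB point estimate; the 10-unit increment is a hypothetical choice, the measurement unit is not stated in this section, and the result is approximate because the published HR is rounded.

```python
import math

def rescale_hr(hr_per_unit, k):
    """HR for a k-unit increase, given the HR per 1-unit increase."""
    beta = math.log(hr_per_unit)  # Cox coefficient per unit
    return math.exp(k * beta)

# Rounded per-unit HR for ApoB and kidney outcomes reported above
hr_10_units = rescale_hr(1.01, 10)
print(round(hr_10_units, 3))
```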
Discussion
Owing to the fluctuating correlation between lipid parameters and various underlying conditions that influence dyslipidemia in CKD patients, the relationship between lipid parameters and kidney dysfunction remains controversial. To address the challenge of intercorrelations, we employed various artificial intelligence models to explore the associations between lipid parameters and kidney outcomes. These models incorporated all relevant features, with a specific focus on lipid parameters and serum creatinine levels, and achieved high predictive power for both kidney and composite outcomes, particularly with the MLP and LGBM models. The feature ranking results consistently placed the group of lipid parameters as the first or second most important feature group in predicting both kidney and composite outcomes. These results suggest that machine learning models incorporating lipid panels can predict kidney dysfunction-related outcomes with high discrimination.
Previous studies have utilized various conventional analytical methods, such as analyzing the distribution [26] or ratio [27,28] of lipid parameters and apolipoproteins [29–32], to address the intercorrelations and explore the relationships between dyslipidemia and kidney outcomes. Despite these efforts, no specific lipid factor has been definitively identified as being strongly associated with the progression of kidney disease. Additionally, a Cochrane review reported that while statin usage in CKD patients consistently lowers ASCVD risk, its effect on kidney function remains uncertain [15]. The ongoing debate in studies using traditional analyses highlights the need for new and more advanced analytical approaches to better understand the complex relationship between lipid profiles and kidney dysfunction. To address this challenge, this study utilized machine learning models, which were able to effectively explore the association with kidney dysfunction.
Traditionally, machine learning algorithms have been used to predict short-term outcomes, such as the occurrence of in-hospital acute kidney injury [21] or mortality in intensive care units [22]. However, recent efforts have expanded the application of machine learning algorithms to long-term outcome prediction in conditions such as the development of diabetic kidney disease [23,24,33] and microvascular complications in diabetic patients [34]. A previous study that compared various machine learning models for predicting 3-year outcomes revealed that LGBM outperformed other models, demonstrating promising predictive power for diabetic kidney disease progression [33]. That study also identified several key factors contributing to the prediction, including old age, high homocysteine levels, poor glycemic control, hypoalbuminemia, low eGFR, and high bicarbonate levels [33]. Additionally, a deep learning model that simultaneously incorporated structured data (serum/urine laboratory findings and medications) and unstructured data (ICD-10 codes and medical text) successfully predicted the progression of diabetic kidney disease 6 months in advance, achieving an accuracy of 71% [24].
To our knowledge, while several studies have explored the relationship between lipid parameters and kidney outcomes, our study is the first to investigate this relationship using machine learning algorithms. Our comprehensive model, which incorporated all relevant features, demonstrated strong predictive power for both kidney and composite outcomes, with AUROC values of 0.92 and 0.87, respectively. Notably, even when limited to lipid parameters and serum creatinine, the model maintained significant predictive accuracy, achieving AUROC values of 0.87 for kidney outcomes and 0.72 for composite outcomes. Among the lipid parameters identified through feature ranking, ApoA1 and ApoB consistently showed linear associations with the study outcomes. However, although LDL cholesterol was highlighted in the feature ranking and is known for its variability in CKD patients, a linear relationship was not observed in conventional analysis. This suggests that the machine learning model may have uncovered associations that traditional analysis methods could not detect, highlighting the potential for applying advanced explainable artificial intelligence techniques in future research.
However, our study has certain limitations. First, although the overall cohort size was large, the proportion of outcome events was relatively low (kidney outcomes, 0.6%–1.2%; composite outcomes, 3.0%–7.4% across 1 to 3 years). This introduces a potential risk of overfitting, highlighting the need for future studies with larger numbers of outcome events to ensure the validity and generalizability of our findings. Second, external validation could not be performed due to the lack of an independent cohort. We attempted to mitigate this limitation by performing validation through conventional survival analysis. Considering the limitations of artificial intelligence in long-term outcome prediction, particularly its inability to account for time lags, this approach was appropriate. Third, we were unable to account for the time-varying characteristics of lipid parameters, as it was challenging to assess follow-up results for various lipid parameters simultaneously due to high rates of missing data.
In conclusion, this study demonstrates the utility of lipid parameters for predicting kidney and composite outcomes via diverse machine learning models. Notably, our models identified ApoB, ApoA1, and LDL cholesterol as potential prognostic factors, and the linear relationships of ApoB and ApoA1 with the study outcomes were further supported by survival analysis. However, external validation of our machine learning models is essential to establish their robustness and generalizability.
Supplementary Materials
Supplementary data are available at Kidney Research and Clinical Practice online (https://doi.org/10.23876/j.krcp.25.052).
Notes
Conflicts of interest
All authors have no conflicts of interest to declare.
Data sharing statement
The data presented in this study are available from the corresponding author upon reasonable request.
Authors’ contributions
Conceptualization, Supervision: SY, SSH
Data curation: SK, SP, YCK, DKK, KHO, KWJ, YSK
Formal analysis: DY, CP
Investigation: SP, YCK, DKK, KHO
Methodology, Visualization: DY
Project administration: SY, SSH
Validation: SK
Writing–original draft: SK, DY
Writing–review & editing: All authors
All authors read and approved the final manuscript.
