Machine learning-based 2-year risk prediction tool in immunoglobulin A nephropathy
Article information
Abstract
Background
This study aimed to develop a machine learning-based 2-year risk prediction model for early identification of patients with rapidly progressive immunoglobulin A nephropathy (IgAN). We also assessed the model’s ability to predict long-term kidney-related outcomes.
Methods
A retrospective cohort of 1,301 patients with biopsy-proven IgAN from two tertiary hospitals was used to derive and externally validate a random forest-based prediction model predicting primary outcome (30% decline in estimated glomerular filtration rate from baseline or end-stage kidney disease requiring renal replacement therapy) and secondary outcome (improvement of proteinuria) within 2 years after kidney biopsy.
Results
For the 2-year prediction of the primary outcome, precision, recall, area under the receiver operating characteristic curve, area under the precision-recall curve, F1 score, and Brier score were 0.259, 0.875, 0.771, 0.242, 0.400, and 0.309, respectively. The values for the secondary outcome were 0.904, 0.971, 0.694, 0.903, 0.955, and 0.113, respectively. From Shapley Additive exPlanations analysis, the most informative feature identifying both outcomes was baseline proteinuria. When Kaplan-Meier analysis for 10-year kidney outcome risk was performed with three groups (low, moderate, and high) defined by the predicted probabilities from the 2-year primary outcome prediction model, the high (hazard ratio [HR], 13.00; 95% confidence interval [CI], 9.52–17.77) and moderate (HR, 12.90; 95% CI, 9.92–16.76) groups showed higher risks compared with the low group. From the 2-year secondary outcome prediction model, the low (HR, 1.66; 95% CI, 1.42–1.95) and moderate (HR, 1.42; 95% CI, 0.99–2.03) groups were at greater risk for 10-year prognosis than the high group.
Conclusion
Our machine learning-based 2-year risk prediction models for the progression of IgAN showed reliable performance and effectively predicted long-term kidney outcome.
Introduction
Immunoglobulin A nephropathy (IgAN) is a prevalent primary glomerulonephritis, especially in Asian countries where it comprises up to 50% of cases [1,2]. IgAN is common in young adults, aged 20 to 30 years [3]. After IgAN diagnosis, kidney function deteriorates, and >30% of IgAN patients progress to end-stage kidney disease (ESKD), requiring dialysis in 10 to 25 years [4,5]. Moreover, the incidence of cardiovascular complications and mortality is significantly increased in patients with IgAN compared with individuals of the same age and sex [6,7]. Hence, research has focused on early risk stratification of IgAN disease progression to prevent further development of adverse outcomes. Various factors, including baseline kidney function, proteinuria, blood pressure (BP), and histologic findings, contribute to the pathophysiology of IgAN, which complicates accurate prediction of the disease prognosis [8]. Growing evidence suggests that modifying these factors early in the disease course may prevent the long-term decline in kidney function [9–11]. However, creating precise long-term prediction tools for IgAN outcomes, especially during the early stages, is difficult due to the disease’s rarity, the time lag between diagnosis and outcome development, and infrequent hard outcomes (such as a 50% reduction in estimated glomerular filtration rate [eGFR] or ESKD) [10]. Moreover, because multiple interacting risk factors affect the progression of IgAN, establishing accurate prediction models for IgAN progression is challenging [12].
Recently, various risk-scoring systems to predict IgAN progression were established using multiple clinical and pathological findings [13,14]. However, these systems have limitations such as insufficient sample sizes, different pathological scoring criteria, and relatively few variables. To overcome the shortcomings of prediction models from conventional statistical methods, machine learning approaches have been applied in models for predicting the progression of IgAN [12,15–17]. Machine learning-based prediction algorithms show better predictive performance, cover larger data sets, and interpret more complex interactions than conventional tools. Nevertheless, the accuracy and practical applicability of these prediction models for determining long-term outcomes in patients with IgAN remain uncertain. In addition, the potential impact of short-term outcomes, especially those occurring within 1 to 2 years after biopsy, on the long-term prognosis of IgAN is unclear due to the lack of tools to predict the short-term outcome.
This study therefore aimed to develop a machine learning-based 2-year risk prediction model for IgAN progression, based on kidney function decline and proteinuria improvement using a database of kidney biopsy-proven IgAN patients. We also validated the usefulness of this prediction model for predicting 10-year long-term kidney outcome.
Methods
Study population and design
This was a retrospective study using databases from two tertiary hospitals (Severance Hospital and Gangnam Severance Hospital at Yonsei University College of Medicine). The overall research workflow is summarized in Fig. 1. A total of 1,864 patients with biopsy-proven IgAN from May 2005 to January 2021 were initially screened. Patients were excluded if they were aged <18 years, had a baseline eGFR of <15 mL/min/1.73 m², had undergone dialysis or kidney transplantation, or had missing laboratory test results. Finally, a total of 1,301 patients were included for primary outcome analysis. For secondary outcome analysis, as the secondary outcome was defined as improvement in urine protein to creatinine ratio (UPCR) to <1.0 g/g Cr accompanied by a 30% decline from baseline, a total of 597 patients were included after excluding individuals with a baseline UPCR of <1.0 g/g Cr or those with missing follow-up UPCR data (Fig. 2).
This study was approved by the Institutional Review Board of Yonsei University Severance Hospital (No. 3-2021-0059). In addition, the study was conducted following the guiding principles of the Declaration of Helsinki. Furthermore, the requirement for written informed consent was waived because of the study’s retrospective nature. Overall, the methods and results followed the transparent reporting of a multivariable prediction model for individual prognosis or diagnosis guidelines [18].
Data collection
Baseline demographic data, including age, sex, BP, pulse rate, body mass index (BMI), and alcohol or smoking status, were collected. Medical histories such as hypertension or diabetes, use of medications such as angiotensin-converting enzyme inhibitors, angiotensin receptor blockers, statins, steroids, or other immunosuppressants, and initial presenting symptoms such as edema or hematuria were also collected. Pathological findings from kidney biopsy were reported using the Oxford classification and represented with M, E, S, and T scores (MEST scores), which denote mesangial hypercellularity (M), endocapillary hypercellularity (E), segmental glomerulosclerosis (S), and tubular atrophy/interstitial fibrosis (T) [19]. Laboratory data were collected from fasting blood samples. Serum creatinine levels were measured using the rate-blanked compensated Jaffe kinetic method with the Roche reagent Creatinine and the Roche Calibrator for Automated Systems, traceable to an isotope dilution mass spectrometry (IDMS) reference method in the Hitachi Automatic Analyzer (Hitachi). Severance Hospital adopted the IDMS-traceable method for measuring serum creatinine in April 2011; values measured before this date were converted using the following equation: new serum creatinine = 1.049 × previous value – 0.129 [20]. The eGFR was calculated using the CKD-EPI (Chronic Kidney Disease-Epidemiology Collaboration) equation [21]. Urine samples were collected in the morning after the first voiding. Fresh urine samples were analyzed using URISCAN Pro II (YD Diagnostics Corp.). The presence of proteinuria was assessed by the UPCR. Features from the computed tomography scan and sonography, such as kidney size, echogenicity, and the presence of renal stone or cyst, were obtained. Variables with >50% missing values were excluded. For the remaining variables, multiple imputation by chained equations (MICE) was applied to impute the missing data [22].
Finally, 43 variables were included to predict the study outcome.
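The creatinine conversion and the missing-data filter described above can be sketched in a few lines. This is a minimal illustration rather than the study's actual pipeline: the function names, the use of None to mark a missing value, and the toy records are all assumptions, and the MICE imputation step itself is not reproduced.

```python
from typing import Dict, List, Optional


def recalibrate_creatinine(previous_value: float) -> float:
    """Convert a pre-IDMS serum creatinine (mg/dL) with the equation quoted in the text."""
    return 1.049 * previous_value - 0.129


def drop_sparse_variables(
    table: Dict[str, List[Optional[float]]], max_missing: float = 0.5
) -> Dict[str, List[Optional[float]]]:
    """Keep variables whose missing fraction does not exceed max_missing.

    Variables with more than 50% missing values are dropped, as in the text.
    """
    kept = {}
    for name, values in table.items():
        missing_fraction = sum(v is None for v in values) / len(values)
        if missing_fraction <= max_missing:
            kept[name] = values
    return kept


# Hypothetical toy records (not study data): one mostly complete variable,
# one mostly missing variable.
toy_table = {
    "albumin": [4.1, None, 3.8, 4.0],   # 25% missing, kept
    "c4": [None, None, None, 22.0],     # 75% missing, dropped
}
filtered = drop_sparse_variables(toy_table)
converted = recalibrate_creatinine(1.0)
print(sorted(filtered), round(converted, 3))
```

The variables that survive the filter would then be passed to the imputation step.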
Study outcome
The primary outcome of this study was the deterioration of kidney function within 2 years after the kidney biopsy. The deterioration of kidney function was defined as a composite of a 30% decline in eGFR from baseline, the development of ESKD requiring hemodialysis or peritoneal dialysis, or kidney transplantation. The proportion of each outcome definition for the primary outcome is depicted in Supplementary Fig. 1 (available online). The secondary outcome was an improvement of proteinuria among patients whose baseline UPCR was ≥1.0 g/g Cr. The improvement of proteinuria was defined as an improvement in UPCR to <1.0 g/g Cr and a 30% decline in UPCR from baseline within 2 years of follow-up after kidney biopsy.
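The two outcome definitions can be written as simple predicates. The sketch below is illustrative only: the function and argument names are hypothetical, and treating a decline of exactly 30% as meeting the criterion is an assumption, since the text does not state how the boundary is handled.

```python
def primary_outcome(baseline_egfr: float, followup_egfr: float,
                    started_kidney_replacement: bool) -> bool:
    """Composite kidney outcome within 2 years of biopsy.

    True if eGFR fell by at least 30% from baseline, or if the patient
    developed ESKD requiring dialysis or kidney transplantation.
    """
    relative_decline = (baseline_egfr - followup_egfr) / baseline_egfr
    return started_kidney_replacement or relative_decline >= 0.30


def secondary_outcome(baseline_upcr: float, followup_upcr: float) -> bool:
    """Proteinuria improvement within 2 years of biopsy.

    Defined only for baseline UPCR >= 1.0 g/g Cr. Both conditions must hold:
    follow-up UPCR below 1.0 g/g Cr AND at least a 30% decline from baseline.
    """
    if baseline_upcr < 1.0:
        raise ValueError("secondary outcome requires baseline UPCR >= 1.0 g/g Cr")
    relative_decline = (baseline_upcr - followup_upcr) / baseline_upcr
    return followup_upcr < 1.0 and relative_decline >= 0.30


# Hypothetical patients (not study data).
print(primary_outcome(90.0, 60.0, False))   # eGFR fell by one third
print(secondary_outcome(1.2, 0.9))          # below 1.0 but only a 25% decline
print(secondary_outcome(2.0, 0.8))          # below 1.0 and a 60% decline
```

The second example shows why both conditions matter: a patient who starts just above 1.0 g/g Cr can drop below the cut-off without a 30% relative decline.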
Two-year prediction model development and performance test
This study developed a two-step 2-year outcome prediction model. First, we developed a prediction model for the risk of a 30% decline in eGFR or development of ESKD (primary outcome), using all 1,301 subjects, including those with a baseline UPCR of <1.0 g/g Cr. We then developed a second prediction model for the improvement of UPCR to <1.0 g/g Cr with a 30% decrease from baseline (secondary outcome), using 597 subjects after excluding 691 subjects with a baseline UPCR of <1.0 g/g Cr and those with missing follow-up UPCR data.
To construct the prediction model, the multicenter data were divided into model derivation and validation cohorts according to the patient’s hospital: Severance Hospital data formed the derivation cohort, and Gangnam Severance Hospital data formed the validation cohort. Because the imbalanced ratio between positive (event) and negative (no event) records in the derivation cohort can cause bias during model development, the synthetic minority over-sampling technique (SMOTE) [23] was applied to balance the derivation cohort for the primary outcome. Among the various SMOTE methods, we selected SMOTE-nominal and continuous (SMOTE-NC), which can handle both numerical and categorical data [23]. The final derivation cohort, used as input for the prediction model, had a ratio of 1:2 between cases with and without the composite primary outcome. We used a random forest-based machine learning algorithm to develop the prediction models. Random forest is an ensemble machine learning algorithm that can handle binary outcomes [24]. After a random search, a grid search with internal five-fold cross-validation was performed on the derivation cohort to find the best combination of random forest hyperparameters: criterion (the function used to measure split quality), max_depth (the maximum depth of each tree), max_features (the number of features considered when looking for the best split), and n_estimators (the number of trees). The search ranges were as follows: max_depth from 2 to 10, max_features from 5 to 30 in steps of 5, n_estimators from 10 to 100 in steps of 10, and criterion either gini or entropy. For the primary outcome, the best combination was a max_depth of 10, max_features of 5, n_estimators of 90, and the entropy criterion. For the secondary outcome, it was a max_depth of 8, max_features of 10, n_estimators of 10, and the entropy criterion.
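To make the size of the hyperparameter search concrete, the grid described above can be enumerated with the standard library alone. This is a sketch under stated assumptions: max_depth is assumed to step by 1 (the text gives only the range), the helper name iter_candidates is hypothetical, and no model is fitted here. A dictionary of this shape is the form that scikit-learn's GridSearchCV accepts as its param_grid argument.

```python
from itertools import product

# Search ranges as listed in the text; the step of 1 for max_depth is an assumption.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": list(range(2, 11)),            # 2, 3, ..., 10
    "max_features": list(range(5, 31, 5)),      # 5, 10, ..., 30
    "n_estimators": list(range(10, 101, 10)),   # 10, 20, ..., 100
}


def iter_candidates(grid):
    """Yield every hyperparameter combination in the grid as a dict."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


candidates = list(iter_candidates(param_grid))
n_folds = 5                                # internal five-fold cross-validation
n_model_fits = len(candidates) * n_folds   # one fit per combination per fold

# Combinations reported as best in the text.
chosen_primary = {"criterion": "entropy", "max_depth": 10,
                  "max_features": 5, "n_estimators": 90}
chosen_secondary = {"criterion": "entropy", "max_depth": 8,
                    "max_features": 10, "n_estimators": 10}

print(len(candidates), n_model_fits)
print(chosen_primary in candidates, chosen_secondary in candidates)
```

Under these assumptions the grid holds 2 × 9 × 6 × 10 combinations, each evaluated on five folds, which is why a coarser random search is a sensible first pass.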
The 2-year prediction model’s performances were evaluated based on the following performance indexes from both cohorts: accuracy, precision, recall, the area under the precision-recall curve (AUPRC), the area under the receiver operating characteristic curve (AUROC), Brier score, and F1-score. The classification threshold for the primary outcome was defined as the point where the recall reached approximately 0.8.
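The threshold-dependent metrics and the recall-based choice of threshold can be illustrated without any external library. The sketch below is not the authors' evaluation code: the function names and toy labels are hypothetical, and "the highest threshold whose recall is at least 0.8" is one assumed reading of "the point where the recall reached approximately 0.8".

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(y_true, y_prob, threshold):
    """Classify as positive when probability >= threshold, then score."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, y_prob) if p < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from(precision, recall)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def threshold_for_recall(y_true, y_prob, target=0.8):
    """Highest observed probability usable as a threshold with recall >= target."""
    for t in sorted(set(y_prob), reverse=True):
        _, recall, _ = precision_recall_f1(y_true, y_prob, t)
        if recall >= target:
            return t
    return min(y_prob)


# Hypothetical labels and predicted probabilities (not study data).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.4, 0.2, 0.6, 0.5, 0.3, 0.1, 0.05]

threshold = threshold_for_recall(y_true, y_prob, target=0.8)
precision, recall, f1 = precision_recall_f1(y_true, y_prob, threshold)
brier = brier_score(y_true, y_prob)
print(threshold, round(precision, 3), round(recall, 3), round(f1, 3), round(brier, 5))
```

Fixing recall first means precision is whatever the data allow at that threshold, so a rare outcome can combine high recall with low precision. As a consistency check, f1_from(0.259, 0.875) evaluates to roughly 0.400.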
Feature importance analysis
To produce an explainable prediction model, feature importance was calculated using the Shapley Additive exPlanations (SHAP) method. SHAP analysis provides information about which features have a high contribution to predicting the outcome, with each feature value’s contribution to deciding whether the sample is positive or negative by calculating Shapley value [25,26].
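The idea behind a Shapley value can be shown by computing it exactly for a tiny toy model. This is a conceptual sketch only: it does not use the SHAP package, the toy risk function and its numbers are invented for illustration (only the feature names echo the text), and treating an absent feature as contributing nothing is a simplification of how SHAP handles missing features.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, features):
    """Exact Shapley values for a set function.

    value_fn maps a frozenset of 'present' feature names to a model output.
    Each feature's value is its marginal contribution averaged over all
    orderings in which features could be added.
    """
    n = len(features)
    phi = {}
    for feature in features:
        others = [g for g in features if g != feature]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value_fn(s | {feature}) - value_fn(s))
        phi[feature] = total
    return phi


def toy_risk(present):
    """Hypothetical risk score with one interaction term (illustrative numbers)."""
    score = 0.10                                   # baseline risk
    if "upcr" in present:
        score += 0.30
    if "albumin" in present:
        score += 0.10
    if "sbp" in present:
        score += 0.05
    if "upcr" in present and "albumin" in present:
        score += 0.10                              # interaction, split between the two
    return score


features = ["upcr", "albumin", "sbp"]
phi = shapley_values(toy_risk, features)
print({name: round(value, 3) for name, value in phi.items()})
```

The values sum to the difference between the full-model output and the baseline, which is the additivity property that lets per-feature contributions be read as a decomposition of a single prediction.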
Long-term risk stratification
To test the predictive value of our 2-year prediction models on the long-term (10-year follow-up) risk of kidney disease progression, the association between the predicted probability of primary and secondary outcomes from the 2-year prediction models and the long-term risk of kidney outcome was evaluated. Each predicted probability derived from the 2-year risk prediction tools was categorized into one of three groups: less than 0.5 (low), between 0.5 and 0.75 (moderate), and greater than 0.75 (high). Ten-year risk of kidney disease progression was defined as a composite of a 30% decline in eGFR from baseline or the development of ESKD during the follow-up period. Kaplan-Meier analysis was performed to compare the 10-year kidney survival among the three groups. Statistical comparisons between groups were made by log-rank test, and Cox proportional hazards regression analysis was used to calculate the hazard ratio (HR) among the three groups.
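The grouping rule and the Kaplan-Meier estimate can be sketched as follows. This is an illustration rather than the study's analysis code (which used a survival analysis package): the function names and follow-up data are hypothetical, and assigning probabilities of exactly 0.5 and 0.75 to the moderate group is an assumption about boundary handling.

```python
def risk_group(probability: float) -> str:
    """Map a predicted 2-year probability to a low, moderate, or high group."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability < 0.5:
        return "low"
    if probability <= 0.75:
        return "moderate"
    return "high"


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient; events: 1 if the outcome occurred,
    0 if the patient was censored. Returns (time, survival) at each event time.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1 - n_events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up in years for one risk group (not study data).
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)

print([risk_group(p) for p in (0.12, 0.50, 0.75, 0.91)])
print([(t, round(s, 3)) for t, s in curve])
```

In practice one curve would be estimated per risk group and the curves compared with a log-rank test, as described above.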
Software
Python version 3.7 and R studio version 4.0.3 were used for the data preprocessing and model development. For data imputation, MICE package version 3.13.0 was used. SMOTE-NC package version 0.8 was used for data balancing, and scikit-learn package version 0.43 was used for model development. Lifelines package version 0.26 and SHAP package version 0.39 were used to evaluate the model.
Results
Baseline characteristics
The baseline characteristics at the time of biopsy of the 1,301 patients included for the development of the primary outcome model are shown in Table 1. In the derivation cohort (n = 1,165), the mean age of patients was 39.8 years and 54.8% were female. The mean age of patients in the validation cohort (n = 136) was 35.8 years and 50.7% were female. In both cohorts, the proportions of patients using angiotensin-converting enzyme inhibitors, angiotensin receptor blockers, statins, steroids, and other immunosuppressants were similar. In particular, the use of angiotensin receptor blockers was 40.1% and 36.0% in the derivation and validation cohorts, respectively. In the laboratory tests, no significant difference was observed between the mean serum creatinine levels of the derivation and validation cohorts: 1.03 ± 0.53 and 1.07 ± 0.55 mg/dL, respectively (p = 0.447). However, there was a significant difference in UPCR levels between the two cohorts: median of 0.88 g/g Cr (interquartile range [IQR], 0.42–1.72 g/g Cr) and 1.19 g/g Cr (IQR, 0.56–2.16 g/g Cr), respectively (p = 0.005). In the Oxford classification, over 70% of patients had glomerular sclerosis in both cohorts. There were no significant differences in radiologic features between the two cohorts. The baseline characteristics of the 597 patients included for the secondary outcome (improvement of proteinuria) analysis are shown in Supplementary Table 1 (available online).
Performance of 2-year risk prediction tools
During the follow-up for the primary outcome (median, 285 days; IQR, 168‒453 days), 159 events (derivation set, 12.27%; validation set, 11.76%) occurred. For the secondary outcome, 474 events (derivation set, 78.53%; validation set, 85.00%) occurred during the follow-up (median, 85 days; IQR, 25‒194 days). Overall performances of the 2-year risk prediction models for both primary and secondary outcomes are summarized in Table 2 and Fig. 3. In the model derivation cohort for the primary outcome, AUROC and AUPRC were 0.993 and 0.991 with precision of 0.999 and recall of 0.986, and the Brier and F1 scores were 0.005 and 0.993. In the validation cohort for the primary outcome, the AUROC and AUPRC were 0.771 and 0.242 with precision of 0.259 and recall of 0.875, and the Brier and F1 scores were 0.309 and 0.400. Regarding the secondary outcome in the derivation cohort, the AUROC and AUPRC were 0.829 and 0.914 with precision of 0.914 and recall of 1.000, and the Brier and F1 scores were 0.074 and 0.955, respectively. In the validation cohort for the secondary outcome, the AUROC and AUPRC were 0.694 and 0.903 with precision of 0.904 and recall of 0.971, and the Brier and F1 scores were 0.113 and 0.955, respectively.
Feature importance in the 2-year risk prediction models
The SHAP method was used to identify the feature importance of our machine learning-based 2-year risk prediction models (Fig. 4). The top features from the SHAP analysis help interpret how each model discriminates outcome risk. For the 2-year risk of the primary outcome, the most important feature of the prediction model was UPCR. The next most important features were serum albumin, systolic BP (SBP), urine red blood cell (RBC) dysmorphism, and serum immunoglobulin G (IgG). The SHAP force plots in Supplementary Fig. 2 (available online) explain the prediction for individual patients with one of the highest or lowest probabilities of outcome risk determined by our prediction models. In the primary outcome prediction model, a high UPCR, a high proportion of urine dysmorphic RBCs, and low serum albumin and IgG levels indicated an increased risk of the 2-year kidney outcome. In the secondary outcome prediction model, UPCR was the most informative feature. The next most important features were serum C4, serum creatinine, serum C3, IgG, and BMI. A lower UPCR and a lower serum creatinine level were indicative of an increased potential for proteinuria improvement in the 2-year period.
Long-term risk stratification by 2-year risk prediction tools
Subsequently, we evaluated whether our 2-year risk prediction tools have a predictive value on the 10-year long-term risk of kidney disease progression. Firstly, we evaluated three risk probability groups derived from 2-year primary outcome risk prediction model (n = 1,301) for the long-term risk (Fig. 5A). During the median of approximately 3.5 years (IQR, 1.3‒6.5 years) of follow-up, 357 events (27.4%) occurred. In the Kaplan-Meier analysis, the higher-risk groups showed an increased long-term risk of kidney disease progression compared with the lowest group (HR, 13.00; 95% confidence interval [CI], 9.52‒17.77 in the high group and HR, 12.90; 95% CI, 9.92–16.76 in the moderate group).
Next, we evaluated three risk probability groups derived from the 2-year secondary outcome risk prediction model (n = 597) for the long-term risk (Fig. 5B). During the follow-up (median, about 2.9 years; IQR, 1.0‒5.8 years), 244 events (41.0%) occurred. Among three risk probability groups, groups representing a lower probability to improve proteinuria within 2 years after biopsy showed an increased long-term risk of kidney disease progression compared with the highest group (HR, 1.66; 95% CI, 1.42‒1.95 in the low group and HR, 1.42; 95% CI, 0.99‒2.03 in the moderate group).
Discussion
In this study, we developed a machine learning-based 2-year risk prediction tool for primary (30% decline in eGFR or ESKD) and secondary (improvement of proteinuria) outcomes within 2 years of follow-up after kidney biopsy in patients with IgAN. The highest feature importance by SHAP analysis for the 2-year risk prediction tools, for both primary and secondary outcomes, was the amount of proteinuria at the time of biopsy. However, serum creatinine level at biopsy was ranked as a less contributing variable for feature importance in the 2-year risk prediction tools. Finally, the 2-year risk prediction tools effectively discriminated the risk for the 10-year long-term progression of IgAN using risk groups based on the predictive probability of 2-year risk prediction tools.
Although IgAN progresses slowly compared with other glomerular diseases, its clinical course is substantially heterogeneous [27,28]. Furthermore, multiple risk factors are involved in the progression of IgAN, and an individualized estimate of a patient’s risk of disease progression is essential. Risk prediction may help collaborative decision-making regarding treatment strategies. Recently, various risk prediction tools were developed for IgAN [12,16,17,29]. The KDIGO (Kidney Disease Improving Global Outcomes) 2021 guidelines for managing glomerular disease suggested the International IgAN Prediction Tool (IIgAN-PT) as a useful resource to assess the risk of progression and facilitate shared decision-making with patients [30]. The IIgAN-PT was externally validated using models with multiple ethnic cohorts, including over 4,000 participants [31]. However, Barbour et al. [32] showed that the original IIgAN-PT did not predict outcomes as accurately when applied 1 year after biopsy. Although IgAN progresses slowly, treatment guidelines recommend immunosuppressant treatment if there is no improvement with supportive care within a short period after diagnosis [30]. Hence, it is essential to determine how clinical changes within 1 to 2 years after diagnosis affect the long-term risk. Therefore, the IIgAN-PT should be re-evaluated to predict the risk of disease progression after observation and supportive care. Moreover, the model incorporated established risk factors for the progression of IgAN such as age, BP, eGFR, proteinuria, renin-angiotensin system blockade use, immunosuppressant use, and MEST-C score. In addition, this tool provides data on the long-term risk, specifically predicted risks at 5 years or later, without data on the short-term clinical course following the diagnosis of IgAN.
On the other hand, in various risk prediction studies, the long-term risk was predicted by accumulating follow-up data for at least 2 years after diagnosis [33,34]. However, the need for prolonged periods of follow-up data is one of the limitations of previous IgAN prediction models, which reduced their clinical utility. Therefore, the present study is novel for examining the effect of predicted probability for event occurrence based on a machine learning approach within 2 years on predicting the long-term risk for IgAN progression. No previous study has evaluated the usefulness of the machine learning-based 2-year risk probability on long-term risk prediction in IgAN.
Early change in proteinuria is a reliable surrogate outcome in IgAN and a basis for treatment decision-making [8,10]. An individual participant-level meta-analysis performed by Inker et al. [11] provided evidence that an early reduction in proteinuria of 30% from baseline would confer a probability of at least 90% of treatment benefit on the long-term risk of disease progression in IgAN. In that meta-analysis, a 50% reduction in proteinuria after 9 months of treatment was associated with a 60% decrease in the risk of the composite kidney outcome. Furthermore, reduction in proteinuria accounted for 11% and 29% of the treatment effect from renin-angiotensin system blockade and steroids, respectively. Canney et al. [10] also supported these findings and concluded that a shorter time to proteinuria remission was associated with significant reductions in the risk of disease progression in IgAN. In accordance with previous studies, our 2-year risk prediction tool for the improvement of proteinuria successfully predicted the 10-year long-term kidney outcome risk. This study is the first to develop a machine learning-based short-term risk prediction model for changes in proteinuria.
In the present study, the highest feature importance evaluated by SHAP analysis for the 2-year risk prediction models for both primary and secondary outcomes was UPCR at the time of biopsy. However, serum creatinine level at biopsy was ranked as a less contributing variable for feature importance in both prediction models. Historically, eGFR and proteinuria at diagnosis have been considered the most significant factors in determining long-term prognosis in IgAN. Nevertheless, our findings showed that proteinuria is the most useful for predicting short-term (within 2 years) clinical outcomes. In contrast, serum creatinine level may have a less significant effect. However, since most patients in this study showed normal kidney function at biopsy, the results should be interpreted cautiously for patients with advanced kidney disease at diagnosis. RBC dysmorphism emerged as an important variable in the prediction model for the primary outcome. The accompanying dysmorphic RBC may be closely associated with glomerular structural damage and a decrease in eGFR [35–37]. Additionally, previously known risk factors for the progression of IgAN such as SBP and BMI showed high feature importance in the primary outcome prediction model. Furthermore, serum albumin emerged as a high-importance feature in both prediction models, in accordance with previous studies showing that hypoalbuminemia is a significant risk factor for poor kidney outcome [38,39].
Our study has some limitations. First, the prediction model was derived from the retrospective cohort data of patients with IgAN. Furthermore, the data consisted of a single Korean ethnic group; hence, the findings should be generalized with caution. Thus, prospective cohort studies including multi-ethnic participants are needed to evaluate the validity of this 2-year risk prediction model in patients with IgAN. Second, it is unclear whether therapeutic interventions during the follow-up period of IgAN may change the disease course predicted by our model. Third, the contribution of pathologic data to the prediction model for IgAN is limited in this study. Only the Oxford classification was considered as pathologic data in this study. Therefore, incorporating various other features such as digital image analysis, immune cell infiltration, vascular features, and other microscopic findings from biopsy specimens may provide further insights for enhancing the prediction model [2]. Future studies are necessary to investigate this further. Lastly, the limited number of outcome events that occurred within 2 years after kidney biopsy hindered the development of prediction models using specific kidney events such as a 30% decline in eGFR or ESKD requiring dialysis. To address this, further studies with larger cohorts will need to be performed.
In conclusion, our 2-year prediction models for the progression of IgAN based on eGFR decline, development of ESKD, or improvement of proteinuria successfully predicted the 10-year long-term risk of kidney outcome in patients with IgAN. Our 2-year prediction models can be implemented in clinical practice during biopsy and may provide individualized benefits for treatment decision-making in the early course of IgAN.
Supplementary Materials
Supplementary data are available at Kidney Research and Clinical Practice online (https://doi.org/10.23876/j.krcp.23.076).
Notes
Conflicts of interest
All authors have no conflicts of interest to declare.
Funding
This research was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI22C0452) and supported by a faculty research grant of Yonsei University College of Medicine (6-2022-0118). This study was also supported by a new faculty research seed money grant of Yonsei University College of Medicine for 2021 (2021-32-0051).
Data sharing statement
The clinical data used to develop the model cannot be shared publicly because of the Personal Information Protection Act enforced by the government.
Authors’ contributions
Conceptualization: DY, HCP, DO, BJL, HYC
Data curation, Formal analysis: YK, CMP
Funding acquisition: DY, JHJ
Investigation: YK, JHJ
Resources: DO, BJL, HYC, DY, HCP
Visualization: YK
Writing–original draft: YK, JHJ
Writing–review & editing: DO, BJL, HYC, DY, HCP, YK, JHJ
All authors read and approved the final manuscript.