Yun, Yi, Han, Seong, Menadjiev, Han, Choi, Kim, and Kim: Validation of an acute kidney injury prediction model as a clinical decision support system

Abstract

Background

Acute kidney injury (AKI) is a critical clinical condition that requires immediate intervention. We developed an artificial intelligence (AI) model called PRIME Solution to predict AKI and evaluated its ability to enhance clinicians’ predictions.

Methods

The PRIME Solution was developed using convolutional neural networks with residual blocks on 183,221 inpatient admissions from a tertiary hospital (2013−2017) and externally validated with 4,501 admissions at another tertiary hospital (2020−2021). To assess its application, we conducted a prospective evaluation using retrospectively collected data from 100 patients at the latter hospital, including 15 AKI cases. AKI prediction performance was compared among specialists, physicians, and medical students, both with and without AI assistance.

Results

Without assistance, specialists demonstrated the highest accuracy (0.797), followed by medical students (0.619) and the PRIME Solution (0.568). AI assistance improved overall recall (61.0% to 74.0%) and F1 scores (38.7% to 42.0%), while reducing average review time (73.8 to 65.4 seconds, p < 0.001). However, the impact varied across expertise levels. Specialists showed the greatest improvement (recall, 32.1% to 64.3%; F1, 36.4% to 48.6%), whereas medical students’ performance improved but aligned more closely with the AI model. Additionally, the effect of AI assistance varied by prediction outcome, showing greater improvement in recall for cases predicted as AKI, and better precision, F1 score, and review time reduction (73.4 to 62.1 seconds, p < 0.001) for cases predicted as non-AKI.

Conclusion

AKI predictions were enhanced by AI assistance, but the improvements varied according to the expertise of the user.

Introduction

Acute kidney injury (AKI) is a prevalent and serious clinical condition that requires immediate management [1]. AKI commonly occurs in hospitalized patients, with a prevalence ranging from 6% to 18%, and its incidence tends to increase gradually over time during hospitalization [2,3]. Despite numerous studies on early AKI detection, proactive management strategies remain uncommon [2]. The integration of artificial intelligence (AI) into clinical decision support (CDS) systems has emerged as a promising approach to address this gap [3–5]. AI has the potential to enhance predictive performance by identifying patients at higher risk of disease development and clinical deterioration who would benefit from specific management strategies [3]. AI-based CDS systems leverage vast clinical data to provide real-time insights and recommendations that can significantly improve diagnostic accuracy and patient outcomes [6]. Furthermore, AI can help reduce the cognitive load on clinicians by automating routine tasks and highlighting critical information, allowing healthcare professionals to focus on complex decision-making processes [7].
Building upon our previous work, in which we developed an AI prediction model for AKI [8], we developed the PRIME Solution (PRedIction and Management of acute kidney injury with Explainable AI). This model not only predicts the occurrence of AKI but also applies layer-wise relevance propagation (LRP) [9], an explainable AI method, to highlight the most critical factors influencing its predictions. This approach allows physicians to understand the rationale behind the predictions and gain insights into necessary corrective actions.
However, AI-based CDS systems can sometimes generate inaccurate predictions or poorly tailored suggestions, potentially leading to clinician distrust and reduced efficacy [3,10]. We hypothesized that an explainable AI model like PRIME Solution could improve predictive performance and clinician acceptance by providing transparency into its decision-making process. To test this hypothesis, we designed a study to assess the impact of PRIME Solution on AKI predictions made by healthcare professionals of varying expertise levels.
We conducted this study with two primary objectives: first, to compare the predictive performance of our PRIME Solution with that of physicians and medical students; and second, to evaluate how AI assistance influences the prediction capabilities of these healthcare professionals. By comparing their performance with and without AI assistance, we aimed to assess the efficacy and value of our AI model in enhancing human clinical judgment in AKI prediction.

Methods

Study design

This single-center study was conducted at a tertiary hospital in South Korea, Seoul National University Bundang Hospital (SNUBH), from April 2023 to February 2024 and involved a prospective evaluation using patient data collected retrospectively. The study comprised two main phases, with a preliminary phase of AI model development.
Preliminary phase: Development of the AI model (PRIME Solution). Convolutional neural networks (CNNs) with residual blocks were designed to predict AKI in hospitalized patients. The model was developed using data from a tertiary hospital, Seoul National University Hospital (SNUH), and externally validated using data from SNUBH.
Main evaluation phases:
1. AKI prediction without AI assistance (SET1). Clinical evaluators assessed the risk of AKI for 100 patients selected from the SNUBH dataset without AI assistance.
2. AKI prediction with AI assistance (SET2). Evaluators used the PRIME Solution’s predictions, including interpretative analyses of risk factors derived from the model’s LRP outputs, to assess the same selected patients as in SET1.
We assessed the impact of PRIME Solution on clinical decisions by comparing evaluations performed with and without AI assistance. The Institutional Review Boards (IRBs) of SNUBH and SNUH approved the model development (IRB no. B-1811-502-004, J-1903-090-1019), external validation (IRB no. B-2205-757-305), and evaluation (IRB no. B-2304-825-304) phases of this study. This study adhered to the 1975 Declaration of Helsinki. The requirement for informed consent from patients was waived because of the retrospective nature of the data and minimal risk posed by the study. Written informed consent was obtained from all participating evaluators.

Data collection

The development dataset comprised 183,221 inpatient admissions from SNUH between 2013 and 2017. The dataset was collected from electronic health records and included various clinical parameters, laboratory results, and patient demographics. We split these admissions into 70% for training, 15% for validation, and 15% for testing. An additional 4,501 admissions from SNUBH were used for external validation. Both datasets included adult patients hospitalized for at least 3 days, with no prior dialysis, a baseline creatinine level below 4.0 mg/dL, and a baseline estimated glomerular filtration rate of 15 mL/min/1.73 m² or higher.
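As a minimal, illustrative sketch (not the actual preprocessing pipeline), the following Python code shows how the stated inclusion criteria and the 70%/15%/15% admission-level split could be applied; the file name and column names (age, los_days, prior_dialysis, baseline_creatinine, baseline_egfr) are hypothetical.
```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical EHR extract; the actual schema used for PRIME Solution is not described here.
admissions = pd.read_csv("admissions.csv")

# Stated inclusion criteria: adults hospitalized for >= 3 days, no prior dialysis,
# baseline creatinine < 4.0 mg/dL, and baseline eGFR >= 15 mL/min/1.73 m2.
cohort = admissions[
    (admissions["age"] >= 18)
    & (admissions["los_days"] >= 3)
    & (~admissions["prior_dialysis"])
    & (admissions["baseline_creatinine"] < 4.0)
    & (admissions["baseline_egfr"] >= 15)
]

# Admission-level 70%/15%/15% split into training, validation, and test sets.
train, temp = train_test_split(cohort, test_size=0.30, random_state=42)
valid, test = train_test_split(temp, test_size=0.50, random_state=42)
print(len(train), len(valid), len(test))
```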

Model development

We utilized CNNs with residual blocks for AKI prediction, focusing on predicting the onset of AKI. CNNs were chosen because of their proven effectiveness in pattern recognition and classification tasks, particularly in time-series data relevant to AKI prediction [11,12]. Specifically, ResNet was selected owing to its ability to learn detailed data patterns [13]. The model’s features, detailed data preprocessing workflow, architecture, and receiver-operating characteristic (ROC) curves are provided in Supplementary Table 1 (available online) and Supplementary Figs. 1–3 (available online), respectively.
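For orientation only, the sketch below illustrates the general family of architecture described above: one-dimensional convolutional residual blocks applied to a (batch, features, time) tensor, followed by global average pooling into a single AKI risk probability. It is written in PyTorch with arbitrary layer widths and block counts and is not the published PRIME Solution configuration, which is detailed in the supplementary materials.
```python
import torch
import torch.nn as nn

class ResidualBlock1D(nn.Module):
    """One residual block over the time axis of a (batch, channels, time) tensor."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        padding = kernel_size // 2
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, padding=padding)
        self.bn1 = nn.BatchNorm1d(channels)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, padding=padding)
        self.bn2 = nn.BatchNorm1d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # skip connection

class AKIPredictor(nn.Module):
    """Stacks residual blocks and pools over time to one AKI risk probability."""
    def __init__(self, n_features: int, channels: int = 64, n_blocks: int = 4):
        super().__init__()
        self.stem = nn.Conv1d(n_features, channels, kernel_size=1)
        self.blocks = nn.Sequential(*[ResidualBlock1D(channels) for _ in range(n_blocks)])
        self.head = nn.Linear(channels, 1)

    def forward(self, x):               # x: (batch, n_features, time)
        h = self.blocks(self.stem(x))
        h = h.mean(dim=-1)              # global average pooling over time
        return torch.sigmoid(self.head(h)).squeeze(-1)
```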
To determine the optimal threshold for the PRIME Solution, we tested various thresholds for both our original test set and the external validation set. Based on these tests, we set the prediction threshold to 0.9 for this study (Supplementary Table 2, available online). To enhance model interpretability, we used LRP to determine the contribution of each feature to the predictions of the model (Supplementary Fig. 4, available online).
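The threshold screening described above can be illustrated with a short sketch, assuming arrays of true AKI labels and predicted probabilities; the candidate grid and the use of scikit-learn metric functions are illustrative rather than a description of the actual analysis.
```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

def screen_thresholds(y_true, y_prob, thresholds=np.arange(0.5, 0.96, 0.05)):
    """Tabulate precision, recall, and F1 at candidate probability cut-offs."""
    rows = []
    for t in thresholds:
        y_pred = (y_prob >= t).astype(int)
        rows.append({
            "threshold": round(float(t), 2),
            "precision": precision_score(y_true, y_pred, zero_division=0),
            "recall": recall_score(y_true, y_pred, zero_division=0),
            "f1": f1_score(y_true, y_pred, zero_division=0),
        })
    return rows
```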

Data preparation and evaluation

For the evaluation phases (SET1 and SET2), we used the data of 100 patients admitted to SNUBH between 2020 and 2021. Twenty patients each were selected from the geriatrics, urology, nephrology, surgery, and orthopedics departments, reflecting the proportion of AKI occurrences across these specialties. The baseline characteristics of these patients are presented in Supplementary Table 3 (available online). Prediction time points were selected as evenly as possible from days 1 to 7 of hospitalization, with an emphasis on the initial days to account for shorter hospitalizations. Only data before AKI onset were included because the goal of the model was to predict the initial occurrence of AKI. Of the 100 patients in the study, 15 developed AKI. During model optimization, 26 patients were excluded because creatinine data were either missing or showed no change over 48 hours. The remaining 74 patients, including 14 AKI occurrences, were used for most evaluations; however, the data of all 100 patients were used when comparing the performance based on the predictions of the PRIME Solution.
This study involved 11 evaluators in three groups: specialists (one board-certified nephrology subspecialist with 18 years of nephrology training and one internal medicine specialist with 2 years of nephrology training); physicians (one internal medicine specialist without nephrology training and three internal medicine trainees); and medical students (five students from different years of medical school).
Two-stage tests were conducted. For SET1, the evaluators independently reviewed the patient data in digital format to predict AKI occurrence within 48 hours. After a washout period of 1 to 2 weeks, SET2 was conducted in the same manner, but with the PRIME Solution’s output, including the predicted AKI occurrence, the top 10 rationales derived from static data, and the top 10 rationales derived from dynamic data. The evaluators used a custom evaluation platform with dedicated buttons to record the start and end times for each patient assessment. The platform automatically calculated the evaluation duration for each patient. The evaluators predicted AKI occurrence, selected up to 10 influential variables, and chose appropriate interventions for AKI management. These interventions were categorized into the following six main groups: patient assessment; medication review; imaging studies; hemodynamic stability monitoring; additional tests; and nephrology consultation. Each of these six groups had specific sub-actions (Supplementary Table 4, available online). The aim was to determine whether the predictions of the PRIME Solution influenced the behaviors of the evaluators. Two specialists finalized the key factors influencing each case of AKI. These key factors were used to evaluate how well the model and the evaluators selected the reasons for predicting the AKI occurrence. The match rate was calculated based on the number of variables chosen by the evaluators that matched the finalized key factors: Match rate = (number of correctly identified key factors) / (total number of key factors defined by the specialists).
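As a worked example of the match rate definition, the sketch below uses hypothetical factor names: with four specialist-defined key factors and three of them identified by an evaluator, the match rate is 3/4 = 0.75.
```python
def match_rate(selected, key_factors):
    """Share of specialist-defined key factors recovered by an evaluator's selection."""
    key = set(key_factors)
    return len(key & set(selected)) / len(key)

# Illustrative example: 3 of 4 key factors identified -> match rate 0.75.
key_factors = ["serum creatinine trend", "NSAID exposure", "hypotension", "contrast study"]
selected = ["serum creatinine trend", "hypotension", "contrast study", "age"]
print(match_rate(selected, key_factors))  # 0.75
```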

Statistical analysis

We investigated the impact of AI assistance on diagnostic accuracy and evaluated the usefulness of PRIME Solution for AKI prediction. Performance was measured using various metrics, including accuracy, precision, recall, F1 score, specificity, negative predictive value, false-positive rate, Matthews correlation coefficient, and threat score [14]. Detailed definitions and equations for these metrics are provided in Supplementary Table 5 (available online).
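For reference, the listed metrics can be computed from the four confusion-matrix counts as in the following sketch; the formulas follow the standard definitions summarized in Supplementary Table 5.
```python
import math

def diagnostic_metrics(tp, fp, tn, fn):
    """Confusion-matrix metrics used in the study (standard definitions)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom_mcc = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "npv": tn / (tn + fn) if tn + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "mcc": (tp * tn - fp * fn) / math.sqrt(denom_mcc) if denom_mcc > 0 else 0.0,
        "threat_score": tp / (tp + fn + fp) if tp + fn + fp else 0.0,
    }
```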
Comparisons within SET1 and SET2 among evaluator groups were performed using one-way analysis of variance. Paired t tests were performed to compare changes between SET1 and SET2. For analyses of prediction duration and match rates, we utilized all available data points for comparisons between SET1 and SET2 for each evaluator group. Behavioral changes were analyzed by comparing the number of selected actions for each behavior type between SET1 and SET2. All statistical analyses were conducted using R software (version 4.3.2, R Foundation for Statistical Computing) and Python (version 3.8.16). A p-value of <0.05 was considered statistically significant.
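A brief sketch of these two comparisons, using SciPy and hypothetical per-evaluator recall values (the actual study data are not reproduced here): a one-way ANOVA across the three evaluator groups within a set, and a paired t test comparing each evaluator’s SET1 and SET2 scores.
```python
from scipy import stats

# Hypothetical per-evaluator recall values for illustration only.
specialists = [0.32, 0.32]
physicians = [0.75, 0.79, 0.75, 0.78]
students = [0.55, 0.60, 0.60, 0.65, 0.60]

# One-way ANOVA across the three evaluator groups within one set.
f_stat, p_anova = stats.f_oneway(specialists, physicians, students)

# Paired t test: the same 11 evaluators with (SET2) and without (SET1) AI assistance.
set1 = [0.32, 0.32, 0.75, 0.79, 0.75, 0.78, 0.55, 0.60, 0.60, 0.65, 0.60]
set2 = [0.64, 0.64, 0.77, 0.77, 0.75, 0.78, 0.71, 0.75, 0.78, 0.78, 0.75]
t_stat, p_paired = stats.ttest_rel(set1, set2)

print(f"ANOVA p = {p_anova:.3f}, paired t test p = {p_paired:.3f}")
```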

Results

Comparison of the performance between evaluators and artificial intelligence model (SET1)

The SET1 scenario revealed distinct performance patterns among groups (Table 1, Fig. 1; Supplementary Table 5, available online). Specialists demonstrated the highest accuracy (79.7%) and precision (46.4%) when predicting AKI without AI assistance. The PRIME Solution excelled in recall (78.6%) but had the lowest precision (27.5%), indicating a high rate of both potential AKI case identification and false positives. F1 scores were comparable across all groups (36.4%−40.4%, p = 0.70), suggesting a similar balance of precision and recall despite varying individual metrics. Physicians exhibited the second-highest recall (76.8%) after the AI model, while maintaining moderate precision. Medical students performed at an intermediate level, with accuracy (61.9%) falling between that of specialists and physicians.

Comparison of diagnostic metrics according to artificial intelligence predictions (SET2)

Impact of assistance provided by PRIME Solution on different skill levels

AI assistance in SET2 significantly improved overall recall across all groups (from 61.0% to 74.0%, p = 0.045) (Table 1, Fig. 2), with specialists demonstrating the most substantial increase (from 32.1% to 64.3%). F1 scores also improved (from 38.7% to 42.0%, p = 0.28), particularly for specialists (from 36.4% to 48.6%). However, specificity decreased across all groups (from 63.3% to 55.2%, p = 0.002), accompanied by increases in false-positive rates (from 36.7% to 44.8%, p = 0.002). Fig. 2 reveals individual variations within groups: specialists consistently demonstrated improved recall and F1 scores but experienced a slight decrease in accuracy; physicians showed variable precision and recall; and medical students exhibited overall enhancements but did not reach specialist levels of accuracy and precision. These results indicate that AI assistance had varying impacts across different expertise levels, with the most pronounced effects observed in the specialist group.

Changes in the performance according to PRIME Solution predictions

We examined how the predictions of the PRIME Solution affected the evaluators’ performance. When PRIME Solution predicted AKI occurrence, the recall increased significantly, especially among the specialists (from 36.4% to 77.3%). Regarding the predictions of nonoccurrence, the precision and F1 scores improved, notably among specialists (precision, from 20% to 33.3%) and medical students (F1 score, from 21.1% to 40%) (Supplementary Table 6, available online).

Changes in the prediction duration with artificial intelligence assistance

AI assistance significantly reduced the decision-making duration from 73.8 to 65.4 seconds (p < 0.001). This reduction was more pronounced in cases where AI predicted that AKI would not occur (73.4 to 62.1 seconds; p < 0.001) compared to those where it predicted that AKI would occur (74.2 to 68.3 seconds; p = 0.04) (Fig. 3; Supplementary Table 7, available online). These results indicated that PRIME Solution improved both the diagnostic metrics and evaluation speed.

Behavioral changes in response to artificial intelligence predictions

The PRIME Solution influenced the selection of clinical actions across all evaluator groups (Fig. 4; Supplementary Table 8, available online). When the AI predicted AKI occurrence, participants tended to choose more actions, particularly fluid-related evaluations and additional testing. This increase was most notable among specialists and medical students. In contrast, predictions of nonoccurrence resulted in fewer changes in clinical action selections.

Model explanation match rate

The match rates of key predictive variables identified by the PRIME Solution and those selected by the evaluators were compared (Fig. 5). The PRIME Solution, which selected 20 variables, showed a higher match rate because of its broader selection set. The evaluators, who were limited to choosing up to 10 variables, had slightly improved match rates with AI assistance (SET2) compared to those without AI assistance (SET1). For specialists, physicians, and medical students, the match rates with AI assistance increased; however, these improvements were not statistically significant (p = 0.40, p = 0.35, and p = 0.06, respectively).

Discussion

This study demonstrated the capacity of PRIME Solution to enhance AKI prediction by integrating AI into CDS systems, benefiting clinicians with various expertise levels. Prior to the integration of AI technologies, AKI risk prediction was based on statistical methods and baseline patient data collected before clinical events or interventions. Recent advancements in AI have revolutionized this approach by incorporating not only baseline data but also dynamic real-time data collected during hospitalization, thereby significantly improving predictive capabilities [15]. Unlike previous studies that used traditional statistical models [16,17], our approach leveraged the ResNet architecture, which is specifically tailored to capture temporal patterns in time-series data. This advanced CNN model effectively modeled the evolving nature of patient data collected during hospitalization, thereby improving the performance of AKI risk prediction. However, the inherent complexity of deep learning models, often referred to as the “black box” problem, limits transparency in the decision-making processes [18]. To address this, LRP was utilized and offered interpretable insights regarding the predictions of the model by identifying the most influential variables. This not only mitigated the “black box” issue but also increased user trust and supported more informed clinical decisions.
Although numerous studies have developed AKI prediction models, evaluations in the context of physician decision-making are scarce. One study directly compared the performance of an AKI prediction model with that of physicians at the time of admission to the intensive care unit and reported areas under the ROC curve of 0.80 for physicians and 0.75 for the model [19]. Our study further compared diagnostic metrics for AKI prediction with and without AI support among specialists, physicians, and medical students. This comprehensive investigation illustrated that AI enhanced decision-making in clinical settings, significantly improving diagnostic performance and efficiency among those with various levels of medical expertise.
The PRIME Solution was designed with a strong emphasis on detecting the onset of AKI, a critical factor for early intervention. This focus resulted in high recall (78.6%), effectively identifying patients who actually developed AKI, but the lower precision (27.5%) indicated a higher occurrence of false alarms. This trade-off demonstrates the ability of the model to detect potential AKI cases while highlighting the need for refinement to reduce unnecessary alerts without compromising sensitivity.
Integrating the PRIME Solution into AKI prediction significantly improved prediction performance, especially recall across all groups. Specialists and medical students demonstrated improvements in F1 scores, indicating a better balance between precision and recall. However, these improvements came with decreased specificity, as the model prioritized identifying true positives even at the risk of increasing false positives. This approach was used to ensure that potential AKI cases were not missed, enabling early intervention. Interestingly, AI assistance had varying impacts depending on the evaluator’s expertise level. Specialists showed a substantial improvement in recall and F1 scores while maintaining higher accuracy than the AI model, indicating that they effectively integrated AI support with their clinical judgment. Medical students also demonstrated substantial improvements; however, their performance was more similar to that of the AI model, suggesting that they relied more heavily on the AI’s recommendations without critically evaluating them to the same extent as more experienced clinicians. Physicians exhibited the most individual variability: the performance of some improved with AI assistance, whereas that of others declined. These findings align with research on human-AI collaboration, revealing that the benefits of AI support depend on users’ expertise [20]. Experts may process AI explanations with less cognitive load by drawing on their subconscious domain knowledge, potentially allowing them to better assess AI predictions’ uncertainty and accuracy. Conversely, those with less expertise might struggle to extract meaningful insights from AI explanations [21].
The PRIME Solution’s prediction outcomes significantly influenced diagnostic performance and clinical behavior. When AI predicted AKI occurrence, recall markedly increased, especially among specialists (from 36.4% to 77.3%), suggesting that AI predictions can effectively guide clinicians in identifying potential AKI cases, thereby improving early interventions. Conversely, precision and F1 scores improved for non-AKI predictions, highlighting AI’s role in supporting more accurate negative diagnoses and its potential use as a screening tool to exclude low-risk patients. Additionally, PRIME Solution led to increased clinical actions when predicting AKI occurrence, particularly among specialists and medical students. This increase in clinical action selections is likely to translate into better patient outcomes, such as higher recovery rates and reduced AKI severity. This proactive approach allows for timely interventions, which are critical when managing AKI. Future studies should focus on quantifying these impacts using specific clinical metrics [22,23]. It is important to note that the selective improvement in performance, varying with both the presence and content of AI predictions, strengthens the argument that PRIME Solution genuinely enhanced clinical decision-making. Although the sample size was limited, the observed decline in diagnostic performance when AI predictions were not provided suggests that the results of this study were not merely a consequence of learning effects associated with repeated evaluations.
PRIME Solution demonstrated practical benefits in clinical settings, such as decreased average review duration, with the most significant time savings observed in non-AKI cases (73.4 to 62.1 seconds, p < 0.001) compared to cases where AKI was predicted (74.2 to 68.3 seconds, p = 0.04). This efficiency gain aligns with previous studies [24,25] and underscores the practical benefits of AI assistance, allowing clinicians to allocate more time to complex cases and other critical tasks. The reduction in review time with AI assistance is particularly valuable in urgent clinical scenarios in which quick and accurate decision-making is necessary. This efficiency can enhance overall clinical workflow, particularly in time-sensitive situations. By streamlining the diagnostic process, AI-CDS systems can potentially alleviate clinical load, enabling faster decisions without compromising prediction accuracy [22,26]. Furthermore, the match rates of key predictive variables showed a slight improvement with AI assistance, although the difference was not statistically significant (specialists, p = 0.40; physicians, p = 0.35; medical students, p = 0.06). This suggests that AI can help clinicians select more relevant variables, potentially leading to more informed and accurate assessments. However, the methods of presenting AI insights, such as specific interface designs or visualization techniques, need further refinement to enhance this aspect more effectively.
This study has limitations inherent to its design and methodology. The single-center nature raises questions about the generalizability of the findings. Differences in patient demographics, clinical practices, and healthcare infrastructures across institutions may lead to variable outcomes when implementing the PRIME Solution elsewhere. Additionally, the study’s reliance on a relatively small sample of 100 patients and a limited number of evaluators may not provide a comprehensive overview of the effectiveness across broader clinical settings. Variability in baseline AKI assessment accuracy among evaluators with different expertise levels introduces another layer of complexity that potentially influences the perceived impact of the PRIME Solution. Moreover, the study did not fully explore the dynamic nature of clinical environments, where patient conditions and clinical decision-making factors can evolve rapidly, limiting the applicability to real-world settings. Finally, our AI model’s focus on high sensitivity led to decreased specificity, resulting in more false positives. While high recall is critical for early detection and prevention of AKI, this approach could increase clinician workload and potentially lead to alert fatigue in practice, compromising the model’s effectiveness.
In conclusion, our study illustrated the promising role of AI in improving the prediction of AKI, particularly through enhancements in recall and efficiency. Nonetheless, integrating AI into clinical practice must be approached with caution, ensuring that such systems augment clinicians’ judgment without undermining it. Future research should focus on refining AI models to achieve an optimal balance between sensitivity and specificity, explore the psychological and behavioral impacts of AI on clinical decision-making, and develop educational strategies to maximize the benefits of AI-CDS systems across all levels of medical expertise. Additionally, future studies should include multicenter trials with larger patient cohorts and diverse hospital types, patient demographics, and clinical practices to validate the model across various healthcare settings. Furthermore, they should aim to develop AI models tailored to specific clinical contexts and consider differences in the prevalence of AKI and its outcomes in various clinical settings, such as medical and surgical intensive care units [27]. The different etiologies of AKI, including sepsis, surgery, and contrast agents, should also be evaluated [6,28,29]. Addressing potential barriers to clinical implementation, such as user trust and information accessibility, is crucial. Developing user-friendly interfaces that provide intuitive visualizations and clear explanations of AI predictions, and implementing feedback mechanisms for continuous improvement, are important for successfully integrating AI models such as PRIME Solution into routine clinical practice. The dynamic interplay between AI and human judgment revealed in this study provides valuable insights into the future of healthcare, in which AI and clinicians work synergistically to improve patient outcomes.

Notes

Conflicts of interest

All authors have no conflicts of interest to declare.

Funding

This work was partly supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (MSIT) (No. RS-2022-II220184, Development and Study of AI Technologies to Inexpensively Conform to Evolving Policy on Ethics), grant No. 13-2023-0007 from the Seoul National University Bundang Hospital research fund, grant No. 800-20230609 from Seoul National University Research Fund, and Daewon Pharmaceutical Co., Ltd.

Data sharing statement

The data presented in this study are available from the corresponding author upon reasonable request.

Authors’ contributions

Conceptualization: GY, JY, SK

Data curation: JY, JS

Formal analysis: JY

Funding acquisition, Supervision: SK

Investigation: GY, SH, JS, HH, JC, JHK

Methodology: GY, JY, JS, HH, JC, SK

Project administration, Resources: GY, SK

Software: JS, EM, HH, JC

Validation: GY, JHK

Visualization: JY, EM

Writing–original draft: GY, JY

Writing–review & editing: All authors

All authors read and approved the final manuscript.

Figure 1.

Comparison of diagnostic metrics of specialists, physicians, medical students, and the PRIME Solution.

(A) Visual representation of acute kidney injury (AKI) prediction task. (B) Accuracy, (C) precision, (D) recall, and (E) F1 score. Bars represent the average performance of each group. Error bars reflect the 95% confidence intervals. The analysis of variance p-value indicates the statistical significance of differences between groups. Icons are designed by Freepik.
AI, artificial intelligence.
Figure 2.

Comparison of diagnostic metrics with and without the support of the PRIME Solution.

(A) Comparative overview: non-assisted vs. AI-assisted prediction. (B–E) Accuracy, precision, recall, and F1 score of SET1 (non-assisted) and SET2 (AI-assisted). Bars represent the average performance of each set. Error bars reflect the 95% confidence intervals. The p-values indicate statistical significance based on paired t tests. (F–I) Accuracy, precision, recall, and F1 score of each group with (SET2) and without (SET1) the support of the PRIME Solution. Dots represent the metrics of individual participants, and the lines connect the metrics of the same participant with and without AI assistance. Icons are designed by Freepik.
AI, artificial intelligence.
Figure 3.

Comparative analysis of the duration of the AKI prediction: evaluating the efficiency of the PRIME Solution’s assistance.

(A) Comparison of the mean durations of the AKI prediction tasks with and without the aid of PRIME Solution, indicating a difference in time efficiency. Analysis was conducted on 74 out of 100 cases where PRIME Solution provided predictions. (B) Analysis of the mean prediction durations of specialists, physicians, and medical students, showing variations between with and without AI assistance. Analysis was conducted on 74 out of 100 cases where PRIME Solution provided predictions. (C) Outlines of the mean prediction durations when the PRIME Solution predicted the occurrence of AKI (AKI+), predicted no occurrence of AKI (AKI–), and did not offer a prediction (No Pred).
AI, artificial intelligence; AKI, acute kidney injury.
Figure 4.

Changes in selected clinical actions: impact of PRIME Solution’s assistance.

Variations in clinical actions selected by specialists, physicians, and medical students are observed, with each point indicating the change in the number of actions that an individual chose to take. An upward shift (+direction) in points represents an increase in selected actions in clinical decision-making in scenarios with AI assistance compared to those without AI assistance. The figure is divided into three panels, each representing different prediction scenarios by PRIME Solution: (A) cases where AKI occurrence was predicted, (B) cases where no AKI was predicted, and (C) cases where no prediction was provided. This layout allows for a comparison of how participants’ action selections changed across different prediction contexts.
AI, artificial intelligence; AKI, acute kidney injury.
Figure 5.

Comparison of match rates of the key predictive variables determined by the evaluator groups and the model.

The graph shows the match rates of key predictive variables identified by the PRIME Solution and those selected by various evaluator groups when making AKI predictions. The model, which selects 20 variables, naturally shows a higher match rate because of its broader selection set compared to that of the evaluators who were limited to choosing up to 10 variables. SET1 represents the non-assisted selection, and SET2 indicates AI-assisted selection.
AI, artificial intelligence; AKI, acute kidney injury.
Table 1.
Comparison of predictive diagnostic metrics for acute kidney injury of specialists, physicians, and medical students with and without AI assistance
Group Accuracy Precision Recall F1 Specificity NPV FPR MCC TS
AI model 0.568 0.275 0.786 0.407 0.517 0.912 0.483 0.238 0.256
SET1 (without AI assistance)
 Specialist 0.797 0.464 0.321 0.364 0.908 0.852 0.092 0.265 0.225
 Physician 0.557 0.281 0.768 0.404 0.508 0.909 0.492 0.229 0.254
 Medical student 0.619 0.326 0.600 0.382 0.623 0.872 0.377 0.205 0.237
 Overall 0.629 0.335 0.610 0.387 0.633 0.882 0.367 0.225 0.241
 ANOVA p-value 0.123 0.294 0.022 0.697 0.082 0.050 0.082 0.731 0.714
SET2 (AI-assisted)
 Specialist 0.743 0.391 0.643 0.486 0.767 0.902 0.233 0.347 0.323
 Physician 0.510 0.248 0.768 0.371 0.450 0.902 0.550 0.181 0.228
 Medical student 0.586 0.339 0.757 0.434 0.547 0.905 0.453 0.270 0.284
 Overall 0.587 0.315 0.740 0.420 0.552 0.904 0.448 0.251 0.271
 ANOVA p-value 0.201 0.464 0.693 0.412 0.199 0.996 0.199 0.519 0.415
Comparison of SET1 and SET2
 t test p-value 0.047 0.37 0.045 0.28 0.002 0.29 0.002 0.56 0.24

The presented values represent the mean score of each metric for each group. For detailed explanations of the metrics, see Supplementary Table 5 (available online).

The ANOVA p-value represents the results of the ANOVA performed to compare the metrics of the specialist, physician, and medical student groups for each set (SET1 and SET2). The t test p-value represents the results of a paired t test performed to compare the metrics for SET1 and SET2.

AI, artificial intelligence; ANOVA, analysis of variance; F1, F1 score; FPR, false-positive rate; MCC, Matthews correlation coefficient; NPV, negative predictive value; TS, threat score.

References

1. Kellum JA, Romagnani P, Ashuntantang G, Ronco C, Zarbock A, Anders HJ. Acute kidney injury. Nat Rev Dis Primers 2021;7:52.
2. Park S, Baek SH, Ahn S, et al. Impact of electronic acute kidney injury (AKI) alerts with automated nephrologist consultation on detection and severity of AKI: a quality improvement study. Am J Kidney Dis 2018;71:9–19.
3. Ramgopal S, Sanchez-Pinto LN, Horvat CM, Carroll MS, Luo Y, Florin TA. Artificial intelligence-based clinical decision support in pediatrics. Pediatr Res 2023;93:334–341.
4. Nagendran M, Festor P, Komorowski M, Gordon AC, Faisal AA. Quantifying the impact of AI recommendations with explanations on prescription decision making. NPJ Digit Med 2023;6:206.
5. Jeon H, Jang HR. Electronic alerts based on clinical decision support system for post-contrast acute kidney injury. Kidney Res Clin Pract 2023;42:541–545.
6. Zhang H, Wang AY, Wu S, et al. Artificial intelligence for the prediction of acute kidney injury during the perioperative period: systematic review and meta-analysis of diagnostic test accuracy. BMC Nephrol 2022;23:405.
7. Bajgain B, Lorenzetti D, Lee J, Sauro K. Determinants of implementing artificial intelligence-based clinical decision support tools in healthcare: a scoping review protocol. BMJ Open 2023;13:e068373.
8. Kim K, Yang H, Yi J, et al. Real-time clinical decision support based on recurrent neural networks for in-hospital acute kidney injury: external validation and model interpretation. J Med Internet Res 2021;23:e24120.
9. Bach S, Binder A, Montavon G, Klauschen F, Müller KR, Samek W. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS One 2015;10:e0130140.
10. Patterson ES, Doebbeling BN, Fung CH, Militello L, Anders S, Asch SM. Identifying barriers to the effective use of clinical reminders: bootstrapping multiple methods. J Biomed Inform 2005;38:189–199.
11. Alzubaidi L, Zhang J, Humaidi AJ, et al. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data 2021;8:53.
12. Hu J, Shen L, Sun G. Squeeze-and-excitation networks. Paper presented at: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2018 Jun 18-23; Salt Lake City, UT, USA. p. 7132–7141.
13. Yang M, Liu S, Hao T, et al. Development and validation of a deep interpretable network for continuous acute kidney injury prediction in critically ill patients. Artif Intell Med 2024;149:102785.
14. Hicks SA, Strümke I, Thambawita V, et al. On evaluation metrics for medical applications of artificial intelligence. Sci Rep 2022;12:5979.
15. Mistry NS, Koyner JL. Artificial intelligence in acute kidney injury: from static to dynamic models. Adv Chronic Kidney Dis 2021;28:74–82.
16. Bell S, James MT, Farmer CK, Tan Z, de Souza N, Witham MD. Development and external validation of an acute kidney injury risk score for use in the general population. Clin Kidney J 2020;13:402–412.
17. Schwager E, Ghosh E, Eshelman L, Pasupathy KS, Barreto EF, Kashani K. Accurate and interpretable prediction of ICU-acquired AKI. J Crit Care 2023;75:154278.
18. Castelvecchi D. Can we open the black box of AI? Nature 2016;538:20–23.
19. Flechet M, Falini S, Bonetti C, et al. Machine learning versus physicians’ prediction of acute kidney injury in critically ill adults: a prospective evaluation of the AKIpredictor. Crit Care 2019;23:282.
20. Inkpen K, Chappidi S, Mallari K, et al. Advancing human-AI complementarity: the impact of user expertise and algorithmic tuning on joint decision making. ACM Trans Comput Hum Interact 2023;30:1–29.
21. Wang X, Yin M. Are explanations helpful? A comparative study of the effects of explanations in ai-assisted decision-making. In: Proceedings of the 26th International Conference on Intelligent User Interfaces (IUI ’21). Association for Computing Machinery; 2021. p. 318–328.
22. Shamszare H, Choudhury A. Clinicians’ perceptions of artificial intelligence: focus on workload, risk, trust, clinical decision making, and clinical integration. Healthcare (Basel) 2023;11:2308.
23. Glick A, Clayton M, Angelov N, Chang J. Impact of explainable artificial intelligence assistance on clinical decision-making of novice dental clinicians. JAMIA Open 2022;5:ooac031.
24. Khosravi M, Zare Z, Mojtabaeian SM, Izadi R. Artificial intelligence and decision-making in healthcare: a thematic analysis of a systematic review of reviews. Health Serv Res Manag Epidemiol 2024;11:23333928241234863.
25. Khalifa M, Albadawy M, Iqbal U. Advancing clinical decision support: the role of artificial intelligence across six domains. Comput Methods Programs Biomed Update 2024;5:100142.
26. Dubois C, Le Ny J. Adaptive task allocation in human-machine teams with trust and workload cognitive models. Paper presented at: 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC); 2020 Oct 11-14; Toronto, ON, Canada. p. 3241–3246.
27. Lee Y, Kim T, Kim DE, et al. Differences in the incidence, characteristics, and outcomes of patients with acute kidney injury in the medical and surgical intensive care units. Kidney Res Clin Pract 2024;43:518–527.
28. Cheungpasitporn W, Thongprayoon C, Kashani KB. Artificial intelligence and machine learning’s role in sepsis-associated acute kidney injury. Kidney Res Clin Pract 2024;43:417–432.
29. Choi H, Choi B, Han S, et al. Applicable machine learning model for predicting contrast-induced nephropathy based on pre-catheterization variables. Intern Med 2024;63:773–780.

