Pyo, Jeon, Shin, Jung, Lim, Eom, Choi, and Renal Pathology Study Group of Korean Society of Pathologists: Interobserver agreement analysis among renal pathologists in classification of lupus nephritis using a digital pathology image dataset: after a third evaluation

Abstract

Background

The classification of lupus nephritis is well known for low interobserver concordance. Furthermore, no agreement analysis has been performed among Korean renal pathologists regarding lupus nephritis. Inconsistent diagnosis causes confusion, increases medical costs, and can lead to failure of appropriate therapeutic intervention. This study aimed to assess the level of agreement among Korean renal pathologists regarding the classification of lupus nephritis.

Methods

Representative glomerular images from patients diagnosed with lupus nephritis were obtained from five hospitals. Twenty-five questions were formulated, each a multiple-choice question with 14 options consisting of characteristic histopathological findings of lupus nephritis. Three rounds of surveys were conducted, with educational sessions held before the second and third surveys.

Results

Agreement was calculated using Fleiss’ κ, and the mean values for each round were as follows: Survey 1, 0.42 (range, 0.18–0.61); Survey 2, 0.42 (range, 0.19–0.64); and Survey 3, 0.47 (range, 0.23–0.65). Although κ after the first educational session showed no significant difference compared with the initial κ (p = 0.95), κ after the second educational session increased significantly compared with the initial κ (p < 0.001). The κ for each item generally increased with each educational session, but the increases were not statistically significant (p = 0.46 and p = 0.17). Additionally, the rankings of agreement for each item were relatively consistent across surveys.

Conclusion

This study conducted an interobserver agreement analysis of Korean pathologists for lupus nephritis, with the goal of increasing agreement through education. Although education increased overall agreement, items such as “mesangial hypercellularity,” “endocapillary hypercellularity,” and “neutrophils and/or karyorrhexis” remained inconsistent, attributable to their inherent subjectivity and the limited effectiveness of the education.

Introduction

With recent developments in artificial intelligence (AI) using deep learning, AI-related studies in the field of digital pathology are expanding. One of the factors that must be considered in AI-based pathology image analysis studies is that high reproducibility and accuracy of the pathology diagnosis are the basic premises of AI learning. If there is no consensus as to what is the “gold standard” for diagnosis, and different pathologists make different diagnoses for the same pathology image, the basis for AI learning will be compromised, and the reliability of results will be low [1,2]. Therefore, a high level of diagnostic agreement is a prerequisite for AI-powered analytics.
Factors that affect diagnostic agreement among pathologists include the clarity of the definitions of diagnostic terms, cognitive biases, the method of evaluation of pathological findings (quantitative or qualitative), institutional variability, cross-training, and the level of experience or training of the pathologists [1]. Several studies have investigated or sought to improve agreement in the diagnosis of renal disease [3–10]. In particular, agreement among pathologists on the diagnosis of lupus nephritis, which has a major impact on clinical prognosis and treatment decisions, is poor [4,6,7,9].
Two previous studies on diagnostic agreement have been conducted by The Renal Pathology Study Group of the Korean Society of Pathologists (RPS-KSP) [11,12]. These studies standardized the terminologies of renal pathology and improved diagnostic agreement through training with a virtual slide atlas. In the Nephrotic Syndrome Study Network (NEPTUNE) study, intra- and interobserver variability were significantly reduced after two rounds of web-based cross-training [5]. Web-based cross-training provides pathologists with the opportunity to meet across time and space, which can dramatically improve diagnostic agreement.
In this study, we identified interobserver variability among pathologists for the accurate diagnosis and classification of lupus nephritis and attempted to improve agreement through educational training. Thus, we aimed to improve the quality and accuracy of the diagnosis of lupus nephritis and provide a gold standard for future AI-powered studies.

Methods

Ethical approval

The study protocol was approved by the Institutional Review Board (IRB) of Wonju Severance Christian Hospital (No. CR321144). Written consent to publish was waived by the IRB owing to the retrospective nature of the study and the lack of access to patient clinical information.

Case selection and survey

Histopathological slides from patients diagnosed with lupus nephritis were gathered from four hospitals: Severance Hospital, Gangnam Severance Hospital, Wonju Severance Christian Hospital, and CHA University CHA Bundang Medical Center. Representative glomerular images were chosen from the slides and captured using a digital camera at 400× magnification (Olympus) at the discretion of each institutional pathologist. Twenty-five questions were formulated for the questionnaire. Each question referenced four images of a glomerulus stained with hematoxylin and eosin (H&E), periodic acid-Schiff (PAS), trichrome, or periodic acid-methenamine silver (PAMS). Multiple-choice questions with 14 options consisting of characteristic histopathological findings of lupus nephritis were provided. Google Forms was used as the survey platform (Fig. 1), and the survey was conducted among members of the RPS-KSP. The entire membership was provided with a web link to the questionnaire. In addition to responding to the questionnaire, participants were asked how many years they had practiced as a renal pathologist and how many renal biopsies they reported per year. Three surveys were administered, with intervals of 3 months between Survey 1 and Survey 2 and 7 months between Survey 2 and Survey 3. Educational sessions were held 2 to 4 weeks before Survey 2 and Survey 3. The first educational session, approximately 10 minutes long, was for RPS-KSP members only; the second, approximately 35 minutes long, was for RPS-KSP members and clinicians. Both were delivered as online lectures. The content included the previous survey results, a literature review of discordant findings, and a brief overview of diagnostic pitfalls. After the sessions, an educational presentation file covering the items with the highest discrepancies and their authoritative definitions was provided to the members of the RPS-KSP.

Statistical analysis

Fleiss’ kappa (Fleiss’ κ) evaluates agreement when there are more than two raters. Fleiss’ κ was calculated for each question and for each item (0 = no agreement, 1 = perfect agreement). Fleiss’ κ was also calculated separately for pathologists with more than 10 years of experience. Specifically, the presence of a histopathological finding was coded as 1 and its absence as 0. For question-by-question agreement, the 14 options of each question served as the subjects of the analysis, whereas for item-by-item agreement, the 25 questions served as the subjects. A κ-value of <0.2, 0.2–0.4, 0.4–0.6, 0.6–0.8, or >0.8 was considered to reflect “poor,” “fair,” “moderate,” “good,” or “very good” agreement, respectively. Statistical significance was set at p < 0.05.
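As a concrete illustration of this coding scheme, the following minimal sketch (not the authors’ SPSS workflow) computes an item-by-item Fleiss’ κ from a subjects-by-raters matrix of presence/absence codes, using the statsmodels implementation; the ratings matrix is invented purely for illustration.

```python
# Minimal sketch, assuming binary coding as described above (presence = 1, absence = 0).
# The ratings matrix is hypothetical illustrative data, not the study data.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# rows = the 25 questions (subjects), columns = raters (pathologists);
# each cell is 1 if that rater marked the descriptor as present, else 0.
rng = np.random.default_rng(0)
ratings = rng.integers(0, 2, size=(25, 31))  # e.g., Survey 1 with 31 respondents

# aggregate_raters converts subject-by-rater codes into subject-by-category counts
counts, _ = aggregate_raters(ratings)          # shape: (25 subjects, 2 categories)
kappa = fleiss_kappa(counts, method="fleiss")  # 0 = chance-level, 1 = perfect agreement

print(f"Fleiss' kappa for this item: {kappa:.3f}")
```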
Pre-educational agreement was compared with post-educational agreement twice: the pre-educational survey (Survey 1) was compared with the survey after the first educational session (Survey 2) and with the survey after the second educational session (Survey 3). Paired t tests were used, with a significance level of 0.05.
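To make explicit what is paired in this comparison, here is a minimal sketch (assumed, not the authors’ code): each question’s κ from Survey 1 is paired with the same question’s κ from a post-educational survey; the κ arrays below are hypothetical placeholders.

```python
# Minimal sketch of the paired comparison: per-question kappa values from the
# pre-educational survey are paired with those from a post-educational survey.
# The arrays here are hypothetical placeholders, not the study data.
from scipy import stats

kappa_survey1 = [0.42, 0.35, 0.51, 0.28, 0.47]  # per-question kappas, Survey 1
kappa_survey3 = [0.48, 0.40, 0.55, 0.30, 0.52]  # same questions, Survey 3

t_stat, p_value = stats.ttest_rel(kappa_survey1, kappa_survey3)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")  # significant if p < 0.05
```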
Kappa statistics have an intrinsic limitation: when the marginal distribution of responses is highly unbalanced, that is, when almost all raters give the same response and high agreement is therefore expected, kappa can nonetheless be low or even negative. To address this issue, Gwet’s AC1 statistics were additionally computed [13].
All analyses, except for Gwet’s AC1 statistics, were performed using IBM SPSS (version 27.0; IBM Corp.). Gwet’s AC1 statistics were calculated using the following website: https://play93.shinyapps.io/Gwet_Scott/. For Gwet’s AC1, <0.2, 0.2–0.4, 0.4–0.6, 0.6–0.8, or >0.8 were considered to reflect “slight,” “fair,” “moderate,” “substantial,” or “almost perfect” agreement, respectively.
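Because the study computed Gwet’s AC1 with the Shiny web tool cited above, the following is only an illustrative sketch of how AC1 can be obtained for binary (present/absent) ratings, following Gwet’s formulation [24]; the function name and the demonstration data are hypothetical. With nearly unanimous “absent” responses, the demonstration data yield a low Fleiss’ κ despite near-total observed agreement, while AC1 remains high, which is the paradox described above.

```python
# Minimal sketch of Gwet's AC1 for binary (present/absent) ratings, following
# Gwet's 2008 formulation; the study itself used the Shiny web tool cited above.
import numpy as np

def gwet_ac1_binary(ratings: np.ndarray) -> float:
    """ratings: subjects x raters matrix of 0/1 codes (no missing values)."""
    n_subjects, n_raters = ratings.shape
    pos = ratings.sum(axis=1)                      # raters marking "present" per subject
    neg = n_raters - pos
    # observed agreement: proportion of agreeing rater pairs per subject
    pa = ((pos * (pos - 1) + neg * (neg - 1)) /
          (n_raters * (n_raters - 1))).mean()
    # chance agreement: based on the overall prevalence of "present"
    pi = (pos / n_raters).mean()
    pe = 2 * pi * (1 - pi)
    return (pa - pe) / (1 - pe)

# hypothetical example: near-unanimous "absent" responses give a high AC1,
# even though the corresponding Fleiss' kappa is low ("prevalence paradox")
demo = np.zeros((25, 19), dtype=int)
demo[0, :3] = 1                                    # a few dissenting calls
print(f"Gwet's AC1: {gwet_ac1_binary(demo):.3f}")
```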

Results

Forty-three RPS-KSP members responded to one or more of the surveys. There were 31 respondents to Survey 1, 28 to Survey 2, and 19 to Survey 3. Of these, 16, 14, and 12, respectively, had more than 10 years of experience in reporting renal biopsies for the differential diagnosis of internal medicine conditions.
The number of renal biopsies reported per year varied among the respondents. Seven respondents reported more than 300 biopsies per year, one reported 200 to 300, 11 reported 100 to 200, 12 reported 51 to 100, and seven reported 50 or fewer. Among the highly experienced pathologists, six reported more than 300 renal biopsies per year, six reported 100 to 200 per year, four reported 51 to 100 per year, and four reported 50 or fewer per year.
The κ-values for each question are presented in Supplementary Fig. 1 (available online) and Supplementary Table 1 (available online). The κ-values by item are presented in Fig. 2 and Supplementary Table 2 (available online). The mean ± standard deviation (SD) of the κ-values across the 25 questions in each survey was as follows: Survey 1, 0.417 ± 0.011; Survey 2, 0.412 ± 0.010; and Survey 3, 0.472 ± 0.013. The overall κ-value of Survey 3 was significantly higher than those of Surveys 1 and 2 (p < 0.001 and p = 0.001, respectively). The κ-values for highly experienced pathologists, who had practiced renal pathology for more than 10 years, were generally higher than those of all the pathologists (Fig. 3). The mean ± SD of the κ-values in each survey for highly experienced pathologists was as follows: Survey 1, 0.475 ± 0.019; Survey 2, 0.427 ± 0.011; and Survey 3, 0.474 ± 0.015. The κ-value of Survey 3 for the highly experienced pathologists was significantly higher than that of Survey 2 (p = 0.009). Question 8 showed poor agreement, with a κ of 0.2 or less in all three surveys (Supplementary Fig. 1, available online), but substantial agreement of 0.6 or more with Gwet’s AC1 (Supplementary Fig. 2, available online).
The mean ± SD of the κ-values across the 14 items of lupus nephritis in each survey was as follows: Survey 1, 0.251 ± 0.033; Survey 2, 0.276 ± 0.042; and Survey 3, 0.309 ± 0.015. The κ-values for highly experienced pathologists were higher than those of all the pathologists. Overall item agreement among all the pathologists did not differ significantly after the educational sessions. However, for highly experienced pathologists, agreement in Survey 3 increased after the educational sessions compared with the previous survey. There was “fair” agreement for endocapillary hypercellularity and for neutrophils and/or karyorrhexis, with values of less than 0.4 in both the Fleiss’ κ and Gwet’s AC1 analyses among all the pathologists and the highly experienced pathologists (Table 1, Figs. 2–4). Mesangial hypercellularity showed poor agreement, with both Fleiss’ κ and Gwet’s AC1 values of 0.2 or less in all three surveys. Agreement on the identification of mesangial hypercellularity and endocapillary hypercellularity increased after the two educational sessions compared with baseline, whereas agreement on the identification of neutrophils and/or karyorrhexis decreased. For highly experienced pathologists, only agreement on the identification of mesangial hypercellularity increased after the two educational sessions, while agreement on endocapillary hypercellularity and neutrophils and/or karyorrhexis decreased from pre-educational levels (Fig. 3; Supplementary Fig. 3, available online).
In Survey 3, agreement on the identification of segmental sclerosis and of adhesion between the tuft and capsule was lower than in Survey 1, with κ-values of 0.231 and 0.345, respectively, in Survey 3 versus 0.289 and 0.361, respectively, in Survey 1. However, Gwet’s AC1 values for these two items were higher in Survey 3 (0.722 and 0.722, respectively) than in Survey 1 (0.685 and 0.688, respectively). Items such as normal, global sclerosis, spike or intramembranous hole formation, fibrous crescent, and double contour showed highly unbalanced marginal distributions, and thus their κ-values were uninformative. For these items, Gwet’s AC1 indicated almost perfect agreement, with values of at least 0.8 (Fig. 4; Supplementary Fig. 4, available online).
It is possible that the diligence of respondents who dropped out of one or more of the three surveys differed from that of those who completed all three. Therefore, to make a rigorous comparison of pre- and post-educational agreement, agreement was also analyzed only among those who completed all three surveys (Supplementary Table 3, available online). When only those who completed all three surveys were analyzed, the agreement for each item was slightly higher than the agreement among participants who responded to one or more of the surveys, and the trend remained similar (Fig. 5). The increase in agreement from pre- to post-education varied by item. Of the three items with the lowest agreement, two (mesangial hypercellularity and endocapillary hypercellularity) increased in agreement after education (Gwet’s AC1, from 0.184 and 0.329 to 0.194 and 0.334, respectively), and one (neutrophils and/or karyorrhexis) decreased (from 0.574 to 0.357). The κ and Gwet’s AC1 values of Survey 2 for the highly experienced pathologists were significantly lower than those of Survey 1 (p = 0.015 and p = 0.004, respectively).
Agreement between experienced and inexperienced pathologists was compared among respondents who completed all three surveys. The Gwet’s AC1 values of the experienced pathologists varied from item to item compared with those of the inexperienced pathologists (Supplementary Fig. 5, available online). The definition of experienced was narrowed from before to more than 10 years of renal pathology practice and diagnosis of at least 100 renal biopsies per year. The difference in agreement between experienced and inexperienced pathologists varied by item. Before the education, the experienced pathologists (n = 6) had six items with higher AC1 values than the inexperienced pathologists (n = 8; mesangial hypercellularity, endocapillary hypercellularity, fibrous crescent, wire loop lesion and/or hyaline thrombi, and double contour), but after the education, this decreased to four items (endocapillary hypercellularity, spike or intramembranous hole formation, and fibrocellular and fibrous crescent) (Fig. 6). There was no significant difference between the two groups in overall agreement.

Discussion

There are few studies on concordance among pathologists in the diagnosis of lupus nephritis, and the reported concordance is low [6]. To the best of our knowledge, this is the first study to assess concordance in the identification of the pathological lesions of lupus nephritis in Korea. Since the 2018 International Society of Nephrology/Renal Pathology Society (ISN/RPS) revision of the classification of lupus nephritis, some histopathological descriptors that comprise the activity and chronicity indices have been modified or redefined [14]. With the ISN/RPS revision, the definitions of mesangial hypercellularity, crescent, adhesion, and fibrinoid necrosis were revised, and endocapillary proliferation was renamed endocapillary hypercellularity. This is also the first study to evaluate concordance using the new histopathological descriptors from the 2018 ISN/RPS revision, along with other histopathological features used for the diagnosis of lupus nephritis.
Dasari et al. [6] systematically reviewed interpathologist agreement on lupus nephritis and concluded that the concordance was “poor” to “moderate.” In their review, leukocyte infiltration, a term similar to neutrophils in the modified activity/chronicity index, exhibited “poor” agreement, which is in line with our results (κ-value for neutrophils and/or karyorrhexis, <0.4). However, the agreement for endocapillary hypercellularity was lower than in previous studies, which showed “moderate” agreement (intraclass correlation coefficient [ICC] or κ-value, >0.4) [6,7,9,15], despite two educational sessions. This is likely due to the inclusion of mesangial hypercellularity as an option, unlike in previous studies, or to unclear definitions. Most studies used a crude assessment in which the percentage of involved glomeruli in the slide was scored according to a cutoff [7,9,15], whereas this study used a more rigorous evaluation of endocapillary hypercellularity per glomerulus. Although mesangial hypercellularity and endocapillary hypercellularity often coexist, the 2018 revision does not provide criteria for distinguishing between them. The Oxford Working Group reported that the concordance for segmental endocapillary hypercellularity was “fair” [3]. They also reported that mesangial cellularity was difficult to score in segments with endocapillary hypercellularity; therefore, they scored glomeruli as “indeterminate” for mesangial cellularity in the presence of global endocapillary hypercellularity. Cellular and fibrous crescents improved from the “poor” to “moderate” agreement reported previously (cellular ICC, 0.5 and 0.55 ± 0.07; fibrous ICC, 0.25 ± 0.09 and 0.58) to “good” to “almost perfect” agreement in this study (cellular κ, >0.6; fibrous Gwet’s AC1, >0.9) [15,16]. One hypothesis is that lowering the cutoff for extracapillary proliferation from 25% to 10% [14] reduced uncertainty by allowing previously borderline lesions to be classified as crescents, thereby improving agreement. Second, a more detailed definition of the fibrocellular/fibrous crescent [14], which was not previously available, may have helped improve concordance. Although fibrinoid necrosis was defined in detail for the first time in the revision [14], the degree of agreement for fibrinoid necrosis in this study (“fair” to “moderate” agreement on κ, 0.32 to 0.47; “substantial” agreement on Gwet’s AC1, 0.61 to 0.76) was similar to that reported previously for fibrinoid necrosis/karyorrhexis (ICC, 0.26, 0.48, and 0.45 ± 0.09) [6], possibly because it is now assessed separately rather than combined with karyorrhexis. In both the NEPTUNE and the Nephrotic Syndrome Study Network Digital Pathology Scoring System studies, agreement was higher than before after grouping individual descriptors [5,10].
Mesangial hypercellularity is not a component of the activity index or the chronicity index, but it is a key feature that can be diagnostic of class II lupus nephritis when present with appropriate immunofluorescence or electron microscopic findings, and it has not been addressed in previous lupus nephritis concordance studies [17]. The definition of mesangial hypercellularity in the ISN/RPS revision was taken from the definition used for immunoglobulin A (IgA) nephropathy in the Oxford classification, and the cutoff was increased from three cells to four cells, which was emphasized in the educational sessions of this study. Despite the more detailed definition and a minimal increase in concordance after two educational sessions, mesangial hypercellularity had the lowest agreement among the items; however, this has been frequently observed in other studies [18–20]. According to concordance studies on IgA nephropathy, there was “moderate” to “poor” agreement in determining the presence of mesangial hypercellularity in more than half of the biopsied glomeruli, suggesting that agreement on the presence of mesangial hypercellularity in a single glomerulus would be even lower. Furthermore, it is not yet known whether a clear-cut distinction between mesangial hypercellularity and endocapillary hypercellularity can be made in class III and IV lesions [14]. It is also unclear whether the cutoff of four cells for mesangial hypercellularity refers to mesangial cells alone or also includes inflammatory cells [14]. More specific definitions will be required in the future (Supplementary Table 4, available online).
Some items had low κ-values despite high observed agreement, owing to the “prevalence paradox” of Fleiss’ κ [13,21,22]: when the distribution of responses is highly uneven, κ can be far lower than the observed agreement. We therefore performed Gwet’s AC1 analysis to compensate. Given that these limitations of kappa, which have been pointed out in previous studies, were also evident for some items in this study, Gwet’s AC1 is a more appropriate measure of agreement than Fleiss’ κ, especially when agreement is high [23–25].
It is noteworthy that, even with the narrower definition of experienced pathologists, fewer than half of the items showed higher agreement among the experienced than among the inexperienced, the differences were not significant, and the gap was even smaller after education. This differs from previous studies that showed higher concordance among experts [5,6] and suggests that, at least among Korean nephropathologists, the level of experience does not necessarily correlate with higher concordance on lupus nephritis glomeruli. However, this study also found that agreement increased for some items after the educational sessions. This emphasizes the importance of regular training of pathologists, at least for some items.
This study is more detailed and systematic than previous work: it uses digital images to assess agreement on the components of the activity and chronicity indices of lupus nephritis for each glomerulus, and it is the first concordance study to use the definitions of the 2018 ISN/RPS revision. It is also more objective and generalizable than an agreement assessment based on a small number of pathologists, as it included a relatively large number of pathologists and had a high response rate. This study included four images per glomerulus (H&E, PAS, trichrome, and PAMS) to represent the diagnostic setting. The educational sessions were successful in improving agreement, and the benefits were immediately applicable in the clinic because the majority of the pathologists work at multiple institutions.
This study has some limitations. It included only glomeruli and did not evaluate the degree of agreement for tubulointerstitial and vascular lesions. Glomerular selection bias was unavoidable. Few glomeruli showed global sclerosis or spikes; therefore, the reliability of the degree of agreement for these two items is questionable. A post hoc review of the glomerular images revealed that there were no typical images in which spikes or global sclerosis were easily identifiable; therefore, additional images should be included in future assessments. The education was a one-way lecture, which seems to be less effective than an interactive open-round meeting. Especially for experienced pathologists, an interactive open-round meeting, in which attendees could comment on one another’s assessments and discuss problematic points in depth, would be more effective and might lead to better agreement. Finally, the study was limited to Korean patients and pathologists.
The treatment of lupus nephritis is based on the histopathological classification and the activity/chronicity indices, and appropriate treatment affects patient prognosis. To train a machine-learning model effectively, the training data must be highly reliable, which is difficult to achieve when histopathological diagnostic agreement among pathologists is low. This study showed improvement in agreement after two educational sessions. This is immediately applicable in clinical practice and provides a basis for the development of accurate AI models.

Notes

Conflicts of interest

The authors have no conflicts of interest to declare.

Funding

This study was supported by a grant from the KOREAN NEPHROLOGY RESEARCH FOUNDATION (Renal Pathology Research Grant 2021 to ME). The sponsor had no role in the study design, data collection, or analyses.

Acknowledgments

We would like to thank Dr. Dongwook Kim for his help with the statistical analysis and the RPS-KSP members for their active participation.

Data sharing statement

The data presented in this study are available from the corresponding author upon reasonable request.

Authors’ contributions

Conceptualization: JYP, BJL, ME, SEC

Data collection: All authors

Formal analysis: JYP, SEC, NJ

Funding acquisition: ME

Writing–original draft: JYP, SEC

Writing–review & editing: JYP, SEC, ME

All authors read and approved the final manuscript.

Figure 1.

The survey form sent to the membership of The Renal Pathology Study Group of the Korean Society of Pathologists.

The pathologists were asked to score 25 images containing one glomerulus each, which entailed answering 25 multiple-choice questions with 14 options.
Figure 2.

The κ-values for each item from all pathologists (inexperienced and experienced).

There was “fair” agreement for endocapillary hypercellularity and neutrophils and/or karyorrhexis (κ < 0.4), and “poor” agreement for mesangial hypercellularity (κ < 0.2) across all three surveys. However, the agreement for endocapillary hypercellularity and mesangial hypercellularity increased slightly after the educational sessions.
Figure 3.

The κ-values for each item for the experienced pathologists.

The κ-values for highly experienced pathologists were generally higher than those of all the pathologists. The agreement from Survey 3 increased after the educational sessions compared with the previous survey.
Figure 4.

Gwet’s AC1 values for each item for all pathologists (inexperienced and experienced).

Normal, global sclerosis, spike or intramembranous hole formation, fibrous crescent, and double contour, which had “poor” or even negative κ-values (κ < 0.2), showed “almost perfect” agreement on Gwet’s AC1 analysis (AC1 > 0.8). However, the agreement values for endocapillary hypercellularity and mesangial hypercellularity were less than “fair.”
Figure 5.

The κ and Gwet’s AC1 values for agreement among all-three-survey responders for each item.

When the respondents were narrowed down to those who completed all three surveys, agreement increased for most items; however, the trend remained similar to before.
Figure 6.

Comparison of κ and Gwet’s AC1 values between experienced and inexperienced among all-three-survey responders.

Experienced was defined as practicing for at least 10 years and diagnosing at least 100 cases per year. There was no significant difference between the two groups in terms of overall agreement.
Table 1.

Agreement on the 14 lupus nephritis descriptor items by survey (Surveys 1, 2, and 3), for all pathologists (inexperienced and experienced) and for experienced pathologists.

Values are given in the order Survey 1 / Survey 2 / Survey 3. For all pathologists, n = 31, 28, and 19 in Surveys 1, 2, and 3, respectively; for experienced pathologists, n = 16, 14, and 12.

Item | All pathologists: Fleiss’ κ | All pathologists: Gwet’s AC1 | Experienced: Fleiss’ κ | Experienced: Gwet’s AC1
Normal | 0.082 / 0.51 / 0.583 | 0.914 / 0.910 / 0.927 | 0.421 / 0.509 / 0.554 | 0.923 / 0.919 / 0.927
Global sclerosis | –0.001 / –0.001 / –0.001 | 0.997 / 0.997 / 1.000 | –a / –0.003 / –a | 1.000 / 0.994 / 1.000
Segmental sclerosis | 0.289 / 0.299 / 0.231 | 0.685 / 0.729 / 0.722 | 0.319 / 0.276 / 0.233 | 0.720 / 0.702 / 0.722
Adhesion between the tuft and capsule | 0.361 / 0.341 / 0.345 | 0.688 / 0.726 / 0.722 | 0.369 / 0.295 / 0.370 | 0.636 / 0.658 / 0.722
Mesangial hypercellularity | 0.128 / 0.113 / 0.159 | 0.127 / 0.098 / 0.182 | 0.122 / 0.105 / 0.159 | 0.173 / 0.107 / 0.182
Endocapillary hypercellularity | 0.325 / 0.281 / 0.367 | 0.327 / 0.270 / 0.389 | 0.399 / 0.310 / 0.355 | 0.430 / 0.312 / 0.389
Spike or intramembranous hole formation | 0.001 / –0.009 / –0.006 | 0.942 / 0.948 / 1.000 | –0.020 / –0.009 / –a | 0.957 / 0.983 / 1.000
Crescent: cellular | 0.637 / 0.644 / 0.677 | 0.859 / 0.867 / 0.857 | 0.627 / 0.604 / 0.662 | 0.858 / 0.866 / 0.857
Crescent: fibrocellular | 0.408 / 0.453 / 0.506 | 0.893 / 0.884 / 0.927 | 0.438 / 0.496 / 0.549 | 0.910 / 0.869 / 0.927
Crescent: fibrous | 0.101 / 0.020 / 0.124 | 0.981 / 0.990 / 0.993 | 0.037 / –0.006 / 0.051 | 0.985 / 0.988 / 0.993
Neutrophils and/or karyorrhexis | 0.325 / 0.311 / 0.282 | 0.482 / 0.464 / 0.352 | 0.324 / 0.276 / 0.302 | 0.388 / 0.415 / 0.352
Fibrinoid necrosis | 0.323 / 0.387 / 0.467 | 0.756 / 0.737 / 0.614 | 0.381 / 0.426 / 0.453 | 0.701 / 0.717 / 0.614
Wire loop lesion and/or hyaline thrombi | 0.418 / 0.438 / 0.573 | 0.767 / 0.776 / 0.876 | 0.550 / 0.504 / 0.544 | 0.831 / 0.821 / 0.876
Double contour | 0.127 / 0.086 / 0.025 | 0.828 / 0.772 / 0.886 | 0.160 / –0.004 / 0.028 | 0.822 / 0.780 / 0.886

a All respondents answered the same, “absence.”


References

1. Barisoni L, Gimpel C, Kain R, et al. Digital pathology imaging as a novel platform for standardization and globalization of quantitative nephropathology. Clin Kidney J 2017;10:176–187.
2. Barisoni L, Lafata KJ, Hewitt SM, Madabhushi A, Balis UG. Digital pathology and computational image analysis in nephropathology. Nat Rev Nephrol 2020;16:669–685.
3. Working Group of the International IgA Nephropathy Network and the Renal Pathology Society, Roberts IS, Cook HT, et al. The Oxford classification of IgA nephropathy: pathology definitions, correlations, and reproducibility. Kidney Int 2009;76:546–556.
4. Azoicăi T, Belibou IM, Lozneanu L, Giuşcă SE, Cojocaru E, Căruntu ID. Large variability of the activity and chronicity indexes within and between histological classes of lupus nephritis. Rom J Morphol Embryol 2017;58:73–78.
5. Barisoni L, Troost JP, Nast C, et al. Reproducibility of the NEPTUNE descriptor-based scoring system on whole-slide images and histologic and ultrastructural digital images. Mod Pathol 2016;29:671–684.
6. Dasari S, Chakraborty A, Truong L, Mohan C. A systematic review of interpathologist agreement in histologic classification of lupus nephritis. Kidney Int Rep 2019;4:1420–1425.
7. Grootscholten C, Bajema IM, Florquin S, et al. Interobserver agreement of scoring of histopathological characteristics and classification of lupus nephritis. Nephrol Dial Transplant 2008;23:223–230.
8. Sar A, Worawichawong S, Benediktsson H, Zhang J, Yilmaz S, Trpkov K. Interobserver agreement for Polyomavirus nephropathy grading in renal allografts using the working proposal from the 10th Banff Conference on Allograft Pathology. Hum Pathol 2011;42:2018–2024.
9. Wilhelmus S, Cook HT, Noël LH, et al. Interobserver agreement on histopathological lesions in class III or IV lupus nephritis. Clin J Am Soc Nephrol 2015;10:47–53.
10. Zee J, Hodgin JB, Mariani LH, et al. Reproducibility and feasibility of strategies for morphologic assessment of renal biopsies using the nephrotic syndrome study network digital pathology scoring system. Arch Pathol Lab Med 2018;142:613–625.
11. Jin SY, Jeong HJ, Sung SH, et al. Practical standardization in renal biopsy reporting. Korean J Pathol 2010;44:613–622.
12. Cho U, Suh KS, Kie JH, Choi YJ; Renal Pathology Study Group of Korean Society of Pathologists. Investigation and standardization on current practice of renal transplant pathology in Korea. J Korean Soc Transplant 2017;31:170–176.
13. Kim MS, Song KJ, Nam CM, Jung IK. A study on comparison of generalized kappa statistics in agreement analysis. Korean J Appl Stat 2012;25:719–731.
14. Bajema IM, Wilhelmus S, Alpers CE, et al. Revision of the International Society of Nephrology/Renal Pathology Society classification for lupus nephritis: clarification of definitions, and modified National Institutes of Health activity and chronicity indices. Kidney Int 2018;93:789–796.
15. Oni L, Beresford MW, Witte D, et al. Inter-observer variability of the histological classification of lupus glomerulonephritis in children. Lupus 2017;26:1205–1211.
16. Wernick RM, Smith DL, Houghton DC, et al. Reliability of histologic scoring for lupus nephritis: a community-based evaluation. Ann Intern Med 1993;119:805–811.
17. Weening JJ, D’Agati VD, Schwartz MM, et al. The classification of glomerulonephritis in systemic lupus erythematosus revisited. J Am Soc Nephrol 2004;15:241–250.
18. Howie AJ, Lalayiannis AD. Systematic review of the Oxford classification of IgA nephropathy: reproducibility and prognostic value. Kidney360 2023;4:1103–1111.
19. Bellur SS, Roberts IS, Troyanov S, et al. Reproducibility of the Oxford classification of immunoglobulin A nephropathy, impact of biopsy scoring on treatment allocation and clinical relevance of disagreements: evidence from the VALidation of IGA study cohort. Nephrol Dial Transplant 2019;34:1681–1690.
20. Kaneko Y, Yoshita K, Kono E, et al. Extracapillary proliferation and arteriolar hyalinosis are associated with long-term kidney survival in IgA nephropathy. Clin Exp Nephrol 2016;20:569–577.
21. Cicchetti DV, Feinstein AR. High agreement but low kappa: II. Resolving the paradoxes. J Clin Epidemiol 1990;43:551–558.
22. Feinstein AR, Cicchetti DV. High agreement but low kappa: I. The problems of two paradoxes. J Clin Epidemiol 1990;43:543–549.
23. Cibulka MT, Strube MJ. The conundrum of Kappa and why some musculoskeletal tests appear unreliable despite high agreement: a comparison of Cohen kappa and Gwet AC to assess observer agreement when using nominal and ordinal data. Phys Ther 2021;101:pzab150.
24. Gwet KL. Computing inter-rater reliability and its variance in the presence of high agreement. Br J Math Stat Psychol 2008;61:29–48.
25. Wongpakaran N, Wongpakaran T, Wedding D, Gwet KL. A comparison of Cohen’s Kappa and Gwet’s AC1 when calculating inter-rater reliability coefficients: a study conducted with personality disorder samples. BMC Med Res Methodol 2013;13:61.

