Development and validation of a tool for statistical comparison of ROC-curves using the example of algorithms based on artificial intelligence technologies
https://doi.org/10.25881/18110193_2023_3_4
Abstract
Background: Under the National Strategy for the Development of Artificial Intelligence, large-scale digitalization of healthcare is under way in the Russian Federation. This gives rise to a large number of practical and scientific tasks, which in turn require convenient tools to solve them. One such tool is the ROC analysis tool, which was developed and successfully applied within the framework of the project «Experiment on the use of innovative technologies in the field of computer vision for the analysis of medical images and further application in the healthcare system of the city of Moscow». However, there is an urgent need for a module that compares ROC curves, so that a wider range of problems in analyzing the performance of technologies based on artificial intelligence can be addressed.
Aim: to implement a module for the ROC analysis tool that compares areas under the curve using statistical methods and calculates the p-value, and to test it on real data.
Materials and methods: the tool is implemented in Python 3.9. The 95% confidence intervals for the ROC curves were calculated using bootstrapping and the DeLong method. The areas under the ROC curves were compared using a permutation test.
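The bootstrap step mentioned above can be illustrated with a minimal sketch. It is not the tool's actual code: the function names `auc_score` and `bootstrap_auc_ci`, the percentile interval, the number of resamples and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def auc_score(y_true, scores):
    """Area under the ROC curve as the share of correctly ordered
    (positive, negative) pairs; tied pairs count as one half."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size))


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC (illustrative)."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    n = y_true.size
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        if y_true[idx].all() or not y_true[idx].any():
            continue  # a resample needs both classes to define an AUC
        aucs.append(auc_score(y_true[idx], scores[idx]))
    lower, upper = np.quantile(aucs, [alpha / 2, 1 - alpha / 2])
    return auc_score(y_true, scores), float(lower), float(upper)


# Synthetic example data (not from the study).
demo_rng = np.random.default_rng(1)
labels = demo_rng.integers(0, 2, 200)
outputs = labels + demo_rng.normal(0, 1, 200)
print(bootstrap_auc_ci(labels, outputs, n_boot=500))
```

The percentile interval is only one of several bootstrap variants; the abstract does not state which one the tool uses.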
The tool was tested on the outputs of 6 algorithms on 2 data sets. The areas under the ROC curves were compared pairwise, and the results were compared with an analysis of the same data by the DeLong method (roc.test function, R language 3.6.1).
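A paired permutation test for the difference between two areas under the curve can be sketched as follows. This is one common variant, not necessarily the one implemented in the tool: it assumes both algorithms score the same cases on a comparable scale (for example, probabilities), and it swaps the two algorithms' scores case by case. The function names, the number of permutations and the synthetic data are hypothetical.

```python
import numpy as np


def auc_score(y_true, scores):
    """AUC as the share of correctly ordered (positive, negative) pairs."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size))


def paired_permutation_test(y_true, scores_a, scores_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for |AUC_a - AUC_b| on paired data.

    Assumption: scores_a and scores_b refer to the same cases, in the same
    order, and are on a comparable scale.
    """
    rng = np.random.default_rng(seed)  # fixed seed makes the p-value reproducible
    y = np.asarray(y_true, dtype=bool)
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    observed = abs(auc_score(y, a) - auc_score(y, b))
    hits = 0
    for _ in range(n_perm):
        swap = rng.random(y.size) < 0.5  # per case: exchange the two scores or not
        perm_a = np.where(swap, b, a)
        perm_b = np.where(swap, a, b)
        if abs(auc_score(y, perm_a) - auc_score(y, perm_b)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic example data (not from the study).
demo_rng = np.random.default_rng(2)
labels = demo_rng.integers(0, 2, 200)
algo_a = labels * 1.0 + demo_rng.normal(0, 1, 200)
algo_b = labels * 0.3 + demo_rng.normal(0, 1, 200)
print(paired_permutation_test(labels, algo_a, algo_b, n_perm=500))
```

Because the permutations are drawn pseudorandomly, the p-value changes slightly with the seed and with `n_perm`; fixing the seed and increasing the number of permutations reduces this run-to-run variability.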
Results: the p-values obtained using the permutation test were in most cases comparable to the roc.test results. However, in 4 out of 30 cases the p-values differed substantially, which changed the interpretation of the test.
Discussion: in our opinion, the differences between the results of the two methods are due to the properties of the methods themselves: the DeLong method is more conservative. In addition, because the permutation test relies on pseudorandom permutations, its results can vary from run to run, which can lead to uncertainty. Finally, the developed tool compares only data of the same length, which limits its use; it could be developed further to handle data of different lengths.
Conclusion: a module for comparing ROC curves using statistical tests with calculation of the p-value was successfully implemented and tested.
About the Authors
T. M. Bobrovskaya
Russian Federation
Moscow
Y. S. Kirpichev
Russian Federation
Moscow
E. F. Savkina
Russian Federation
Moscow
S. F. Chetverikov
Russian Federation
PhD
Moscow
K. M. Arzamasov
Russian Federation
PhD
Moscow
For citations:
Bobrovskaya T.M., Kirpichev Y.S., Savkina E.F., Chetverikov S.F., Arzamasov K.M. Development and validation of a tool for statistical comparison of ROC-curves using the example of algorithms based on artificial intelligence technologies. Medical Doctor and Information Technologies. 2023;(3):4-15. (In Russ.) https://doi.org/10.25881/18110193_2023_3_4