Approaches to building radiology datasets
https://doi.org/10.25881/18110193_2023_4_14.
Abstract
The application of machine learning in healthcare, as part of the broader family of artificial intelligence technologies, has shown enormous potential for improving diagnostic and treatment outcomes for various conditions. However, the success of AI-based software largely depends on the availability of high-quality medical datasets and on the infrastructure built to streamline their management. Creating relevant, representative, and accurately labeled datasets is a complex and expensive task that requires diverse expertise and a robust roadmap for dataset building in radiology.
This paper presents a dataset creation methodology in radiology that establishes principles and protocols to ensure a standardized approach to dataset building, secures a convenient infrastructure for data management, and provides a framework to automate the creation of high-quality datasets.
Drawing on our experience of implementing this methodology in routine diagnostic imaging, we demonstrate typical errors that arise when preparing radiology datasets and offer ways to avoid them.
About the Authors
T. M. Bobrovskaya
Russian Federation
Moscow
Yu. A. Vasilev
Russian Federation
PhD
Moscow
N. Yu. Nikitin
Russian Federation
PhD
Moscow
K. M. Arzamasov
Russian Federation
PhD
Moscow
Review
For citations:
Bobrovskaya T.M., Vasilev Yu.A., Nikitin N.Yu., Arzamasov K.M. Approaches to building radiology datasets. Medical Doctor and Information Technologies. 2023;(4):14-23. (In Russ.) https://doi.org/10.25881/18110193_2023_4_14.