Medical Doctor and Information Technologies

Automated abbreviations recognition system for unified national medical nomenclature filling with using russian language unstructured text of articles

https://doi.org/10.25881/18110193_2023_4_24

Abstract

The Unified National Medical Nomenclature (UNMN) has been under development since 2022, drawing on the Unified Medical Language System (UMLS) Metathesaurus and other sources. UNMN is a terminological system based on an ontological approach and is potentially applicable to annotating Russian-language medical texts. Terms from different clinical fields are currently being added to UNMN both automatically and by experts. Abbreviations are often used in medicine to express concepts concisely. However, recognizing them in unstructured text is not a trivial task. Software for automated recognition of abbreviations in research articles could enrich UNMN and accelerate the development of clinical decision support systems.

The aim of this study was to create an automated algorithm for recognizing abbreviations of UNMN terms in the text of Russian-language research articles.

Methods. The validation and testing dataset consisted of unstructured abstracts of Russian-language research articles aggregated from eLIBRARY. The full-text wordings of the extracted abbreviations were corrected using bilingual (Russian-English and English-Russian) translation.
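The abstract does not describe the extraction rules themselves, so the following is only a minimal sketch of one common heuristic, not the authors' algorithm: it looks for an uppercase abbreviation in parentheses and checks whether the initials of the preceding words spell it out. The names `ABBR_IN_PARENS` and `extract_pairs`, the 2-8 letter limit, and the initial-letter rule are all assumptions made for illustration.

```python
import re

# Assumption: an abbreviation is 2-8 uppercase Cyrillic or Latin letters in parentheses.
ABBR_IN_PARENS = re.compile(r"\(([A-ZА-ЯЁ]{2,8})\)")


def extract_pairs(text):
    """Return (abbreviation, full wording) pairs found by an initial-letter heuristic."""
    pairs = []
    for match in ABBR_IN_PARENS.finditer(text):
        abbr = match.group(1)
        # Words (letters only) that precede the opening parenthesis.
        words = re.findall(r"[^\W\d_]+", text[:match.start()])
        # Take as many preceding words as the abbreviation has letters.
        candidate = words[-len(abbr):]
        if len(candidate) != len(abbr):
            continue
        # Accept only if each word starts with the corresponding abbreviation letter.
        if all(word[0].upper() == letter for word, letter in zip(candidate, abbr)):
            pairs.append((abbr, " ".join(candidate)))
    return pairs


sample = "Пациенты с ишемической болезнью сердца (ИБС) получали терапию."
print(extract_pairs(sample))
```

A heuristic like this returns the wording in whatever grammatical case it appears in the text (here the instrumental case), which is one reason a later normalization step, such as the bilingual translation mentioned above, may be needed.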

Results. The final version of the algorithm, based on semantic rules, demonstrated ~93% sensitivity and ~99% specificity in extracting abbreviations and their full-text wordings. A large share (~87%) of terms was successfully corrected and presented in the initial form after bilingual translation. About half (~49%) of the abbreviations were mapped to UNMN terms with 100% accuracy. Processing 168,000 abstracts with the developed algorithm led to the creation of the Unified medical abbreviations thesaurus with UNMN terms (exceeding 6,600 unique entries).
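For readers unfamiliar with the two reported metrics, the sketch below shows how sensitivity and specificity are computed from a confusion matrix. The counts are invented purely for illustration and are not the study's data. The function names are likewise hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Share of real abbreviation pairs that the extractor found: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of non-abbreviation candidates correctly rejected: TN / (TN + FP)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, chosen only so the arithmetic is easy to follow.
tp, fn = 93, 7   # real pairs found / missed
tn, fp = 99, 1   # non-pairs rejected / wrongly extracted

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```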

About the Authors

P. A. Astanin
Pirogov Russian National Research Medical University
Russian Federation

Moscow



L. V. Ronzhin
Pirogov Russian National Research Medical University
Russian Federation

Moscow



A. A. Fedorov
Pirogov Russian National Research Medical University
Russian Federation

Moscow



S. E. Rauzina
Pirogov Russian National Research Medical University
Russian Federation

PhD

Moscow



T. V. Zarubina
Pirogov Russian National Research Medical University
Russian Federation

 Corresponding Member of the RAS, DSc, Prof.

Moscow



For citations:


Astanin P.A., Ronzhin L.V., Fedorov A.A., Rauzina S.E., Zarubina T.V. Automated abbreviations recognition system for unified national medical nomenclature filling with using russian language unstructured text of articles. Medical Doctor and Information Technologies. 2023;(4):24-35. (In Russ.) https://doi.org/10.25881/18110193_2023_4_24


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1811-0193 (Print)
ISSN 2413-5208 (Online)