Document Type: Original Article

Authors

1 PhD Candidate, Medical Informatics, Department of Medical Informatics, Mashhad University of Medical Sciences, Mashhad, Iran

2 Assistant Professor, Pathologist, School of Medicine, Zahedan Branch, Islamic Azad University, Iran

3 Assistant Professor, Medical Informatics, Department of Medical Informatics, Mashhad University of Medical Sciences, Mashhad, Iran

Abstract

Introduction: Pathology reports are generally written as unstructured free text and contain a complex web of relations between medical concepts. To enable computers to understand and analyze the free text of these reports, we aimed to convert these concepts and their relations into a structured format.

Methods: The training, validation, and evaluation in this implementation study were based on a corpus of 258 pathology reports with a positive diagnosis of celiac disease, randomly selected from the records of two pathology laboratories. The proposed system consisted of three phases: (1) standardization of celiac disease pathology reports using the Delphi technique with three experts; (2) information extraction from the free-text reports with text mining techniques using the Stanford Parser; and (3) automatic classification of celiac disease into stages of the Marsh system using the J48 decision tree algorithm.

Results: We extracted information from the free-text pathology reports and assigned each piece of information to its associated predefined field in the standardized template with an accuracy of 76%. After the Marsh stage was determined for each report in the third phase, our system showed an average overall accuracy of 62%. When the third phase was evaluated as an independent system using manually corrected, gold-standard input, its accuracy exceeded 84%.

Conclusion: Compared with narrative reports, standardized synoptic pathology reporting offers greater completeness and consistency, helps avoid confusion and error, and enables faster and safer transmission of critical pathological data.
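As an illustration of the second and third phases described under Methods, the following Python sketch maps short free-text findings onto a few template fields and then learns a decision tree that assigns a Marsh stage. It is not the authors' implementation: regular expressions stand in for the Stanford Parser, and a simplified, unpruned gain-ratio tree in the spirit of C4.5 stands in for Weka's J48. The field names (`iel`, `crypts`, `atrophy`), the cut-off of 25 intraepithelial lymphocytes per 100 enterocytes, and the toy reports with their stage labels are hypothetical assumptions loosely based on the modified Marsh criteria, not material from the study.

```python
import math
import re
from collections import Counter

# ASSUMPTION: illustrative cut-off for "increased" intraepithelial lymphocytes
# (IELs per 100 enterocytes); not a value taken from the study.
IEL_THRESHOLD = 25

# Hypothetical, hand-labelled toy reports (not data from the study).
TOY_REPORTS = [
    ("Normal villous architecture. 12 IELs/100 enterocytes.", "0"),
    ("38 IELs/100 enterocytes, no crypt hyperplasia.", "1"),
    ("Crypt hyperplasia; 45 intraepithelial lymphocytes per 100 enterocytes.", "2"),
    ("Partial villous atrophy, crypt hyperplasia, 50 IELs/100 enterocytes.", "3a"),
    ("Subtotal villous atrophy with crypt hyperplasia and 60 IELs/100 enterocytes.", "3b"),
    ("Total villous atrophy, crypt hyperplasia, 70 IELs/100 enterocytes.", "3c"),
]


def extract_fields(report: str) -> dict:
    """Phase 2 stand-in: fill hypothetical template fields using regular expressions
    (the study used the Stanford Parser instead)."""
    text = report.lower()
    fields = {"iel": "normal", "crypts": "normal", "atrophy": "none"}
    iel = re.search(r"(\d+)\s*(?:iels?|intraepithelial lymphocytes)\s*(?:per|/)\s*100", text)
    if iel:
        fields["iel"] = "increased" if int(iel.group(1)) > IEL_THRESHOLD else "normal"
    # Naive negation handling: only the exact phrase "no crypt hyperplasia" is recognized.
    if re.search(r"\bcrypt hyperplasia\b", text) and not re.search(r"\bno crypt hyperplasia\b", text):
        fields["crypts"] = "hyperplastic"
    atrophy = re.search(r"\b(partial|subtotal|total)\s+villous\s+atrophy\b", text)
    if atrophy:
        fields["atrophy"] = atrophy.group(1)
    return fields


def entropy(labels: list) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows: list, labels: list, attr: str) -> float:
    """Information gain of splitting on attr, normalized by split information (as in C4.5)."""
    n = len(rows)
    parts = {}
    for row, label in zip(rows, labels):
        parts.setdefault(row[attr], []).append(label)
    remainder = sum(len(p) / n * entropy(p) for p in parts.values())
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in parts.values())
    if split_info == 0:  # attribute takes a single value here: useless split
        return 0.0
    return (entropy(labels) - remainder) / split_info


def build_tree(rows: list, labels: list, attrs: list):
    """Grow an unpruned tree on categorical attributes; leaves are stage labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:
        return majority
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                     [a for a in attrs if a != best])
    return {"attr": best, "branches": branches, "default": majority}


def classify(tree, row: dict) -> str:
    """Follow branches until a leaf; unseen values fall back to the node's majority label."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree


def train(labelled_reports: list):
    """Extract template fields from each report, then grow the tree on them."""
    rows = [extract_fields(text) for text, _ in labelled_reports]
    labels = [stage for _, stage in labelled_reports]
    return build_tree(rows, labels, list(rows[0]))


if __name__ == "__main__":
    tree = train(TOY_REPORTS)
    new_report = "Subtotal villous atrophy, crypt hyperplasia and 55 IELs/100 enterocytes."
    print(classify(tree, extract_fields(new_report)))  # prints 3b
```

Because the toy training set holds one report per stage, the tree simply memorizes it. On realistic data, J48's pruning and its handling of numeric attributes (for example, using the raw IEL count instead of a thresholded flag) would matter; this sketch omits both.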

Keywords