Document Type : Narrative Review Article

Authors

1 Assistant Professor, Computer Engineering, Department of Information Technology, School of Industrial Engineering, Khajeh Nasir Toosi University of Technology, Tehran, Iran

2 MSc Student, Information Technology, School of Industrial Engineering, Khajeh Nasir Toosi University of Technology, Tehran, Iran

Abstract

Data mining, as a tool for extracting useful information from large data sets, has been an area of interest to researchers in the field of health. Classification is a learning function by which data are mapped to one of several predefined categories. According to the World Health Organization (WHO), heart disease, renal disease, diabetes, and cancer were the cause of 68% of all deaths in 2012. The aim of this research was to study various types of classification algorithms and the results of previous research on their use in the field of health. In this narrative review, studies on heart disease, breast cancer, and diabetes, published from 2003 to 2015, were investigated. The keywords “data mining”, “classification”, “health”, “heart disease”, “diabetes”, and “breast cancer” were searched in the ScienceDirect, Elsevier, Springer, and IEEE databases. In addition, the references and citations of each retrieved article were collected. After the elimination of unsuitable studies, 34 articles were selected. The literature review showed that the neural network algorithm was the most frequently used for all three diseases. Neural network and Naïve Bayes for heart disease, K-nearest neighbors for breast cancer, and neural network for diabetes had the highest accuracy. In general, it can be concluded that although no algorithm can be considered the best for each disease with certainty, determining the best-performing algorithm for each disease could be useful for future studies.

Keywords