DHCC: An Efficient Algorithm for Supervised Discretization

Qing DENG, Qing XUE, Xiang-zhong XU, Heng GAO

Abstract


To improve the speed and effectiveness of data mining in equipment simulation training systems, a discretization algorithm based on hierarchical clustering and compatibility (DHCC) is proposed. Unlike traditional discretization algorithms, DHCC takes the association between attributes into account: it calculates the positive domain of the clusters to adjust the number of clusters and produce the initial division of each attribute. Then, starting from the initial discretization generated by hierarchical clustering, it calculates the information entropy and a simplified compatibility degree to merge adjacent intervals, which reduces the number of breakpoints and eliminates superfluous intervals. The result is a valid and concise discretization scheme. Tests on six typical datasets show that DHCC outperforms the Equal-W, Equal-F, ChiMerge, MDLP, and CAIM algorithms in both the total number of intervals and accuracy.
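The interval-merging step can be illustrated with a minimal Python sketch. It is not the DHCC procedure itself: the hierarchical clustering stage and the compatibility degree are not reproduced, and the function names, the greedy strategy, and the `tol` threshold are assumptions made for illustration only. The sketch drops a breakpoint whenever removing it does not raise the weighted class information entropy of the intervals.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def weighted_entropy(values, labels, cuts):
    """Size-weighted class entropy of the intervals defined by the cut points."""
    groups = [[] for _ in range(len(cuts) + 1)]
    for v, y in zip(values, labels):
        idx = sum(v > c for c in cuts)  # index of the interval containing v
        groups[idx].append(y)
    n = len(labels)
    return sum(len(g) / n * entropy(g) for g in groups)


def merge_intervals(values, labels, cuts, tol=1e-9):
    """Greedily remove breakpoints while the entropy increase stays within tol.

    Hypothetical helper: `tol` is an illustrative threshold, not a parameter
    taken from the paper.
    """
    cuts = sorted(cuts)
    while cuts:
        base = weighted_entropy(values, labels, cuts)
        # Find the breakpoint whose removal yields the lowest weighted entropy.
        best = min(
            range(len(cuts)),
            key=lambda i: weighted_entropy(values, labels, cuts[:i] + cuts[i + 1:]),
        )
        candidate = cuts[:best] + cuts[best + 1:]
        if weighted_entropy(values, labels, candidate) - base > tol:
            break  # every remaining merge would mix classes
        cuts = candidate
    return cuts
```

For example, with values 1 to 6, labels `aaabbb`, and initial breakpoints 1.5, 3.5, and 5.5, only the breakpoint at 3.5 separates the two classes, so the other two are merged away.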

Keywords


Discretization, Hierarchy clustering, Information entropy, Compatibility


DOI
10.12783/dtcse/msam2020/34259
