Unsupervised Attention Embedding for Document Clustering

Ji-kang NIE, Zhi-guo ZHANG

Abstract


Deep clustering algorithms use neural networks to learn feature representations and perform clustering jointly, with significantly better performance than traditional k-means or spectral clustering. Some pioneering approaches feed “bag-of-words” representations of documents directly into deep autoencoder networks, without considering the semantic information of each document. However, when these algorithms handle complex, high-dimensional data, the feature space produced by the encoder can be inaccurate. To address this problem, this paper proposes an Attention-based Deep Embedded Clustering (ADEC) algorithm that improves the representation of the data space. ADEC extracts high-quality embedded features suited to document clustering and performs clustering jointly with learning these features. Experimental results on two datasets, REUTERS-10K and REUTERS, show that the ADEC framework significantly improves the performance and accuracy of document clustering.
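The abstract does not give ADEC's equations, so the following is only a rough, hedged illustration of the two ideas it names. It makes two assumptions. First, a document embedding is formed by dot-product attention pooling over word vectors, with a hypothetical `query` vector standing in for a learned attention parameter. Second, clustering uses the Student's t soft assignment, the sharpened target distribution, and the KL-divergence loss from the original Deep Embedded Clustering (DEC) method, which ADEC's name suggests it builds on. All function names are hypothetical. The autoencoder, the training loop, and all learned parameters are omitted.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def attention_pool(word_vecs: Matrix, query: Vector) -> Vector:
    """Pool word vectors into one document vector with dot-product attention.

    Hypothetical: `query` stands in for a learned attention parameter.
    """
    scores = [sum(a * b for a, b in zip(query, vec)) for vec in word_vecs]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(word_vecs[0])
    # Weighted sum of word vectors, one output coordinate at a time.
    return [sum(w * vec[d] for w, vec in zip(weights, word_vecs)) for d in range(dim)]


def soft_assign(z: Matrix, centroids: Matrix, alpha: float = 1.0) -> Matrix:
    """DEC-style soft assignment q_ij from a Student's t kernel."""
    q = []
    for zi in z:
        row = []
        for mu in centroids:
            d2 = sum((a - b) ** 2 for a, b in zip(zi, mu))
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        s = sum(row)
        q.append([v / s for v in row])  # normalize each row to sum to 1
    return q


def target_distribution(q: Matrix) -> Matrix:
    """Sharpened target: p_ij proportional to q_ij^2 / f_j, f_j = sum_i q_ij."""
    k = len(q[0])
    f = [sum(row[j] for row in q) for j in range(k)]  # soft cluster sizes
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(k)]
        s = sum(w)
        p.append([v / s for v in w])
    return p


def kl_divergence(p: Matrix, q: Matrix) -> float:
    """Clustering loss KL(P || Q), summed over documents and clusters."""
    return sum(
        pij * math.log(pij / qij)
        for prow, qrow in zip(p, q)
        for pij, qij in zip(prow, qrow)
        if pij > 0
    )


# Toy usage: two documents, each a list of 2-D word vectors (made-up values).
docs = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
]
query = [1.0, 1.0]
z = [attention_pool(doc, query) for doc in docs]  # document embeddings
centroids = [[1.0, 0.0], [0.0, 1.0]]
q = soft_assign(z, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
labels = [max(range(len(row)), key=row.__getitem__) for row in q]
```

In DEC, the target P is held fixed for a number of iterations while the KL loss is backpropagated into the encoder and the cluster centroids, so feature learning and clustering improve together. That gradient step is left out here to keep the sketch dependency-free.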

Keywords


Document clustering, Attention embedding, Deep clustering


DOI
10.12783/dtcse/cscbd2019/30032
