A clustering network based on a Gaussian mixture variational autoencoder

Chen Huahua, Chen Zhe, Guo Chunsheng, Ying Na, Ye Xueyi (School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China)

Abstract
Objective Classical clustering algorithms suffer from the curse of dimensionality when processing high-dimensional data, which sharply increases computational cost and degrades performance. Clustering networks built on autoencoders or variational autoencoders improve clustering results, but autoencoders often extract poor features and variational autoencoders suffer from posterior collapse, both of which harm clustering quality. To address this, we propose a clustering network based on a Gaussian mixture variational autoencoder. Method A variational autoencoder is constructed with a Gaussian mixture distribution as the prior of the latent variable, and the autoencoder is trained with an objective composed of the reconstruction error and the Kullback-Leibler (KL) divergence between the prior and posterior distributions of the latent variable. The trained encoder then extracts features from the input data and is combined with a clustering layer to form the clustering network, which is trained with an objective built from the KL divergence between the soft assignment distribution of the encoder's latent features and an auxiliary target distribution of the soft assignment probabilities. The variational autoencoder is implemented with convolutional neural networks. Result To verify the effectiveness of the proposed algorithm, the network was evaluated on the benchmark datasets MNIST (Modified National Institute of Standards and Technology Database) and Fashion-MNIST. Clustering accuracy (ACC) and normalized mutual information (NMI) reach 95.86% and 91% on MNIST, and 61.34% and 62.5% on Fashion-MNIST, improving on existing methods to varying degrees. Conclusion The experimental results show that the proposed network achieves good clustering performance and outperforms several popular clustering methods.
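The Method above minimizes a KL divergence between a Gaussian mixture prior and a Gaussian mixture posterior, which has no closed-form expression. A commonly used variational approximation (the paper's exact bound may differ) reduces it to closed-form KL divergences between individual Gaussian components:

```latex
% For mixtures f = \sum_a \pi_a f_a and g = \sum_b \omega_b g_b
% with Gaussian components f_a, g_b, a variational approximation is
D_{\mathrm{KL}}(f \,\|\, g) \approx
\sum_{a} \pi_a \log
\frac{\sum_{a'} \pi_{a'}\, e^{-D_{\mathrm{KL}}(f_a \| f_{a'})}}
     {\sum_{b} \omega_b\, e^{-D_{\mathrm{KL}}(f_a \| g_b)}},
```

where each component-wise term $D_{\mathrm{KL}}(f_a \| g_b)$ is the standard closed-form KL divergence between two single Gaussians, so the whole expression is computable in closed form.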
A Gaussian mixture variational autoencoder based clustering network

Chen Huahua, Chen Zhe, Guo Chunsheng, Ying Na, Ye Xueyi(School of Communication Engineering, Hangzhou Dianzi University, Hangzhou 310018, China)

Abstract
Objective Automatically grouping data into clusters, especially for high-dimensional datasets, is a key problem in machine learning and data analysis. It underlies many signal processing applications, including computer vision, pattern recognition, speech and audio recognition, wireless communication, and text classification. When processing high-dimensional data, classical clustering algorithms suffer from the curse of dimensionality, which leads to high computational complexity and poor performance. Clustering methods based on deep neural networks are promising for real data because of their strong representational ability, and clustering networks built on autoencoders (AE) or variational autoencoders (VAE) improve clustering effectiveness. However, their clustering performance is easily degraded: AEs often extract poor features, and VAEs suffer from posterior collapse when determining the posterior parameters of the latent variable. They also struggle to separate multiple classes, especially when different classes share very similar means and variances. To address these deficiencies of AE and VAE, we propose a clustering network based on a VAE with a Gaussian mixture (GM) prior. Method The VAE is a maximum-likelihood generative model: it maximizes the evidence lower bound (ELBO), composed of a reconstruction error term and the Kullback-Leibler (KL) divergence between the posterior distribution of the latent variable and a hypothesized prior, thereby maximizing a lower bound on the marginal log-likelihood (LL) of the observed data. Because the standard VAE uses a single Gaussian as the approximate posterior, it can fail to match the true posterior, which may be arbitrarily complicated or even multimodal, and the KL term in the ELBO can dominate training.
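The standard VAE objective described above can be sketched in a few lines. This is a minimal NumPy illustration of the negative ELBO with a diagonal-Gaussian posterior and a standard-normal prior (mean squared error as the reconstruction term); the function names are illustrative, not from the paper:

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), per sample."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def negative_elbo(x, x_recon, mu, logvar):
    """VAE loss = reconstruction error + KL regularizer, averaged over the batch."""
    recon = np.sum((x - x_recon) ** 2, axis=-1)
    return np.mean(recon + gaussian_kl(mu, logvar))
```

When the posterior equals the prior (mu = 0, logvar = 0) the KL term vanishes, which is exactly the posterior-collapse regime the paper is concerned with: the latent variable then carries no information about the input.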
To better describe the latent variables, we build a VAE with a GM prior on the latent variable. The posterior distribution of the latent variable is also modeled as a GM, approximating the GM prior tied to the data representation, and the GM-based VAE is trained with a cost function composed of the reconstruction error and the KL divergence between the posterior and prior distributions. Because the KL divergence between two GM distributions has no closed-form solution, we approximate the KL term with a variational lower bound of the cost function, exploiting the fact that the KL divergence between two single Gaussians does have a closed form, and optimize this approximation to train the VAE with the GM prior. A clustering network is then constructed by attaching a clustering layer behind the VAE. To improve clustering performance, Student's t-distribution is used as a kernel to compute the soft assignment between each embedded point of the VAE's latent features and the cluster centroids, and a KL divergence cost is constructed between the soft assignment and its auxiliary target distribution. The commonly used VAE computes the latent variable with fully connected networks, which tends to overfit. The clustering network is therefore implemented with convolutional neural networks (CNNs) consisting of three convolutional layers and two fully connected layers; no pooling layers are used, because pooling would discard useful information. The network is trained by minimizing the KL divergence cost with stochastic gradient descent (SGD), initialized with the parameters of the trained VAE.
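The clustering layer described above can be sketched as follows. This is a minimal NumPy illustration of a DEC-style soft assignment with a Student's t kernel and its sharpened auxiliary target distribution; `alpha` is the degree-of-freedom parameter and the function names are illustrative, not from the paper:

```python
import numpy as np

def soft_assignment(z, centroids, alpha=1.0):
    """Student's t-kernel soft assignment q_ij between embedded points z_i
    and cluster centroids mu_j; rows are normalized to sum to 1."""
    d2 = np.sum((z[:, None, :] - centroids[None, :, :]) ** 2, axis=-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Auxiliary target p_ij: square q and renormalize by per-cluster
    frequency, which emphasizes high-confidence assignments."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_clustering_loss(q, p):
    """KL(P || Q), the cost minimized when training the clustering layer."""
    return np.sum(p * np.log(p / q))
```

Training alternates between recomputing the target distribution from the current soft assignments and taking SGD steps on the KL loss, which gradually sharpens the cluster assignments.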
The clustering network is obtained by the two-step training described above: the trained VAE provides the initial parameters for training the subsequent clustering layer. Result To test the effectiveness of the proposed algorithm, the network is evaluated on the multiclass benchmark datasets MNIST (Modified National Institute of Standards and Technology Database), which contains images of handwritten digits in 10 categories, and Fashion-MNIST, which consists of grayscale images labeled with 10 classes. The algorithm achieves 95.86% accuracy (ACC) and 91% normalized mutual information (NMI) on MNIST, and 61.34% ACC and 62.5% NMI on Fashion-MNIST. The network attains performance similar to ClusterGAN with fewer parameters and less memory. The experimental results show that the network achieves good clustering performance. Conclusion We construct a VAE-based clustering network with a GM prior. The proposed VAE improves the representational ability of the latent variable by placing a GM prior on it. The KL divergence between the posterior and the GM prior is optimized so that the latent features of the VAE reconstruct the input well. To improve clustering performance, the clustering network is trained by optimizing the KL divergence between the soft assignment distribution of the VAE's latent features and the auxiliary target distribution of the soft assignment. In future work, we will address the case where the number of Gaussian components in the prior and posterior differs, and further improve the model's ability to represent complex texture features.
