A Gaussian mixture variational autoencoder based clustering network
Vol. 27, Issue 7, Pages 2148-2156 (2022)
Published: 16 July 2022
Accepted: 09 November 2020
DOI: 10.11834/jig.200467
Huahua Chen, Zhe Chen, Chunsheng Guo, Na Ying, Xueyi Ye. A Gaussian mixture variational autoencoder based clustering network[J]. Journal of Image and Graphics, 27(7): 2148-2156 (2022)
Objective
Classical clustering algorithms suffer from the curse of dimensionality when processing high-dimensional data, which greatly increases computational cost and degrades performance. Clustering networks built on autoencoders or variational autoencoders improve clustering results, but the features extracted by an autoencoder are often poor, and the variational autoencoder suffers from problems such as posterior collapse, both of which affect the clustering results. We therefore propose a clustering network based on a Gaussian mixture variational autoencoder.
Method
A variational autoencoder is built with a Gaussian mixture distribution as the prior of the latent variable, and the autoencoder network is trained with an objective function composed of the reconstruction error and the Kullback-Leibler (KL) divergence between the prior and posterior distributions of the latent variable. The trained encoder extracts features from the input data and is combined with a clustering layer to form a clustering network, which is trained with an objective function built from the KL divergence between the soft assignment distribution of the encoder's latent features and the auxiliary target distribution of the soft assignment probabilities. The variational autoencoder is implemented with convolutional neural networks.
Result
To verify the effectiveness of the proposed algorithm, the network is evaluated on the benchmark datasets MNIST (Modified National Institute of Standards and Technology Database) and Fashion-MNIST. Clustering accuracy (ACC) and normalized mutual information (NMI) reach 95.86% and 91% on MNIST and 61.34% and 62.5% on Fashion-MNIST, improvements of varying degrees over existing methods.
Conclusion
The experimental results show that the proposed network achieves good clustering results and outperforms several currently popular clustering methods.
Objective
Effective automatic grouping of data into clusters, especially for high-dimensional datasets, is a key issue in machine learning and data analysis. It underlies many signal processing applications, including computer vision, pattern recognition, speech and audio recognition, wireless communication, and text classification. Classical clustering algorithms suffer from high computational cost and poor performance on high-dimensional data because of the curse of dimensionality. Clustering methods based on deep neural networks are promising for real-data clustering owing to their high representational ability, and clustering networks built on an autoencoder (AE) or a variational autoencoder (VAE) improve clustering effectiveness. However, their clustering performance is easily degraded: the features extracted by an AE are often too poor to distinguish the data, and the VAE suffers from posterior collapse when determining the posterior parameters of its latent variable. Both are insufficient for separating multiple classes, especially when two classes in a multiclass dataset share very similar means and variances. To address these deficiencies of the AE and VAE, we present a clustering network based on a VAE with a Gaussian mixture (GM) prior.
Method
The VAE, a maximum-likelihood generative model, maximizes the evidence lower bound (ELBO) by minimizing the model reconstruction error together with the Kullback-Leibler (KL) divergence between the posterior distribution of the latent variable and a hypothesized prior, thereby maximizing a lower bound on the marginal log-likelihood (LL) of the observed data. Because the standard VAE uses a single Gaussian as the approximate posterior, the KL term in the ELBO struggles to match the ground-truth posterior, whereas the true latent-variable space may be arbitrarily complicated or even multimodal.
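For reference, the ELBO referred to above has the standard form from the VAE literature (a textbook identity, with notation chosen here for illustration): for an encoder q_phi(z|x), a decoder p_theta(x|z), and a prior p(z),

```latex
\log p_\theta(x) \;\ge\; \mathcal{L}_{\mathrm{ELBO}}(\theta,\phi;x)
  = \mathbb{E}_{q_\phi(z\mid x)}\!\left[\log p_\theta(x\mid z)\right]
  \;-\; D_{\mathrm{KL}}\!\left(q_\phi(z\mid x)\,\middle\|\,p(z)\right),
```

so minimizing the reconstruction error and the KL term jointly maximizes this lower bound on the marginal log-likelihood.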
To further improve the description of the latent variables, we build a VAE whose latent-variable prior is a GM distribution. The data representation linked to this GM prior is approximated by a posterior distribution of the latent variable that is itself a GM model, and the GM-based VAE is trained with a cost function composed of the reconstruction error and the KL divergence between the posterior and prior distributions. Because the KL divergence between two GM distributions has no closed-form solution, we approximate the cost function with a variational lower bound, using the fact that the KL divergence between two single Gaussians does have a closed-form solution, and thereby optimize the VAE with its GM priors.
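Two standard facts underlie this approximation; both come from the cited literature, and the notation below is ours. The KL divergence between two single d-dimensional Gaussians is closed form, and for two GM distributions the variational approximation of Hershey and Olsen (2007), cited below, can be used:

```latex
D_{\mathrm{KL}}\!\left(\mathcal{N}(\mu_a,\Sigma_a)\,\middle\|\,\mathcal{N}(\mu_b,\Sigma_b)\right)
  = \tfrac{1}{2}\Big[\log\tfrac{|\Sigma_b|}{|\Sigma_a|}
  + \operatorname{tr}\!\big(\Sigma_b^{-1}\Sigma_a\big)
  + (\mu_b-\mu_a)^{\!\top}\Sigma_b^{-1}(\mu_b-\mu_a) - d\Big],

D_{\mathrm{KL}}(f\,\|\,g) \;\approx\; \sum_a \pi_a
  \log\frac{\sum_{a'} \pi_{a'}\, e^{-D_{\mathrm{KL}}(f_a\|f_{a'})}}
           {\sum_b \omega_b\, e^{-D_{\mathrm{KL}}(f_a\|g_b)}},
\qquad f=\textstyle\sum_a \pi_a f_a,\;\; g=\textstyle\sum_b \omega_b g_b.
```

The exact bound used in the paper may differ in detail; this is the commonly used variational form.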
A clustering network is then constructed by appending a clustering layer behind the VAE. To improve clustering performance, Student's t-distribution is used as the kernel to compute the soft assignment between each embedded point (a latent feature of the VAE) and the cluster centroids, and a KL divergence cost is constructed between the soft assignment distribution and an auxiliary target distribution derived from it.
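A minimal sketch of the soft assignment and its auxiliary target distribution, following the formulation popularized by Xie et al. (2016), which is cited below; the function names are ours, and the degrees-of-freedom parameter of Student's t-distribution is assumed to be 1:

```python
import numpy as np

def soft_assignment(z, centroids, alpha=1.0):
    """Student's t-kernel soft assignment q_ij between embedded points
    z of shape (n, d) and cluster centroids of shape (k, d)."""
    # squared Euclidean distance of every point to every centroid, shape (n, k)
    dist2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + dist2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Auxiliary target p_ij: sharpens q while normalizing by the soft
    cluster frequencies so large clusters do not dominate the loss."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)
```

The clustering loss is then the KL divergence sum_i sum_j p_ij log(p_ij / q_ij), minimized with respect to both the encoder parameters and the centroids.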
The commonly used VAE computes the latent variable with fully-connected neural networks, which are prone to overfitting. The clustering network is therefore implemented with convolutional neural networks (CNNs) consisting of three convolutional layers and two fully-connected layers, and no pooling layers are used, because pooling would discard useful information in the data. The network is trained by optimizing the KL divergence cost with the stochastic gradient descent (SGD) method, with the network parameters initialized from the trained VAE. In short, the clustering network is obtained by the two-step training described above: the VAE is trained first, and its parameters serve as the initial values for training the subsequent clustering layer.
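A hedged PyTorch sketch of an encoder matching this description; the abstract fixes only the structure (three convolutional layers, two fully-connected layers, no pooling), so the channel counts, kernel sizes, latent dimension, and the 28×28 single-channel input assumed below are illustrative, not the authors' exact configuration:

```python
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """Three strided conv layers (downsampling without pooling) followed
    by two fully-connected stages; all layer sizes are assumptions."""
    def __init__(self, latent_dim=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),   # 28x28 -> 14x14
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # 14x14 -> 7x7
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), # 7x7 -> 4x4
            nn.ReLU(),
        )
        self.fc = nn.Linear(128 * 4 * 4, 256)  # first fully-connected layer
        # the two parallel heads below form the second fully-connected stage,
        # emitting the Gaussian posterior parameters of the latent variable
        self.fc_mu = nn.Linear(256, latent_dim)
        self.fc_logvar = nn.Linear(256, latent_dim)

    def forward(self, x):
        h = torch.relu(self.fc(self.conv(x).flatten(1)))
        return self.fc_mu(h), self.fc_logvar(h)
```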
Result
To test the effectiveness of the proposed algorithm, the network is evaluated on two multiclass benchmark datasets: MNIST (Modified National Institute of Standards and Technology Database), which contains images of 10 categories of handwritten digits, and Fashion-MNIST, which consists of grayscale images labeled with 10 categories. Our algorithm achieves 95.86% accuracy (ACC) and 91% normalized mutual information (NMI) on MNIST, and 61.34% ACC and 62.5% NMI on Fashion-MNIST. The network matches the performance of ClusterGAN while using fewer parameters and less memory. The experimental results illustrate that our network achieves competitive clustering performance.
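For concreteness, the two reported metrics are commonly computed as follows; clustering ACC matches predicted clusters to ground-truth classes with the Hungarian algorithm (Kuhn, 2005, cited below), and NMI here is taken from scikit-learn, a library choice of ours rather than one stated in the paper:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Unsupervised ACC: best one-to-one cluster-to-class matching."""
    k = max(y_true.max(), y_pred.max()) + 1
    count = np.zeros((k, k), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1
    # the Hungarian algorithm on negated counts maximizes the matches
    row, col = linear_sum_assignment(-count)
    return count[row, col].sum() / len(y_true)

# NMI needs no label matching, since it is permutation invariant:
# nmi = normalized_mutual_info_score(y_true, y_pred)
```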
Conclusion
We construct a VAE-based clustering network with a GM prior. A novel VAE framework improves the representational ability of the latent variable through a GM latent-variable prior, and the KL divergence between the posterior and prior GM distributions is optimized so that the latent features of the VAE represent the data and reconstruct the input well. To improve clustering performance, the clustering network is trained by optimizing the KL divergence between the soft assignment distribution of the VAE's latent features and the auxiliary target distribution of the soft assignment. Future work will address the case where the numbers of Gaussian components in the prior and posterior differ, and will further strengthen the model's ability to represent complex texture features.
Keywords: clustering; Gaussian mixture distribution; variational autoencoder (VAE); soft assignment; Kullback-Leibler (KL) divergence
Chazan S E, Gannot S and Goldberger J. 2019. Deep clustering based on a mixture of autoencoders//Proceedings of the 29th IEEE International Workshop on Machine Learning for Signal Processing. Pittsburgh, USA: IEEE: 1-6 [DOI: 10.1109/MLSP.2019.8918720]
Cheng B Z, Zhao C H, Zhang L L and Zhang J P. 2017. Joint spatial preprocessing and spectral clustering based collaborative sparsity anomaly detection for hyperspectral images. Acta Optica Sinica, 37(4): 296-306 [DOI: 10.3788/AOS201737.0428001]
Dempster A P, Laird N M and Rubin D B. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B (Methodological), 39(1): 1-22 [DOI: 10.2307/2984875]
Dilokthanakul N, Mediano P A M, Garnelo M, Lee M C H, Salimbeni H, Arulkumaran K and Shanahan M. 2017. Deep unsupervised clustering with Gaussian mixture variational autoencoders [EB/OL]. [2020-05-06]. https://arxiv.org/pdf/1611.02648.pdf
Duan L, Aggarwal C, Ma S and Sathe S. 2019. Improving spectral clustering with deep embedding and cluster estimation//Proceedings of 2019 IEEE International Conference on Data Mining. Beijing, China: IEEE: 170-179 [DOI: 10.1109/ICDM.2019.00027]
Ester M, Kriegel H P, Sander J and Xu X W. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise [EB/OL]. [2020-05-06]. https://max.book118.com/html/2017/0725/124226331.shtm
Fraley C and Raftery A E. 1998. How many clusters? Which clustering method? Answers via model-based cluster analysis. The Computer Journal, 41(8): 578-588 [DOI: 10.1093/comjnl/41.8.578]
Guo C S, Zhou J L, Chen H H, Ying N, Zhang J W and Zhou D. 2020. Variational autoencoder with optimizing Gaussian mixture model priors. IEEE Access, 8: 43992-44005 [DOI: 10.1109/ACCESS.2020.2977671]
Guo X F, Gao L, Liu X W and Yin J P. 2017. Improved deep embedded clustering with local structure preservation//Proceedings of the 26th International Joint Conference on Artificial Intelligence. Melbourne, Australia: AAAI: 1753-1759 [DOI: 10.24963/ijcai.2017/243]
Hershey J R and Olsen P A. 2007. Approximating the Kullback Leibler divergence between Gaussian mixture models//Proceedings of 2007 IEEE International Conference on Acoustics, Speech and Signal Processing. Honolulu, USA: IEEE: IV-317-IV-320 [DOI: 10.1109/ICASSP.2007.366913]
Jiang Z X, Zheng Y, Tan H C, Tang B S and Zhou H N. 2017. Variational deep embedding: an unsupervised and generative approach to clustering//Proceedings of the 26th International Joint Conference on Artificial Intelligence. Melbourne, Australia: AAAI: 1965-1972 [DOI: 10.24963/ijcai.2017/273]
Kingma D P and Welling M. 2014. Auto-encoding variational Bayes [EB/OL]. [2020-05-06]. https://arxiv.org/pdf/1312.6114.pdf
Kuhn H W. 2005. The Hungarian method for the assignment problem. Naval Research Logistics, 52(1): 7-21 [DOI: 10.1002/nav.20053]
LeCun Y, Bottou L, Bengio Y and Haffner P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11): 2278-2324 [DOI: 10.1109/5.726791]
Lim K L, Jiang X D and Yi C Y. 2020. Deep clustering with variational autoencoder. IEEE Signal Processing Letters, 27: 231-235 [DOI: 10.1109/LSP.2020.2965328]
Lu R, Xiang L, Liu M R and Yang Q. 2012. Discovering news topics from microblogs based on hidden topics analysis and text clustering. Pattern Recognition and Artificial Intelligence, 25(3): 382-387 [DOI: 10.3969/j.issn.1003-6059.2012.03.004]
MacQueen J. 1967. Some methods for classification and analysis of multivariate observations [EB/OL]. [2020-05-06]. http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=4172CEB4912D3E21EF68579C8888BA56?doi=10.1.1.308.8619&rep=rep1&type=pdf
Mukherjee S, Asnani H, Lin E and Kannan S. 2019. ClusterGAN: latent space clustering in generative adversarial networks [EB/OL]. [2020-05-06]. https://arxiv.org/pdf/1809.03627v1.pdf
Opochinsky Y, Chazan S E, Gannot S and Goldberger J. 2020. K-autoencoders deep clustering//Proceedings of 2020 IEEE International Conference on Acoustics, Speech and Signal Processing. Barcelona, Spain: IEEE: 4037-4041 [DOI: 10.1109/ICASSP40776.2020.9053109]
Sabour S, Frosst N and Hinton G E. 2017. Dynamic routing between capsules [EB/OL]. [2020-05-06]. https://arxiv.org/pdf/1710.09829.pdf
van der Maaten L and Hinton G. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research, 9: 2579-2605
Wu H, Wang Y J, Wang Z, Wang X L and Du S Z. 2010. Two-phase collaborative filtering algorithm based on co-clustering. Journal of Software, 21(5): 1042-1054 [DOI: 10.3724/SP.J.1001.2010.03758]
Xiao H, Rasul K and Vollgraf R. 2017. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms [EB/OL]. [2020-05-06]. https://arxiv.org/pdf/1708.07747.pdf
Xie J Y, Girshick R and Farhadi A. 2016. Unsupervised deep embedding for clustering analysis [EB/OL]. [2020-05-06]. https://arxiv.org/pdf/1511.06335.pdf
Yang B, Fu X, Sidiropoulos N D and Hong M Y. 2017. Towards K-means-friendly spaces: simultaneous deep learning and clustering [EB/OL]. [2020-05-06]. https://arxiv.org/pdf/1610.04794.pdf
Yue F, Sun L, Wang K Q, Wang Y J and Zuo W M. 2008. State-of-the-art of cluster analysis of gene expression data. Acta Automatica Sinica, 34(2): 113-120 [DOI: 10.3724/SP.J.1004.2008.00113]
Zhang H T, Cui Y, Wang D and Song T. 2018. Study of online healthy community user profile based on concept lattice. Journal of the China Society for Scientific and Technical Information, 37(9): 912-922 [DOI: 10.3772/j.issn.1000-0135.2018.09.006]