Isaac Scientific Publishing

Frontiers in Signal Processing

Research on Tibetan Document Segmentation Based on GMM+K-means Algorithm

PP. 100-108, Pub. Date: October 15, 2019

DOI: 10.22606/fsp.2019.34006

Author(s)

  • Chengliang Jiang
    College of Electrical & Information Engineering, Southwest Minzu University, Chengdu, China
  • Huazhang Wang*
    College of Electrical & Information Engineering, Southwest Minzu University, Chengdu, China

Abstract

The Tibetan language, as the language of the Tubo period, has recorded the life, history and other important events of the Tibetan people, and is a treasure of Tibetan culture. Many Tibetan documents are being lost because their paper has yellowed, blackened and rotted with age. To better protect these documents and reveal the contents recorded in them, a new method for the segmentation of Tibetan documents is proposed. The method uses an improved NLM (non-local means) algorithm for denoising as pre-processing, an automatic region-blocking GMM (Gaussian Mixture Model)+K-means multi-feature fusion algorithm for segmentation, and multi-region classification extraction as post-processing. Experimental results show that, compared with K-means, GMM and other algorithms, this method segments the text in Tibetan documents more effectively, which supports its effectiveness and accuracy.
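The general idea of pairing K-means with a GMM can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract gives no details of the region blocking, the multi-feature fusion or the improved NLM step, so none of those are reproduced here. The sketch assumes a single feature (grayscale intensity), two classes (dark ink and lighter paper) and synthetic pixel values. All function names (`kmeans_1d`, `fit_gmm_1d`, `segment`) are hypothetical. K-means supplies initial means, and EM then refines a Gaussian mixture from that starting point.

```python
import math
import random

def kmeans_1d(values, k=2, iters=20):
    """Plain 1-D K-means; returns the cluster centers in ascending order."""
    lo, hi = min(values), max(values)
    # Spread the initial centers evenly across the intensity range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return sorted(centers)

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(values, init_means, iters=30, min_var=1e-3):
    """EM for a 1-D Gaussian mixture, started from the given means."""
    k = len(init_means)
    means = list(init_means)
    overall_mean = sum(values) / len(values)
    overall_var = sum((v - overall_mean) ** 2 for v in values) / len(values)
    variances = [max(overall_var, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each pixel.
        resp = []
        for v in values:
            p = [weights[j] * gaussian_pdf(v, means[j], variances[j])
                 for j in range(k)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-300
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var = sum(r[j] * (v - means[j]) ** 2
                      for r, v in zip(resp, values)) / nj
            variances[j] = max(var, min_var)
    return weights, means, variances

def segment(values, weights, means, variances):
    """Label a pixel 1 (text) if the darkest component is most likely, else 0."""
    k = len(means)
    dark = min(range(k), key=lambda j: means[j])
    labels = []
    for v in values:
        p = [weights[j] * gaussian_pdf(v, means[j], variances[j])
             for j in range(k)]
        best = max(range(k), key=lambda j: p[j])
        labels.append(1 if best == dark else 0)
    return labels

# Synthetic stand-in for a denoised grayscale page (assumed values):
# 200 dark "ink" pixels followed by 800 lighter "paper" pixels.
random.seed(0)
pixels = ([random.gauss(60, 10) for _ in range(200)]
          + [random.gauss(180, 15) for _ in range(800)])

centers = kmeans_1d(pixels)                       # K-means initialisation
weights, means, variances = fit_gmm_1d(pixels, centers)
labels = segment(pixels, weights, means, variances)
```

Seeding EM with K-means centers is a common way to make the mixture fit less sensitive to its starting point. A real page would need the denoising, region blocking and extra features that the abstract mentions, none of which this sketch attempts.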

Keywords

Tibetan documents, NLM, GMM, K-means
