Latent Dirichlet learning for hierarchical segmentation

Jen Tzung Chien*, Chuang Hua Chueh

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

A topic model can be built by using Dirichlet distributions as the prior to characterize latent topics in natural language. However, topics in real-world stream data are non-stationary, which makes training a reliable topic model challenging. Further, word usage varies across paragraphs within a document because of different composition styles. This study presents a hierarchical segmentation model that compensates for heterogeneous topics at the stream level and heterogeneous words at the document level. The topic similarity between sentences is calculated to form a beta prior for stream-level segmentation. This segmentation prior is used to group topic-coherent sentences into a document. For each pseudo-document, we incorporate a Markov chain to detect stylistic segments within the document. The words in a segment are generated by an identical composition style. The new model is inferred by a variational Bayesian EM procedure. Experimental results show the benefits of the proposed model in terms of perplexity and F-measure.

Original language: English
Title of host publication: 2012 IEEE International Workshop on Machine Learning for Signal Processing - Proceedings of MLSP 2012
State: Published - 09 2012
Externally published: Yes
Event: 2012 22nd IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2012 - Santander, Spain
Duration: 23 09 2012 → 26 09 2012

Publication series

Name: IEEE International Workshop on Machine Learning for Signal Processing, MLSP
ISSN (Print): 2161-0363
ISSN (Electronic): 2161-0371

Conference

Conference: 2012 22nd IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2012
Country/Territory: Spain
City: Santander
Period: 23/09/12 → 26/09/12

Keywords

  • Graphical Model
  • Hierarchical Segmentation
  • Machine Learning
  • Topic Model
