Abstract
Latent Dirichlet allocation (LDA) has been successful for document modeling. LDA extracts the latent topics shared across documents and assumes that all words in a document are generated from the same topic distribution. In real-world documents, however, word usage varies from paragraph to paragraph and is accompanied by different writing styles. This study extends LDA to cope with the variations of topic information within a document. We build the nonstationary LDA (NLDA) by incorporating a Markov chain that detects the stylistic segments in a document. Each segment corresponds to a particular style in the composition of the document. NLDA can therefore exploit the topic information between documents as well as the word variations within a document. We accordingly establish a Viterbi-based variational Bayesian procedure for the model. A language model adaptation scheme using NLDA is developed for speech recognition. Experimental results show that NLDA improves over LDA in terms of perplexity and word error rate.
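The idea of detecting stylistic segments with a Markov chain can be illustrated with plain Viterbi decoding. The following is a minimal sketch, not the paper's Viterbi-based variational Bayesian procedure: it assumes the per-word log-likelihoods under each segment state are already given, and the function name `viterbi_segments`, the two-state left-to-right chain, and all probabilities in the toy example are hypothetical values chosen for illustration.

```python
import math

NEG_INF = float("-inf")  # log of a zero probability


def viterbi_segments(log_emit, log_trans, log_init):
    """Return the most likely segment-state sequence for a word sequence.

    log_emit[t][s]  : log-likelihood of word t under segment state s (assumed given)
    log_trans[r][s] : log transition probability from state r to state s
    log_init[s]     : log initial probability of state s
    """
    n_states = len(log_init)
    # Best log score of any path ending in each state at the first word.
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptr = []
    for t in range(1, len(log_emit)):
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at word t.
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + log_trans[r][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_emit[t][s])
            ptr.append(best_prev)
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy example (hypothetical numbers): a left-to-right chain with two segments.
log_init = [0.0, NEG_INF]                       # always start in segment 0
log_trans = [[math.log(0.8), math.log(0.2)],    # stay in 0 or move on to 1
             [NEG_INF, 0.0]]                    # segment 1 cannot go back
log_emit = ([[math.log(0.9), math.log(0.1)]] * 3 +   # words that fit segment 0
            [[math.log(0.1), math.log(0.9)]] * 2)    # words that fit segment 1

path = viterbi_segments(log_emit, log_trans, log_init)
print(path)
```

In this toy setting the decoded path marks where the style changes within the document. In a model like NLDA the emission scores would come from segment-dependent topic-based word probabilities rather than fixed numbers.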
| Original language | English |
|---|---|
| Pages (from-to) | 372-375 |
| Number of pages | 4 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| State | Published - Sep 2009 |
| Event | 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009, Brighton, United Kingdom. Duration: 6 Sep 2009 → 10 Sep 2009 |
Keywords
- Latent Dirichlet allocation
- Nonstationary process
- Speech recognition
- Variational Bayesian