Abstract
Unsupervised Chinese word segmentation (UCWS) has made progress by incorporating linguistic knowledge from pre-trained language models using parameter-free probing techniques. However, such approaches suffer from increased training time because they require multiple inference passes through a pre-trained language model to perform word segmentation. This work introduces a novel way to enhance UCWS performance while maintaining training efficiency. Our proposed method integrates the segmentation signal from an unsupervised segmental language model into a pre-trained BERT classifier under a pseudo-labeling framework. Experimental results demonstrate that our approach achieves state-of-the-art performance on seven out of eight UCWS tasks while considerably reducing training time compared to previous approaches.
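
The pseudo-labeling idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the label scheme, so the binary "character ends a word" labels below are an assumption, and the function names and the example segmentation are hypothetical stand-ins for the output of a segmental language model. The sketch only shows how a segmentation could be turned into per-character targets for a classifier and decoded back; it is not the authors' implementation.

```python
from typing import List


def words_to_boundary_labels(words: List[str]) -> List[int]:
    """Label each character 1 if it ends a word, else 0 (assumed scheme)."""
    labels: List[int] = []
    for word in words:
        if not word:
            continue  # skip empty tokens
        labels.extend([0] * (len(word) - 1) + [1])
    return labels


def boundary_labels_to_words(text: str, labels: List[int]) -> List[str]:
    """Rebuild a segmentation from per-character boundary labels."""
    if len(text) != len(labels):
        raise ValueError("text and labels must have the same length")
    words: List[str] = []
    start = 0
    for i, label in enumerate(labels):
        if label == 1:
            words.append(text[start:i + 1])
            start = i + 1
    if start < len(text):
        # Trailing characters with no closing boundary form a final word.
        words.append(text[start:])
    return words


# Hypothetical segmenter output used as the pseudo-label source.
pseudo_segmentation = ["我", "喜欢", "自然", "语言", "处理"]
pseudo_labels = words_to_boundary_labels(pseudo_segmentation)
recovered = boundary_labels_to_words("".join(pseudo_segmentation), pseudo_labels)
```

Under this assumed scheme, a token classifier would be trained on `pseudo_labels` as targets and its predictions decoded with `boundary_labels_to_words`, so segmentation at inference time needs a single forward pass.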
| Original language | English |
|---|---|
| Title of host publication | EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings |
| Editors | Houda Bouamor, Juan Pino, Kalika Bali |
| Publisher | Association for Computational Linguistics (ACL) |
| Pages | 9109-9118 |
| Number of pages | 10 |
| ISBN (Electronic) | 9798891760608 |
| State | Published - 2023 |
| Externally published | Yes |
| Event | 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 - Hybrid, Singapore, Singapore. Duration: 06/12/2023 → 10/12/2023 |
Publication series
| Name | EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings |
|---|---|
Conference
| Conference | 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023 |
|---|---|
| Country/Territory | Singapore |
| City | Hybrid, Singapore |
| Period | 06/12/23 → 10/12/23 |
Bibliographical note
Publisher Copyright: © 2023 Association for Computational Linguistics.