CFEVER: A Chinese Fact Extraction and VERification Dataset

Ying Jia Lin, Chun Yi Lin, Chia Jen Yeh, Yi Ting Li, Yun Yu Hu, Chih Hao Hsu, Mei Feng Lee, Hung Yu Kao

Research output: Contribution to journal › Conference article › peer-review


Abstract

We present CFEVER, a Chinese dataset designed for Fact Extraction and VERification. CFEVER comprises 30,012 manually created claims based on content in Chinese Wikipedia. Each claim in CFEVER is labeled as “Supports”, “Refutes”, or “Not Enough Info” to indicate its degree of factuality. As in the FEVER dataset, claims in the “Supports” and “Refutes” categories are also annotated with the corresponding evidence sentences, drawn from one or more pages of Chinese Wikipedia. Our labeled dataset achieves a Fleiss’ kappa of 0.7934 for five-way inter-annotator agreement. In addition, through experiments with state-of-the-art approaches developed on the FEVER dataset and a simple baseline for CFEVER, we demonstrate that our dataset is a new rigorous benchmark for fact extraction and verification, which can be further used to develop automated systems that alleviate human fact-checking efforts. CFEVER is available at https://ikmlab.github.io/CFEVER.
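The abstract reports annotation quality as a Fleiss’ kappa. The following is a minimal sketch of how that statistic is computed in general; it is not the authors' evaluation code. It assumes a table in which row i counts how many of the n annotators gave claim i each label, and that every claim receives the same number of annotations (our reading of "five-way" is five annotators per claim, which is an assumption). Kappa compares the mean observed pairwise agreement P̄ with the chance agreement P_e implied by the overall label proportions: κ = (P̄ − P_e) / (1 − P_e). The function name `fleiss_kappa` and the toy table are hypothetical illustrations, not CFEVER data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa from an N x k table where counts[i][j] is the number
    of annotators who assigned item i to category j. Every row must sum
    to the same number of ratings n (n >= 2)."""
    N = len(counts)
    if N == 0:
        raise ValueError("need at least one item")
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    k = len(counts[0])

    # p_j: share of all N * n ratings that fall in category j
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]

    # P_i: fraction of annotator pairs that agree on item i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]

    P_bar = sum(P) / N            # mean observed agreement
    P_e = sum(pj * pj for pj in p)  # agreement expected by chance
    if P_e == 1:
        raise ValueError("kappa is undefined when every rating is one category")
    return (P_bar - P_e) / (1 - P_e)


# Hypothetical toy table: 4 claims x 5 annotators; columns are
# Supports, Refutes, Not Enough Info.
toy = [[5, 0, 0], [0, 5, 0], [4, 1, 0], [0, 0, 5]]
print(round(fleiss_kappa(toy), 3))  # 0.845
```

In the toy table only one claim has a dissenting annotator, so P̄ = 0.9 while the label proportions (0.45, 0.30, 0.25) give P_e = 0.355, yielding κ = 109/129 ≈ 0.845. Values near 0.8, such as the 0.7934 reported above, are conventionally read as strong agreement.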

Original language: English
Pages (from-to): 18626-18634
Number of pages: 9
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 38
Issue number: 17
State: Published - 25 Mar 2024
Externally published: Yes
Event: 38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, Canada
Duration: 20 Feb 2024 to 27 Feb 2024

Bibliographical note

Publisher Copyright:
© 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
