ChronoQA

ChronoQA is a passage-grounded benchmark that tests whether retrieval-augmented generation (RAG) systems can keep temporal and causal facts straight when reading long-form narratives (novels, scripts, etc.).
Instead of giving the entire book to the model, ChronoQA forces a RAG pipeline to retrieve the right snippets and reason about evolving characters and event sequences.

| Property | Value |
|---|---|
| Instances | 1,028 question–answer pairs |
| Narratives | 18 public-domain stories |
| Reasoning facets | 8 (causal, character, setting, …) |
| Evidence | Exact byte-offsets for each answer |
| Language | English |
| Intended use | Evaluate/train RAG systems that need chronology & causality |
| License (annotations) | CC-BY-NC-SA-4.0 |

Dataset Description

Motivation

Standard RAG pipelines often lose chronological order and collapse every mention of an entity into a single node. ChronoQA highlights the failures that follow. Example:

"Who was jinxing Harry's broom during his first Quidditch match?" – a system that only retrieves early chapters may wrongly answer Snape instead of Quirrell.

Source Stories

All texts come from Project Gutenberg (public domain in the US).

| ID | Title | # Questions |
|---|---|---|
| 1 | A Study in Scarlet | 67 |
| 2 | The Hound of the Baskervilles | 55 |
| 3 | Harry Potter and the Chamber of Secrets | 30 |
| 4 | Harry Potter and the Sorcerer's Stone | 25 |
| 5 | Les Misérables | 72 |
| 6 | The Phantom of the Opera | 70 |
| 7 | The Sign of the Four | 62 |
| 8 | The Wonderful Wizard of Oz | 82 |
| 9 | The Adventures of Sherlock Holmes | 34 |
| 10 | Lady Susan | 88 |
| 11 | Dangerous Connections | 111 |
| 12 | The Picture of Dorian Gray | 27 |
| 13 | The Diary of a Nobody | 39 |
| 14 | The Sorrows of Young Werther | 58 |
| 15 | The Mysterious Affair at Styles | 69 |
| 16 | Pride and Prejudice | 54 |
| 17 | The Secret Garden | 61 |
| 18 | Anne of Green Gables | 24 |

Reasoning Facets

  1. Causal Consistency
  2. Character & Behavioural Consistency
  3. Setting, Environment & Atmosphere
  4. Symbolism, Imagery & Motifs
  5. Thematic, Philosophical & Moral
  6. Narrative & Plot Structure
  7. Social, Cultural & Political
  8. Emotional & Psychological

Dataset Structure

| Field | Type | Description |
|---|---|---|
| story_id | string | ID of the narrative |
| question_id | int32 | QA index within that story |
| category | string | One of the 8 reasoning facets |
| query | string | Natural-language question |
| ground_truth | string | Gold answer |
| passages | sequence of objects | Each object contains start_sentence (string), end_sentence (string), start_byte (int32), end_byte (int32), and excerpt (string) |
| story_title* | string | Human-readable title (optional, present in processed splits) |

*The raw JSONL released with the paper does not include story_title; it is added automatically in the hosted HF dataset for convenience.
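
For orientation, a single row has roughly the shape sketched below; every value is an illustrative placeholder rather than real dataset content:

example_row = {
    "story_id": "1",                    # placeholder id
    "question_id": 0,
    "category": "Causal Consistency",   # one of the 8 facets
    "query": "Why does ...?",           # natural-language question
    "ground_truth": "...",              # gold answer
    "passages": [
        {
            "start_sentence": "...",
            "end_sentence": "...",
            "start_byte": 10234,        # placeholder offsets into the source text
            "end_byte": 10980,
            "excerpt": "...",
        }
    ],
    "story_title": "A Study in Scarlet",  # optional, processed splits only
}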

The dataset ships as a single all split (1,028 rows). Create your own train/validation/test splits if needed, e.g. by story or by reasoning facet (a sketch follows the usage example below).


Usage Example

from datasets import load_dataset

# Load the single "all" split from the Hugging Face Hub
ds = load_dataset("zy113/ChronoQA", split="all")
example = ds[0]

print("Question:", example["query"])
print("Answer  :", example["ground_truth"])
# Each question carries one or more evidence passages; show the start of the first excerpt
print("Evidence:", example["passages"][0]["excerpt"][:300], "…")

Citation Information

@article{zhang2025respecting,
  title={Respecting Temporal-Causal Consistency: Entity-Event Knowledge Graphs for Retrieval-Augmented Generation},
  author={Zhang, Ze Yu and Li, Zitao and Li, Yaliang and Ding, Bolin and Low, Bryan Kian Hsiang},
  journal={arXiv preprint arXiv:2506.05939},
  year={2025}
}