ChronoQA

ChronoQA is a passage-grounded benchmark that tests whether retrieval-augmented generation (RAG) systems can keep temporal and causal facts straight when reading long-form narratives (novels, scripts, etc.).
Instead of giving the entire book to the model, ChronoQA forces a RAG pipeline to retrieve the right snippets and reason about evolving characters and event sequences.

| Property | Value |
|---|---|
| Instances | 1,028 question–answer pairs |
| Narratives | 18 public-domain stories |
| Reasoning facets | 8 (causal, character, setting, …) |
| Evidence | Exact byte-offsets for each answer |
| Language | English |
| Intended use | Evaluate/train RAG systems that need chronology & causality |
| License (annotations) | CC-BY-NC-SA-4.0 |

Dataset Description

Motivation

Standard RAG pipelines often lose chronological order and collapse every mention of an entity into a single node. ChronoQA highlights the failures that follow. Example:

"Who was jinxing Harry's broom during his first Quidditch match?" – a system that only retrieves early chapters may wrongly answer Snape instead of Quirrell.

Source Stories

All texts come from Project Gutenberg (public domain in the US).

| ID | Title | # Questions |
|---|---|---|
| 1 | A Study in Scarlet | 67 |
| 2 | The Hound of the Baskervilles | 55 |
| 3 | Harry Potter and the Chamber of Secrets | 30 |
| 4 | Harry Potter and the Sorcerer's Stone | 25 |
| 5 | Les Misérables | 72 |
| 6 | The Phantom of the Opera | 70 |
| 7 | The Sign of the Four | 62 |
| 8 | The Wonderful Wizard of Oz | 82 |
| 9 | The Adventures of Sherlock Holmes | 34 |
| 10 | Lady Susan | 88 |
| 11 | Dangerous Connections | 111 |
| 12 | The Picture of Dorian Gray | 27 |
| 13 | The Diary of a Nobody | 39 |
| 14 | The Sorrows of Young Werther | 58 |
| 15 | The Mysterious Affair at Styles | 69 |
| 16 | Pride and Prejudice | 54 |
| 17 | The Secret Garden | 61 |
| 18 | Anne of Green Gables | 24 |

Reasoning Facets

  1. Causal Consistency
  2. Character & Behavioural Consistency
  3. Setting, Environment & Atmosphere
  4. Symbolism, Imagery & Motifs
  5. Thematic, Philosophical & Moral
  6. Narrative & Plot Structure
  7. Social, Cultural & Political
  8. Emotional & Psychological

Dataset Structure

| Field | Type | Description |
|---|---|---|
| story_id | string | ID of the narrative |
| question_id | int32 | QA index within that story |
| category | string | One of the 8 reasoning facets |
| query | string | Natural-language question |
| ground_truth | string | Gold answer |
| passages | sequence of objects | Each object contains start_sentence (string), end_sentence (string), start_byte (int32), end_byte (int32), and excerpt (string) |
| story_title* | string | Human-readable title (optional, present in processed splits) |

*The raw JSONL released with the paper does not include story_title; it is added automatically in the hosted HF dataset for convenience.
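
For orientation, a single row has roughly the shape sketched below; every value is an illustrative placeholder rather than real dataset content:

example_row = {
    "story_id": "1",                    # placeholder id
    "question_id": 0,
    "category": "Causal Consistency",   # one of the 8 facets
    "query": "Why does ...?",           # natural-language question
    "ground_truth": "...",              # gold answer
    "passages": [
        {
            "start_sentence": "...",
            "end_sentence": "...",
            "start_byte": 10234,        # placeholder offsets into the source text
            "end_byte": 10980,
            "excerpt": "...",
        }
    ],
    "story_title": "A Study in Scarlet",  # optional, processed splits only
}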

The dataset ships as a single all split (1,028 rows). Create your own train/validation/test splits if needed, e.g. by story or by reasoning facet (a sketch follows the usage example below).


Usage Example

from datasets import load_dataset

# Load the single "all" split from the Hugging Face Hub
ds = load_dataset("zy113/ChronoQA", split="all")
example = ds[0]

print("Question:", example["query"])
print("Answer  :", example["ground_truth"])
# Each question carries one or more evidence passages; show the start of the first excerpt
print("Evidence:", example["passages"][0]["excerpt"][:300], "…")

Citation Information

@article{zhang2025respecting,
  title={Respecting Temporal-Causal Consistency: Entity-Event Knowledge Graphs for Retrieval-Augmented Generation},
  author={Zhang, Ze Yu and Li, Zitao and Li, Yaliang and Ding, Bolin and Low, Bryan Kian Hsiang},
  journal={arXiv preprint arXiv:2506.05939},
  year={2025}
}