arxiv:2403.05518

Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought

Published on Mar 8, 2024
Abstract

Bias-augmented consistency training (BCT) reduces biased reasoning in language models across various tasks and biases without requiring gold labels.

AI-generated summary

While chain-of-thought prompting (CoT) has the potential to improve the explainability of language model reasoning, it can systematically misrepresent the factors influencing models' behavior: for example, rationalizing answers in line with a user's opinion without mentioning this bias. To mitigate this biased reasoning problem, we introduce bias-augmented consistency training (BCT), an unsupervised fine-tuning scheme that trains models to give consistent reasoning across prompts with and without biasing features. We construct a suite testing nine forms of biased reasoning on seven question-answering tasks, and find that applying BCT to GPT-3.5-Turbo with one bias reduces the rate of biased reasoning by 86% on held-out tasks. Moreover, this model generalizes to other forms of bias, reducing biased reasoning on held-out biases by an average of 37%. As BCT generalizes to held-out biases and does not require gold labels, this method may hold promise for reducing biased reasoning from as-yet-unknown biases and on tasks where supervision for ground-truth reasoning is unavailable.

Community

Paper author

An introspective LLM could tell us about itself, including its beliefs, concepts, and goals, by directly examining its inner states rather than simply reproducing information in its training data.
So can LLMs introspect?


Get this paper in your agent:

    hf papers read 2403.05518

Don't have the latest CLI? Install it with:

    curl -LsSf https://hf.co/cli/install.sh | bash
