arxiv:2604.02315

Beyond the Assistant Turn: User Turn Generation as a Probe of Interaction Awareness in Language Models

Published on Apr 3

· Submitted by

Sarath Shekkizhar on Apr 13

Salesforce AI Research

Upvote

Authors:

Abstract

User-turn generation serves as a probe to measure interaction awareness in large language models, revealing that this capability is distinct from task accuracy and can be influenced by training methods.

AI-generated summary

Standard LLM benchmarks evaluate the assistant turn: the model generates a response to an input, a verifier scores correctness, and the analysis ends. This paradigm leaves unmeasured whether the LLM encodes any awareness of what follows the assistant response. We propose user-turn generation as a probe of this gap: given a conversation context of user query and assistant response, we let a model generate under the user role. If the model's weights encode interaction awareness, the generated user turn will be a grounded follow-up that reacts to the preceding context. Through experiments across 11 open-weight LLMs (Qwen3.5, gpt-oss, GLM) and 5 datasets (math reasoning, instruction following, conversation), we show that interaction awareness is decoupled from task accuracy. In particular, within the Qwen3.5 family, GSM8K accuracy scales from 41% (0.8B) to 96.8% (397B-A17B), yet genuine follow-up rates under deterministic generation remain near zero. In contrast, higher temperature sampling reveals interaction awareness is latent with follow up rates reaching 22%. Controlled perturbations validate that the proposed probe measures a real property of the model, and collaboration-oriented post-training on Qwen3.5-2B demonstrates an increase in follow-up rates. Our results show that user-turn generation captures a dimension of LLM behavior, interaction awareness, that is unexplored and invisible with current assistant-only benchmarks.

View arXiv page View PDF Add to collection

Community

shekkizh

Paper submitter about 16 hours ago

"Beyond the Assistant Turn" probes latent interaction awareness to measure collaboration capability without additional scaffolding. We found task accuracy is decoupled from "understanding the other side", even top-tier models suffer identity drift, for e.g., during agent negotiation. By using User Turn generation, we provide a way to score and surface the partner modeling capabilities in LLMs.

librarian-bot

about 13 hours ago

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2604.02315

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2604.02315 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2604.02315 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2604.02315 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.