Does Explanation Presentation Format Affect How Users Assess and Understand AI?
Comparing narrative vs. Q&A formats to understand how presentation style affects users' assessment and understanding of AI systems.
Overview
Prior work showed that training data explanations can shift user trust. But what happens when the same information is packaged differently? Does format matter? And if so, how?
This study examined whether the presentation style of training dataset explanations shapes the quality of user critiques about AI systems, including how accurately they reason about bias, how deeply they engage with the information, and how they feel about the experience. By holding information content constant and varying only format, I isolated the design variable that practitioners most directly control.
The central design challenge here was creating two genuinely equivalent explanation formats — same content, different interaction model — without one feeling more complete than the other. I designed both conditions, built the study materials, recruited and ran all 39 participants, independently coded critique transcripts alongside a second coder for reliability, and ran the statistical analysis.
The Problem
Most XAI research focuses on what to explain. Far less attention has been paid to how that explanation is delivered. Yet interface designers make format decisions constantly, and those decisions may have real consequences for whether users can actually do something useful with the information they receive.
Design Question
Suppose two users receive exactly the same training data information, one as a narrative and one as an interactive Q&A. Do they end up equally equipped to critique the AI system? This study says no.
Research Questions
How do Data Story and Q&A formats differ in the types and quality of critique users produce about an AI system?
Does presentation format affect users' ability to identify specific biases present in the training data?
How do users' subjective impressions of the explanation differ between formats?
Research Process
Designing the Conditions
The study only works if both formats are genuinely equivalent in content. Getting that right was the hardest design problem in the project.
- These two formats represent fundamentally different models of how information reaches a user: pushed linearly vs. pulled on demand
- Narrative mirrors how journalists convey complex information — contextual, sequential, inferential
- Q&A mirrors how people increasingly query AI systems — targeted, user-driven, non-linear
- Comparing them isolates the interaction model as the independent variable while holding content constant
- Both formats had to contain identical information — no content advantages for either condition
- Reading level and approximate word count were matched across conditions (see the equivalence-check sketch after this list)
- The Q&A interface had to feel complete, not like an incomplete narrative
- Multiple pilot rounds were used to verify that neither format felt more thorough than the other before the main study
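To make the equivalence claim concrete, checks along these lines can verify during piloting that the two texts stay matched. This is a minimal sketch, assuming the finalized explanation texts live in two plain-text files (hypothetical filenames) and using the textstat package for a standard readability estimate; it is not the study's actual tooling.

```python
# Equivalence check for the two conditions: word count and reading level.
# Filenames and thresholds are illustrative, not from the study materials.
from pathlib import Path

import textstat

def profile(path: str) -> dict:
    text = Path(path).read_text(encoding="utf-8")
    return {
        "words": textstat.lexicon_count(text, removepunct=True),
        "fk_grade": textstat.flesch_kincaid_grade(text),
    }

story = profile("data_story.txt")  # narrative condition
qa = profile("qa_pairs.txt")       # Q&A condition, answers concatenated

print(story, qa)
# Flag the pair for another editing pass if they drift apart.
assert abs(story["words"] - qa["words"]) <= 0.10 * max(story["words"], qa["words"])
assert abs(story["fk_grade"] - qa["fk_grade"]) <= 1.0
```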
Study Design
A between-subjects experiment assigned participants to one of two conditions. Both conditions presented identical information about the same AI hiring tool's training dataset. Only the format differed.
Same training data content. Different interaction model. Different outcomes.
After reviewing their assigned explanation, participants wrote open-ended critiques of the AI system. Critiques were independently coded for accuracy, breadth of issues identified, and depth of reasoning. Post-task questionnaires captured perceptions of the explanation and the AI system.
Methods
Between-Subjects Experiment
Participants were randomly assigned to a single condition, avoiding the carry-over effects a within-subjects design would introduce.
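As a concrete illustration of the assignment procedure, here is a minimal sketch. The near-even 20/19 split, condition labels, and participant IDs are assumptions for illustration; the study itself only specifies random assignment of 39 participants.

```python
# Sketch: reproducible random assignment of 39 participants to two
# conditions. Split, labels, and IDs are illustrative assumptions.
import random

random.seed(42)  # fixed seed so the schedule can be regenerated
conditions = ["data_story"] * 20 + ["qa"] * 19
random.shuffle(conditions)

schedule = {f"P{i + 1:02d}": cond for i, cond in enumerate(conditions)}
print(schedule["P01"], schedule["P02"])
```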
Open-Ended Critique Task
Participants wrote free-form critiques of the AI system after reviewing their explanation. Critiques were the primary outcome measure.
Content Coding
Two independent coders analyzed critique transcripts for accuracy, issue breadth, bias identification, and reasoning depth.
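The write-up does not name the reliability statistic; Cohen's kappa is a common choice for chance-corrected agreement between two coders. A minimal sketch, with illustrative labels rather than the study's actual codebook:

```python
# Sketch: agreement between two independent coders on one coded dimension.
# Labels are illustrative placeholders, not the study's codebook.
from sklearn.metrics import cohen_kappa_score

coder_a = ["bias", "bias", "coverage", "other", "bias", "coverage"]
coder_b = ["bias", "coverage", "coverage", "other", "bias", "coverage"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa = {kappa:.2f}")  # 1.0 = perfect, 0 = chance-level
```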
Post-Task Questionnaires
Measured perceived control, explanation satisfaction, and subjective impressions of the AI system across conditions.
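The section does not specify which tests the analysis used; for small-sample Likert-style ratings, a rank-based test such as Mann-Whitney U is one defensible choice. A minimal sketch with placeholder ratings, not study data:

```python
# Sketch: comparing a post-task rating (e.g., perceived control) across
# conditions. All values are placeholders, not results from the study.
from scipy.stats import mannwhitneyu

qa_control = [5, 6, 6, 7, 5, 6, 7, 6]      # Q&A condition ratings
story_control = [4, 5, 4, 5, 6, 4, 5, 5]   # Data Story condition ratings

stat, p = mannwhitneyu(qa_control, story_control, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.3f}")
```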
"Reading it as a story made me think about the whole picture, not just the one thing I was worried about. I kept connecting things."
Participant in the Data Story condition
Key Insights
Format is not neutral
Presentation style systematically shaped the structure and content of user critiques, even when the underlying information was held constant. How you say it changes what people do with it.
Narrative drives holistic reasoning
Data Story participants produced critiques that connected multiple issues and reasoned about the AI system as a whole. The narrative framing encouraged inferential thinking beyond what was explicitly stated.
Q&A sharpens precision, narrows scope
Q&A participants produced more targeted, fact-specific critiques and reported higher perceived control. But the on-demand format meant users only investigated what they already suspected, missing issues they did not know to ask about.
Neither format is universally better
The formats involve genuine tradeoffs between breadth of reasoning and user-driven focus. The right format depends on the goal: broad accountability auditing favors narrative; targeted investigation favors Q&A.
Design Implications
- Match format to the evaluation goal. Narrative formats support broad, exploratory accountability reasoning. Q&A interfaces are better suited when users have specific, targeted concerns to investigate. Designers should treat these as distinct use cases, not interchangeable options.
- Narrative framing enables inference. A linear story supports users in drawing connections and reaching conclusions that transcend what was explicitly stated, a valuable property for bias detection that isolated Q&A interactions do not replicate.
- Consider hybrid designs. The observed tradeoffs suggest an ideal explanation interface might combine narrative entry with on-demand depth, providing a coherent foundation while preserving user agency over how far to explore.