Does Explanation Presentation Format Affect How Users Assess and Understand AI?
Comparing narrative vs. Q&A formats to understand how presentation style affects users' assessment and understanding of AI systems.
Overview
Prior work showed that training data explanations can shift user trust. But what happens when the same information is packaged differently? Does format matter? And if so, how?
This study examined whether the presentation style of training dataset explanations shapes the quality of user critiques about AI systems, including how accurately they reason about bias, how deeply they engage with the information, and how they feel about the experience. By holding information content constant and varying only format, I isolated the design variable that practitioners most directly control.
The central design challenge here was creating two genuinely equivalent explanation formats — same content, different interaction model — without one feeling more complete than the other. I designed both conditions, built the study materials, recruited and ran all 39 participants, independently coded critique transcripts alongside a second coder for reliability, and ran the statistical analysis.
The Problem
Most XAI research focuses on what to explain. Far less attention has been paid to how that explanation is delivered. Yet interface designers make format decisions constantly, and those decisions may have real consequences for whether users can actually do something useful with the information they receive.
Design Question
Suppose two users receive exactly the same training data information, one as a narrative and one as an interactive Q&A. Do they end up equally equipped to critique the AI system? This study says no.
Research Questions
How do Data Story and Q&A formats differ in the types and quality of critique users produce about an AI system?
Does presentation format affect users' ability to identify specific biases present in the training data?
How do users' subjective impressions of the explanation differ between formats?
Research Process
Designing the Conditions
The study only works if both formats are genuinely equivalent in content. Getting that right was the hardest design problem in the project.
- These two formats represent fundamentally different models of how information reaches a user: pushed linearly vs. pulled on demand
- Narrative mirrors how journalists convey complex information — contextual, sequential, inferential
- Q&A mirrors how people increasingly query AI systems — targeted, user-driven, non-linear
- Comparing them isolates the interaction model as the independent variable while holding content constant
- Both formats had to contain identical information — no content advantages for either condition
- Reading level and approximate word count were matched across conditions (see the equivalence-check sketch after this list)
- The Q&A interface had to feel complete, not like an incomplete narrative
- Multiple pilot rounds were used to verify that neither format felt more thorough than the other before the main study
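To make the equivalence claim concrete, checks along these lines can verify during piloting that the two texts stay matched. This is a minimal sketch, assuming the finalized explanation texts live in two plain-text files (hypothetical filenames) and using the textstat package for a standard readability estimate; it is not the study's actual tooling.

```python
# Equivalence check for the two conditions: word count and reading level.
# Filenames and thresholds are illustrative, not from the study materials.
from pathlib import Path

import textstat

def profile(path: str) -> dict:
    text = Path(path).read_text(encoding="utf-8")
    return {
        "words": textstat.lexicon_count(text, removepunct=True),
        "fk_grade": textstat.flesch_kincaid_grade(text),
    }

story = profile("data_story.txt")  # narrative condition
qa = profile("qa_pairs.txt")       # Q&A condition, answers concatenated

print(story, qa)
# Flag the pair for another editing pass if they drift apart.
assert abs(story["words"] - qa["words"]) <= 0.10 * max(story["words"], qa["words"])
assert abs(story["fk_grade"] - qa["fk_grade"]) <= 1.0
```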
Study Design
A between-subjects experiment assigned participants to one of two conditions. Both conditions presented identical information about the same AI hiring tool's training dataset. Only the format differed.
Same training data content. Different interaction model. Different outcomes.
After reviewing their assigned explanation, participants wrote open-ended critiques of the AI system. Critiques were independently coded for accuracy, breadth of issues identified, and depth of reasoning. Post-task questionnaires captured perceptions of the explanation and the AI system.
Methods
Between-Subjects Experiment
Participants were randomly assigned to a single condition, avoiding the carry-over effects a within-subjects design would introduce.
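As a concrete illustration of the assignment procedure, here is a minimal sketch. The near-even 20/19 split, condition labels, and participant IDs are assumptions for illustration; the study itself only specifies random assignment of 39 participants.

```python
# Sketch: reproducible random assignment of 39 participants to two
# conditions. Split, labels, and IDs are illustrative assumptions.
import random

random.seed(42)  # fixed seed so the schedule can be regenerated
conditions = ["data_story"] * 20 + ["qa"] * 19
random.shuffle(conditions)

schedule = {f"P{i + 1:02d}": cond for i, cond in enumerate(conditions)}
print(schedule["P01"], schedule["P02"])
```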
Open-Ended Critique Task
Participants wrote free-form critiques of the AI system after reviewing their explanation. Critiques were the primary outcome measure.
Content Coding
Two independent coders analyzed critique transcripts for accuracy, issue breadth, bias identification, and reasoning depth.
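The write-up does not name the reliability statistic; Cohen's kappa is a common choice for chance-corrected agreement between two coders. A minimal sketch, with illustrative labels rather than the study's actual codebook:

```python
# Sketch: agreement between two independent coders on one coded dimension.
# Labels are illustrative placeholders, not the study's codebook.
from sklearn.metrics import cohen_kappa_score

coder_a = ["bias", "bias", "coverage", "other", "bias", "coverage"]
coder_b = ["bias", "coverage", "coverage", "other", "bias", "coverage"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa = {kappa:.2f}")  # 1.0 = perfect, 0 = chance-level
```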
Post-Task Questionnaires
Measured perceived control, explanation satisfaction, and subjective impressions of the AI system across conditions.
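The section does not specify which tests the analysis used; for small-sample Likert-style ratings, a rank-based test such as Mann-Whitney U is one defensible choice. A minimal sketch with placeholder ratings, not study data:

```python
# Sketch: comparing a post-task rating (e.g., perceived control) across
# conditions. All values are placeholders, not results from the study.
from scipy.stats import mannwhitneyu

qa_control = [5, 6, 6, 7, 5, 6, 7, 6]      # Q&A condition ratings
story_control = [4, 5, 4, 5, 6, 4, 5, 5]   # Data Story condition ratings

stat, p = mannwhitneyu(qa_control, story_control, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.3f}")
```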
"Reading it as a story made me think about the whole picture, not just the one thing I was worried about. I kept connecting things."
Participant in the Data Story condition
Key Insights
Format is not neutral
Presentation style systematically shaped the structure and content of user critiques, even when the underlying information was held constant. How you say it changes what people do with it.
Narrative drives holistic reasoning
Data Story participants produced critiques that connected multiple issues and reasoned about the AI system as a whole. The narrative framing encouraged inferential thinking beyond what was explicitly stated.
Q&A sharpens precision, narrows scope
Q&A participants produced more targeted, fact-specific critiques and reported higher perceived control. But the on-demand format meant users only investigated what they already suspected, missing issues they did not know to ask about.
Neither format is universally better
The formats involve genuine tradeoffs between breadth of reasoning and user-driven focus. The right format depends on the goal: broad accountability auditing favors narrative; targeted investigation favors Q&A.
Design Implications
- Match format to the evaluation goal. Narrative formats support broad, exploratory accountability reasoning. Q&A interfaces are better suited when users have specific, targeted concerns to investigate. Designers should treat these as distinct use cases, not interchangeable options.
- Narrative framing enables inference. A linear story supports users in drawing connections and reaching conclusions that transcend what was explicitly stated, a valuable property for bias detection that isolated Q&A interactions do not replicate.
- Consider hybrid designs. The observed tradeoffs suggest an ideal explanation interface might combine narrative entry with on-demand depth, providing a coherent foundation while preserving user agency over how far to explore.