How Do We Measure AI Literacy? Developing a Scale That Captures Both Knowledge and Confidence
Building a dual-format AI literacy scale that measures both confidence and knowledge, uncovering a striking calibration gap between the two.
Overview
Measuring AI literacy sounds straightforward. Ask people how well they understand AI, and report the answers. The problem is that this approach assumes people know what they know. This project tests that assumption and finds it fails in a surprising direction.
I developed and psychometrically validated a dual-format AI literacy scale that pairs traditional self-assessment items with objective multiple-choice knowledge questions. Validated across two independent samples (n = 288 and n = 188), the scale revealed a consistent and striking pattern: the people who know the most about AI systematically underestimate their own competence, while those who know the least overestimate it.
Scale development required synthesizing three conflicting AI literacy frameworks into a coherent item pool, then systematically eliminating items through expert review and two pilot rounds. I developed all scale items, collected two independent validation samples (n = 288 and n = 188), and conducted both the exploratory and confirmatory factor analyses.
The Problem
Most AI literacy instruments rely exclusively on self-report: asking people how confident they feel about AI-related topics. The assumption is that self-assessed confidence tracks with actual knowledge. But confidence and competence are not the same thing, and treating them as equivalent produces systematically misleading data about who is and isn't AI literate.
The Measurement Gap
A person who scores 85% on a knowledge test may rate their own literacy as "moderate." A person who scores 30% may rate themselves as "quite knowledgeable." Single-format scales cannot see this gap, and the gap is exactly what matters for education and policy.
Research Questions
Can a dual-format scale combining self-assessment and factual knowledge items achieve strong psychometric properties across independent samples?
How does self-reported AI literacy relate to objectively measured knowledge, and what does the calibration gap look like across the population?
Research Process
Developing the Scale
The core design challenge was not writing questions — it was deciding what to measure and how to measure it in a way that would hold up psychometrically.
- Existing AI literacy scales measure either self-reported confidence or objective knowledge — not both in the same instrument
- Measuring only one misses the calibration gap: the relationship between what people think they know and what they actually know
- A dual-format instrument captures both constructs and makes their relationship visible as a measurable variable
- This required developing two internally consistent item sets that could be administered together without one contaminating the other
- Initial item pool synthesized three existing AI literacy frameworks — items overlapping or conflicting across frameworks were reconciled or removed
- Expert review identified items that were too jargon-heavy, double-barrelled, or testing recall rather than understanding
- Two pilot rounds eliminated items with poor inter-item correlations and refined ambiguous wording
- Knowledge test items were designed to resist guessing — requiring conceptual reasoning rather than terminology recognition
Scale Design
The scale was designed to capture AI literacy as two distinct but related constructs. Items were developed iteratively, drawing on established AI literacy frameworks, then refined through expert review and pilot testing before validation.
Two-component structure: Format A captures subjective confidence; Format B captures objective knowledge.
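To make the two-component structure concrete, the sketch below shows one way the two formats could be scored together. It is illustrative only: the column names, the 5-point Likert assumption for Format A, and the proportion-correct scoring for Format B are assumptions rather than the instrument's final scoring rules.

```python
import pandas as pd

def score_dual_format(responses: pd.DataFrame,
                      likert_cols: list[str],
                      answer_key: dict[str, str]) -> pd.DataFrame:
    """Score both formats for each respondent (illustrative column names).

    Format A (self-assessment): mean of 5-point Likert items, rescaled to 0-1.
    Format B (knowledge): proportion of multiple-choice items answered correctly.
    """
    # Format A: average Likert response (1-5), rescaled to 0 (lowest) .. 1 (highest)
    confidence = (responses[likert_cols].mean(axis=1) - 1) / 4

    # Format B: mark each item right or wrong against the key, then take the mean
    correct = pd.DataFrame({item: responses[item].eq(key)
                            for item, key in answer_key.items()})
    knowledge = correct.mean(axis=1)

    return pd.DataFrame({"confidence": confidence, "knowledge": knowledge})
```

Keeping the two formats in separate scores, rather than summing them, is what later makes the calibration gap visible as its own variable.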
Methods
Scale Development
Iterative item generation grounded in AI literacy frameworks, with expert review and pilot testing prior to validation studies.
Exploratory Factor Analysis
Applied to Sample 1 (n = 288) to identify the underlying factor structure and remove poorly performing items.
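The sketch below shows the shape of this step using the open-source factor_analyzer package. The two-factor target, the oblimin rotation, the 0.40 loading cut-off, and the file and column names are illustrative assumptions, not the exact analysis settings.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_kmo

# Sample 1: one row per respondent, one column per candidate item (hypothetical file name)
sample1 = pd.read_csv("sample1_items.csv")

# Check sampling adequacy before factoring
_, kmo_overall = calculate_kmo(sample1)
print(f"KMO = {kmo_overall:.2f}")

# Extract two correlated factors with an oblique rotation
efa = FactorAnalyzer(n_factors=2, rotation="oblimin")
efa.fit(sample1)

# Factor labels are assigned after inspecting which items load where
loadings = pd.DataFrame(efa.loadings_,
                        index=sample1.columns,
                        columns=["confidence", "knowledge"])

# Flag items with weak primary loadings as candidates for removal
weak = loadings[loadings.abs().max(axis=1) < 0.40]
print("Candidates for removal:", list(weak.index))
```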
Confirmatory Factor Analysis
Applied to Sample 2 (n = 188) to verify the factor structure established in the exploratory sample.
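One way to run the confirmatory step in Python is with the semopy package, which accepts lavaan-style model syntax. The item names and the model description below are placeholders; this is a sketch of the approach, not the fitted model.

```python
import pandas as pd
import semopy

# Sample 2: independent validation sample; item names are placeholders
sample2 = pd.read_csv("sample2_items.csv")

# Two correlated latent factors, matching the structure found in the EFA
model_desc = """
confidence =~ a1 + a2 + a3 + a4 + a5
knowledge  =~ b1 + b2 + b3 + b4 + b5
confidence ~~ knowledge
"""

model = semopy.Model(model_desc)
model.fit(sample2)

# Fit indices (CFI, RMSEA, ...) indicate how well the exploratory structure replicates
print(semopy.calc_stats(model).T)
```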
Convergent & Discriminant Validity
Validated against established constructs including digital literacy and technology anxiety to confirm the scale measures what it claims to.
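In sketch form, these validity checks come down to correlating the two scale scores with the established comparison measures. The file and column names below are placeholders; the reading of the correlation pattern follows the usual convention for convergent and discriminant evidence.

```python
import pandas as pd

# One row per respondent: the two scale scores plus established comparison measures
# (column names are illustrative)
df = pd.read_csv("validity_measures.csv")

cols = ["confidence", "knowledge", "digital_literacy", "technology_anxiety"]
corr = df[cols].corr(method="pearson")
print(corr.round(2))

# Reading the matrix: sizeable positive correlations with digital literacy support
# convergent validity; weak or negative correlations with technology anxiety support
# discriminant validity, i.e. the scale is not simply measuring comfort with technology.
```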
"I thought I knew a lot about AI until I saw the knowledge questions. Now I'm not so sure. But I think I know more than most people around me still."
Participant response during pilot testing
Key Insights
Strong psychometric properties across both samples
The dual-format scale demonstrated high internal consistency (Cronbach's α > 0.80 for both formats), a clear and replicable factor structure, and strong convergent validity, confirming it reliably measures what it claims to.
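For reference, Cronbach's alpha can be computed directly from the item-score matrix; the short NumPy implementation below is a generic version of the statistic reported here, with placeholder variable names.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Computed separately for each format; values above 0.80 indicate high internal consistency
# alpha_confidence = cronbach_alpha(format_a_scores)
# alpha_knowledge  = cronbach_alpha(format_b_scores)
```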
Confidence and knowledge are not the same construct
Factor analysis confirmed that self-assessed confidence and factual knowledge load onto distinct factors. They are related but separable, meaning a single-format scale that treats them as one thing is statistically misspecified.
The reverse Dunning-Kruger effect
Participants with the highest objective knowledge scores consistently underestimated their own competence. Those with the lowest scores overestimated it. Knowing more about AI appears to reveal how much more there is to know, producing more accurate but more humble self-assessments.
The gap widens with experience
The calibration gap was most pronounced among participants with prior AI education or industry experience. Deeper exposure to AI amplifies awareness of one's own knowledge boundaries, a finding with direct implications for how AI literacy education should be designed.
The Calibration Gap
Plotting self-assessed literacy against objective knowledge scores across the full sample reveals the pattern clearly: self-assessment runs above actual knowledge at the low end of the knowledge distribution and below it at the high end, forming the characteristic reverse curve.
Reverse Dunning-Kruger: the higher the objective knowledge score, the more participants underestimate their own competence. The gap between the two lines is the calibration error.
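A plot of this kind can be produced directly from the two scale scores. The sketch below standardizes both scores and bins respondents by knowledge decile; the z-scoring, the decile binning, and the file name are illustrative choices, not necessarily those behind the published figure.

```python
import pandas as pd
import matplotlib.pyplot as plt

# One row per respondent with 'confidence' and 'knowledge' scores on a 0-1 scale
scores = pd.read_csv("scale_scores.csv")

# Standardize both scores so they are directly comparable on one axis
z = (scores - scores.mean()) / scores.std(ddof=1)

# Bin respondents by objective knowledge and average both scores within each bin
z["knowledge_decile"] = pd.qcut(scores["knowledge"], q=10, labels=False, duplicates="drop")
binned = z.groupby("knowledge_decile")[["confidence", "knowledge"]].mean()

plt.plot(binned.index, binned["knowledge"], marker="o", label="objective knowledge")
plt.plot(binned.index, binned["confidence"], marker="o", label="self-assessed literacy")
plt.xlabel("knowledge decile (low to high)")
plt.ylabel("standardized score")
plt.legend()
plt.title("Calibration gap: vertical distance between the lines is the calibration error")
plt.show()
```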
Design & Policy Implications
- AI literacy measurement requires dual-format instruments. Self-report alone overestimates literacy in low-knowledge groups and underestimates it in high-knowledge groups. Any research, program evaluation, or policy decision based on single-format data is working with a systematically distorted picture.
- Calibration is itself a measurable and targetable outcome. The gap between confidence and knowledge (not just the levels of each) is a diagnostic variable. Interventions should measure and address calibration explicitly, not just knowledge acquisition.
- AI education should cultivate epistemic humility alongside competence. The reverse Dunning-Kruger finding suggests deeper AI exposure produces more accurate but more humble self-awareness. Curricula should help learners build accurate models of what they do and do not yet understand, treating metacognitive accuracy as a core learning outcome.