How Do We Measure AI Literacy? Developing a Scale That Captures Both Knowledge and Confidence
Building a dual-format AI literacy scale that measures both confidence and knowledge, uncovering a striking calibration gap between the two.
Overview
Measuring AI literacy sounds straightforward. Ask people how well they understand AI, and report the answers. The problem is that this approach assumes people know what they know. This project tests that assumption and finds it fails in a surprising direction.
I developed and psychometrically validated a dual-format AI literacy scale that pairs traditional self-assessment items with objective multiple-choice knowledge questions. Validated across two independent samples (n = 288 and n = 188), the scale revealed a consistent and striking pattern: the people who know the most about AI systematically underestimate their own competence, while those who know the least overestimate it.
Scale development required synthesizing three conflicting AI literacy frameworks into a coherent item pool, then systematically eliminating items through expert review and two pilot rounds. I developed all scale items, collected two independent validation samples (n = 288 and n = 188), and conducted both the exploratory and confirmatory factor analyses.
The Problem
Most AI literacy instruments rely exclusively on self-report: asking people how confident they feel about AI-related topics. The assumption is that self-assessed confidence tracks with actual knowledge. But confidence and competence are not the same thing, and treating them as equivalent produces systematically misleading data about who is and isn't AI literate.
The Measurement Gap
A person who scores 85% on a knowledge test may rate their own literacy as "moderate." A person who scores 30% may rate themselves as "quite knowledgeable." Single-format scales cannot see this gap, and the gap is exactly what matters for education and policy.
Research Questions
Can a dual-format scale combining self-assessment and factual knowledge items achieve strong psychometric properties across independent samples?
How does self-reported AI literacy relate to objectively measured knowledge, and what does the calibration gap look like across the population?
Research Process
Developing the Scale
The core design challenge was not writing questions — it was deciding what to measure and how to measure it in a way that would hold up psychometrically.
- Existing AI literacy scales measure either self-reported confidence or objective knowledge — not both in the same instrument
- Measuring only one misses the calibration gap: the relationship between what people think they know and what they actually know
- A dual-format instrument captures both constructs and makes their relationship visible as a measurable variable
- This required developing two internally consistent item sets that could be administered together without one contaminating the other
- Initial item pool synthesized three existing AI literacy frameworks — items overlapping or conflicting across frameworks were reconciled or removed
- Expert review identified items that were too jargon-heavy, double-barrelled, or testing recall rather than understanding
- Two pilot rounds eliminated items with poor inter-item correlations and refined ambiguous wording
- Knowledge test items were designed to resist guessing — requiring conceptual reasoning rather than terminology recognition
Scale Design
The scale was designed to capture AI literacy as two distinct but related constructs. Items were developed iteratively, drawing on established AI literacy frameworks, then refined through expert review and pilot testing before validation.
Two-component structure: Format A captures subjective confidence; Format B captures objective knowledge.
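To make the two-component structure concrete, the sketch below shows one way the two formats could be scored together. It is illustrative only: the column names, the 5-point Likert assumption for Format A, and the proportion-correct scoring for Format B are assumptions rather than the instrument's final scoring rules.

```python
import pandas as pd

def score_dual_format(responses: pd.DataFrame,
                      likert_cols: list[str],
                      answer_key: dict[str, str]) -> pd.DataFrame:
    """Score both formats for each respondent (illustrative column names).

    Format A (self-assessment): mean of 5-point Likert items, rescaled to 0-1.
    Format B (knowledge): proportion of multiple-choice items answered correctly.
    """
    # Format A: average Likert response (1-5), rescaled to 0 (lowest) .. 1 (highest)
    confidence = (responses[likert_cols].mean(axis=1) - 1) / 4

    # Format B: mark each item right or wrong against the key, then take the mean
    correct = pd.DataFrame({item: responses[item].eq(key)
                            for item, key in answer_key.items()})
    knowledge = correct.mean(axis=1)

    return pd.DataFrame({"confidence": confidence, "knowledge": knowledge})
```

Keeping the two formats in separate scores, rather than summing them, is what later makes the calibration gap visible as its own variable.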
Methods
Scale Development
Iterative item generation grounded in AI literacy frameworks, with expert review and pilot testing prior to validation studies.
Exploratory Factor Analysis
Applied to Sample 1 (n = 288) to identify the underlying factor structure and remove poorly performing items.
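The sketch below shows the shape of this step using the open-source factor_analyzer package. The two-factor target, the oblimin rotation, the 0.40 loading cut-off, and the file and column names are illustrative assumptions, not the exact analysis settings.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_kmo

# Sample 1: one row per respondent, one column per candidate item (hypothetical file name)
sample1 = pd.read_csv("sample1_items.csv")

# Check sampling adequacy before factoring
_, kmo_overall = calculate_kmo(sample1)
print(f"KMO = {kmo_overall:.2f}")

# Extract two correlated factors with an oblique rotation
efa = FactorAnalyzer(n_factors=2, rotation="oblimin")
efa.fit(sample1)

# Factor labels are assigned after inspecting which items load where
loadings = pd.DataFrame(efa.loadings_,
                        index=sample1.columns,
                        columns=["confidence", "knowledge"])

# Flag items with weak primary loadings as candidates for removal
weak = loadings[loadings.abs().max(axis=1) < 0.40]
print("Candidates for removal:", list(weak.index))
```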
Confirmatory Factor Analysis
Applied to Sample 2 (n = 188) to verify the factor structure established in the exploratory sample.
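One way to run the confirmatory step in Python is with the semopy package, which accepts lavaan-style model syntax. The item names and the model description below are placeholders; this is a sketch of the approach, not the fitted model.

```python
import pandas as pd
import semopy

# Sample 2: independent validation sample; item names are placeholders
sample2 = pd.read_csv("sample2_items.csv")

# Two correlated latent factors, matching the structure found in the EFA
model_desc = """
confidence =~ a1 + a2 + a3 + a4 + a5
knowledge  =~ b1 + b2 + b3 + b4 + b5
confidence ~~ knowledge
"""

model = semopy.Model(model_desc)
model.fit(sample2)

# Fit indices (CFI, RMSEA, ...) indicate how well the exploratory structure replicates
print(semopy.calc_stats(model).T)
```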
Convergent & Discriminant Validity
Validated against established constructs including digital literacy and technology anxiety to confirm the scale measures what it claims to.
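In sketch form, these validity checks come down to correlating the two scale scores with the established comparison measures. The file and column names below are placeholders; the reading of the correlation pattern follows the usual convention for convergent and discriminant evidence.

```python
import pandas as pd

# One row per respondent: the two scale scores plus established comparison measures
# (column names are illustrative)
df = pd.read_csv("validity_measures.csv")

cols = ["confidence", "knowledge", "digital_literacy", "technology_anxiety"]
corr = df[cols].corr(method="pearson")
print(corr.round(2))

# Reading the matrix: sizeable positive correlations with digital literacy support
# convergent validity; weak or negative correlations with technology anxiety support
# discriminant validity, i.e. the scale is not simply measuring comfort with technology.
```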
"I thought I knew a lot about AI until I saw the knowledge questions. Now I'm not so sure. But I think I know more than most people around me still."
Participant response during pilot testing
Key Insights
Strong psychometric properties across both samples
The dual-format scale demonstrated high internal consistency (Cronbach's α > 0.80 for both formats), a clear and replicable factor structure, and strong convergent validity, confirming it reliably measures what it claims to.
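For reference, Cronbach's alpha can be computed directly from the item-score matrix; the short NumPy implementation below is a generic version of the statistic reported here, with placeholder variable names.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Computed separately for each format; values above 0.80 indicate high internal consistency
# alpha_confidence = cronbach_alpha(format_a_scores)
# alpha_knowledge  = cronbach_alpha(format_b_scores)
```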
Confidence and knowledge are not the same construct
Factor analysis confirmed that self-assessed confidence and factual knowledge load onto distinct factors. They are related but separable, meaning a single-format scale that treats them as one thing is statistically misspecified.
The reverse Dunning-Kruger effect
Participants with the highest objective knowledge scores consistently underestimated their own competence. Those with the lowest scores overestimated it. Knowing more about AI appears to reveal how much more there is to know, producing more accurate but more humble self-assessments.
The gap widens with experience
The calibration gap was most pronounced among participants with prior AI education or industry experience. Deeper exposure to AI amplifies awareness of one's own knowledge boundaries, a finding with direct implications for how AI literacy education should be designed.
The Calibration Gap
Plotting self-assessed literacy against objective knowledge scores across the full sample reveals the pattern clearly: self-assessment runs above actual knowledge at the low end of the knowledge distribution and below it at the high end, forming the characteristic reverse curve.
Reverse Dunning-Kruger: the higher the objective knowledge score, the more participants underestimate their own competence. The gap between the two lines is the calibration error.
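A plot of this kind can be produced directly from the two scale scores. The sketch below standardizes both scores and bins respondents by knowledge decile; the z-scoring, the decile binning, and the file name are illustrative choices, not necessarily those behind the published figure.

```python
import pandas as pd
import matplotlib.pyplot as plt

# One row per respondent with 'confidence' and 'knowledge' scores on a 0-1 scale
scores = pd.read_csv("scale_scores.csv")

# Standardize both scores so they are directly comparable on one axis
z = (scores - scores.mean()) / scores.std(ddof=1)

# Bin respondents by objective knowledge and average both scores within each bin
z["knowledge_decile"] = pd.qcut(scores["knowledge"], q=10, labels=False, duplicates="drop")
binned = z.groupby("knowledge_decile")[["confidence", "knowledge"]].mean()

plt.plot(binned.index, binned["knowledge"], marker="o", label="objective knowledge")
plt.plot(binned.index, binned["confidence"], marker="o", label="self-assessed literacy")
plt.xlabel("knowledge decile (low to high)")
plt.ylabel("standardized score")
plt.legend()
plt.title("Calibration gap: vertical distance between the lines is the calibration error")
plt.show()
```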
Design & Policy Implications
- AI literacy measurement requires dual-format instruments. Self-report alone overestimates literacy in low-knowledge groups and underestimates it in high-knowledge groups. Any research, program evaluation, or policy decision based on single-format data is working with a systematically distorted picture.
- Calibration is itself a measurable and targetable outcome. The gap between confidence and knowledge (not just the levels of each) is a diagnostic variable. Interventions should measure and address calibration explicitly, not just knowledge acquisition.
- AI education should cultivate epistemic humility alongside competence. The reverse Dunning-Kruger finding suggests deeper AI exposure produces more accurate but more humble self-awareness. Curricula should help learners build accurate models of what they do and do not yet understand, treating metacognitive accuracy as a core learning outcome.