Research programme — methodology & design
We observe how teachers actually use learnOS — and model that use, week by week, against UNESCO’s competency framework.
The central idea in one line: the first large-scale, real-world evidence on how AI competency develops in practice — anchored on observed behaviour, validated by triangulation, published openly.
What makes this credible
Six principles the whole design is built on.
01
Measure behaviour, not just opinion.
Most ‘AI in education’ claims rest on self-report. We anchor on what teachers actually do in learnOS, then use self-report to explain it — closing the well-documented gap between perceived and demonstrated competency.
02
The framework is the instrument.
We don’t invent a success metric. UNESCO’s 5×3 competency matrix is the measurement model. Every data point maps to a competency block.
03
The platform is the evidence source.
Because Fellows work inside learnOS, the platform itself captures rich, structured evidence of practice — usage, artifacts, learning insights — that traditional teacher studies can’t access.
04
Low burden, high signal.
The weekly mechanism is usage-aware: questions are generated from what the teacher actually did, not a generic survey. Less recall bias, less fatigue, higher data quality.
05
Honest about limits.
We triangulate, state confounds plainly, and don’t overclaim student-outcome causation. Credibility comes from rigour and candour, not big numbers.
06
One instrument, two outputs.
The competency trajectory the research builds for each Fellow is exactly what earns their recognition. Research and credential are the same measurement.
The measurement model
UNESCO’s 5×3 competency matrix — the instrument.
Five dimensions × three progression levels = 15 competency blocks. Every Fellow has, in effect, a 15-cell scorecard. The research tracks movement across it over time.
The five dimensions
- 1
Human-centred mindset
Keeping human agency, judgement and responsibility central; AI as support, not substitute.
- 2
Ethics of AI
Privacy, data protection, bias, accountability, safe and responsible use.
- 3
AI foundations & applications
Understanding AI’s mechanisms and limits; selecting and customising tools appropriately.
- 4
AI pedagogy
Integrating AI into lesson design, instruction and assessment to improve teaching and learning.
- 5
AI for professional learning
Using AI for one’s own ongoing development and reflective practice.
The three progression levels
Acquire
Foundational — evaluate, select and use AI tools appropriately.
Deepen
Intermediate — design meaningful pedagogical strategies that integrate AI; critically assess its impact.
Create
Advanced — innovate teaching, design new approaches, lead and contribute to systemic change.
15 competency blocks. One scorecard per Fellow.
The same trajectory that earns Fellow recognition is the trajectory the research reports on. One instrument, two outputs.
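As a rough illustration of the 15-cell scorecard, the sketch below builds one cell per (dimension, level) pair. The dimension and level names come from the framework as described above; the dictionary layout and the helper names `empty_scorecard` and `highest_level` are assumptions made for illustration, not a description of how learnOS stores the record.

```python
from itertools import product

# Names taken from the text; the data layout itself is an assumption.
DIMENSIONS = [
    "Human-centred mindset",
    "Ethics of AI",
    "AI foundations & applications",
    "AI pedagogy",
    "AI for professional learning",
]
LEVELS = ["Acquire", "Deepen", "Create"]  # ordered from foundational to advanced


def empty_scorecard():
    """Return a scorecard with one False entry per (dimension, level) block."""
    return {cell: False for cell in product(DIMENSIONS, LEVELS)}


def highest_level(scorecard, dimension):
    """Return the highest level marked as reached for a dimension, or None."""
    reached = [level for level in LEVELS if scorecard[(dimension, level)]]
    return reached[-1] if reached else None


card = empty_scorecard()
card[("AI pedagogy", "Acquire")] = True
card[("AI pedagogy", "Deepen")] = True

print(len(card))                           # 15
print(highest_level(card, "AI pedagogy"))  # Deepen
```

Tracking movement over time would then amount to storing dated snapshots of this structure per Fellow.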
Theory of change
The causal chain we’re testing — link by link.
We don’t just measure ‘did competency go up.’ We test the chain so we can say what works, why, and for whom.
- INPUT
learnOS Teacher Edition (seeded to national curriculum) + AI coaching.
- ACTIVITY
Teacher uses learnOS modules in real classrooms over 12–18 months.
- MECHANISM
Specific kinds of use develop specific UNESCO competency blocks.
- OUTPUT
Measurable progression across the 5×3 matrix (Acquire → Deepen → Create).
- OUTCOME
Changed practice: lighter workload, higher confidence, better differentiation and assessment.
- IMPACT
Improved teaching quality and — secondary, with caveats — student learning and engagement.
The interesting findings live in the MECHANISM row: which uses of which modules drive which competencies, for which teachers, in which contexts.
The measurement model
Three triangulated data streams.
We never rely on one source. Three streams, cross-checked against each other — a competency only scores as progressed when at least two of them agree.
Stream A
Behavioural / usage data
Objective · continuous
Captured from learnOS with consent: which modules a teacher uses, how often, and — crucially — how sophisticatedly. Depth of use is itself a competency signal.
Stream B
Reflective self-report
Weekly usage-aware check-in
Short, personalised, generated from that week’s actual usage. Captures the why, the experienced impact, and the reasoning behind choices — the things behaviour alone can’t show.
Stream C
Structured competency assessment
Baseline · mid · endline
Mapped to the 5×3 matrix, combining self-assessment with evidence-based items — scenario tasks and review of the teacher’s own learnOS artifacts — to validate self-report against demonstrated capability.
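The two-of-three agreement rule can be sketched as a small function. This is a minimal sketch under stated assumptions: each stream reports `True` (supports progression), `False` (does not) or `None` (no evidence yet) for a given competency block, and the name `is_progressed` is hypothetical. The text does not say how a dissenting third stream is handled; here the block still counts as progressed when two streams agree.

```python
def is_progressed(stream_signals, min_agreeing=2):
    """Apply the triangulation rule to one competency block.

    stream_signals maps a stream label ("A", "B", "C") to True, False or None.
    The block counts as progressed when at least `min_agreeing` streams say True.
    """
    agreeing = sum(1 for signal in stream_signals.values() if signal is True)
    return agreeing >= min_agreeing


# Usage (A) and reflection (B) agree; no assessment evidence (C) yet.
print(is_progressed({"A": True, "B": True, "C": None}))   # True

# Only self-report supports progression: not enough.
print(is_progressed({"A": False, "B": True, "C": None}))  # False
```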
How learnOS use maps to the framework
The heart of the design.
learnOS’s real modules map cleanly onto the framework — and depth of use maps onto progression level.
AI foundations & applications
Using the AI hub; selecting/configuring tools; understanding what learnOS can and can’t do.
Acquire: uses defaults · Deepen: customises for purpose · Create: combines tools in novel ways.
AI pedagogy
OmniPrep (lesson design), AuraPractice (tiered practice & assessment), VisuTeach (resources), SparkClass (delivery & live insight).
Acquire: generates basic outputs · Deepen: designs differentiated, evidence-led pedagogy · Create: invents new instructional/assessment approaches.
Ethics of AI
The Pre-Grading assistant’s ~70/30 human–machine split; mandatory teacher confirmation before any AI output is used (‘AI drafts, teacher gatekeeps’); reviewing and correcting AI output; data-security awareness in DocuGuard.
Acquire: follows safe-use steps · Deepen: critically reviews & corrects AI · Create: shapes ethical rules of use.
Human-centred mindset
The teacher’s ‘final say’ on every output; the Pre-Grading philosophy (AI handles the groundwork, the teacher owns the deeper judgement and the human encouragement only a person can give); tiering applied to the materials on offer rather than labels applied to students.
Acquire: aware of agency · Deepen: exercises judgement over AI · Create: champions human-centred practice to peers.
AI for professional learning
The accumulation loop (building a personal resource base); the weekly reflections themselves; progression coaching.
Acquire: reflects on own use · Deepen: systematically improves practice · Create: mentors peers, contributes knowledge.
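The mapping above can be read as two lookups: module to dimension, and depth of use to progression level. The sketch below is a simplification made for illustration. Module names come from the text, but assigning each module a single primary dimension is an assumption (the Pre-Grading assistant, for instance, also evidences Human-centred mindset), and the depth labels `default`, `customised` and `novel` are hypothetical shorthand for the per-dimension descriptors.

```python
# One primary dimension per module: a simplifying assumption.
MODULE_DIMENSION = {
    "AI hub": "AI foundations & applications",
    "OmniPrep": "AI pedagogy",
    "AuraPractice": "AI pedagogy",
    "VisuTeach": "AI pedagogy",
    "SparkClass": "AI pedagogy",
    "Pre-Grading": "Ethics of AI",
    "DocuGuard": "Ethics of AI",
}

# Hypothetical depth labels standing in for the descriptors in the text.
DEPTH_LEVEL = {
    "default": "Acquire",    # e.g. uses defaults, generates basic outputs
    "customised": "Deepen",  # e.g. customises for purpose, reviews and corrects
    "novel": "Create",       # e.g. combines tools or invents new approaches
}


def map_event(module, depth):
    """Map one observed use to the (dimension, level) block it gives evidence for."""
    return (MODULE_DIMENSION[module], DEPTH_LEVEL[depth])


print(map_event("OmniPrep", "customised"))  # ('AI pedagogy', 'Deepen')
```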
AI pedagogy is the dense centre.
Most of learnOS’s teacher tools live here, so it carries the richest data — and it’s the dimension ministries care most about.
A reflexivity we name honestly.
The coaching and weekly reflection are themselves professional-learning practice — the programme is both intervention and measurement for that dimension. We flag this openly in the analysis.
The core mechanism
The weekly usage-aware reflective check-in.
Each week, the agent reviews the Fellow’s actual learnOS activity and generates a short, personalised set of reflective questions tied to what they actually did — not a generic survey. ~5–10 minutes, in their language, text or voice.
Worked example
A teacher used OmniPrep to build three lessons and AuraPractice once this week. The check-in might ask:
“You built three lessons with OmniPrep this week. Did you change much of what it produced — and why?”
→ AI pedagogy + Human-centred mindset
“You used AuraPractice once. Did the tiered practice change how your weaker students engaged?”
→ AI pedagogy (Deepen) + impact-on-learning signal
“Was there a moment you decided NOT to use the AI’s suggestion?”
→ Ethics of AI + Human-centred mindset
“How much time did this save you, honestly — and where did it cost you time?”
→ Workload impact (balanced, non-leading)
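To make the usage-aware idea concrete, here is a template-based stand-in for the question generator. It is only a sketch: the real check-in is described as agent-generated, and the event format, the `TEMPLATES` table and the function name `generate_checkin` are all assumptions. It shows the core property, namely that questions are derived from the week's recorded activity rather than from a fixed form.

```python
from collections import Counter

# Hypothetical templates, loosely modelled on the worked example above.
TEMPLATES = {
    "OmniPrep": (
        "You used OmniPrep {n} time(s) this week. "
        "Did you change much of what it produced, and why?"
    ),
    "AuraPractice": (
        "You used AuraPractice {n} time(s) this week. "
        "Did the tiered practice change how your weaker students engaged?"
    ),
}
CLOSING_QUESTION = "Was there a moment you decided NOT to use the AI's suggestion?"


def generate_checkin(events):
    """Build a short question list from one week's usage events."""
    counts = Counter(event["module"] for event in events)
    questions = []
    for module, n in counts.items():
        template = TEMPLATES.get(module)
        if template:  # modules without a template are skipped
            questions.append(template.format(n=n))
    questions.append(CLOSING_QUESTION)  # asked regardless of activity
    return questions


week = [{"module": "OmniPrep"}] * 3 + [{"module": "AuraPractice"}]
for question in generate_checkin(week):
    print(question)
```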
Why this is strong methodologically
- Reduces recall bias — asks about real, recent, specific actions.
- Reduces survey fatigue — feels like a relevant conversation, not a form.
- Adaptive — probes deeper as the teacher progresses; can target stalled cells.
- Doubles as coaching — the reflection itself develops the professional-learning and human-centred dimensions (noted reflexively in analysis).
- Neutral, non-leading framings (‘honestly’, ‘where did it cost you’) protect against demand characteristics.
A periodic deeper dive (e.g. monthly) runs a longer reflection or, for a sampled subset, a human/agent interview and a classroom-artifact review — feeding Stream C and the qualitative case library.
Assessment & recognition
From measurement to credential — without changing the instrument.
- 01
Baseline
Onboarding
Establish each Fellow’s starting position on the 5×3 matrix (self-assessment + scenario items). The counterfactual anchor — each Fellow is partly their own control (pre/post).
- 02
Continuous
Weekly + monthly
Usage data and reflections build an evolving picture between formal assessments.
- 03
Mid-point & endline
Structured assessment
Evidence-based assessment confirms progression with demonstrated capability, not just self-report.
- 04
Progression scoring
Triangulated
A competency block is marked progressed only on triangulated evidence: at least two of Streams A, B and C must agree. Each Fellow’s trajectory across the 15 cells is their competency record.
- 05
Recognition loop
Same instrument
That competency record determines Fellow recognition. The teacher earns recognition by demonstrably progressing against the UNESCO framework — fair credential, clean research data.
Measuring impact
Three levels — with honesty about how confidently we can claim each.
Teacher-level
High confidence
Workload/time saved (self-reported + inferred), confidence and self-efficacy (validated short scales at baseline/mid/end), and change in practice — evidenced by learnOS artifacts over time.
Teaching-level
Medium confidence
Observable change in what teaching looks like: richer differentiation, more evidence-led assessment, higher-quality resources. The artifact trail in learnOS lets us assess change in practice directly, not only by self-report.
Learning-level
Secondary · exploratory · stated caveats
Student engagement and outcomes. Captured via teacher-reported engagement and, where schools consent, class-level learning insights as a proxy for mastery. Reported as indicative, not causal — attributing student outcomes at individual-teacher scale is not robustly possible without controlled conditions.
Over-claiming on student outcomes would discredit the whole study. Under-claiming honestly strengthens it. A future controlled sub-study at school-deployment scale is a natural Phase 2 research question.
Participant protocol
What we ask of teachers — designed to be light, honest, and fit around real teaching.
Participation is voluntary. Fellows may pause or withdraw at any time — exit is captured as data, never penalised.
- 01
Use learnOS Teacher Edition in real teaching at a target cadence (e.g. weekly).
- 02
Weekly reflective check-in — ~5–10 minutes, usage-aware, in their language.
- 03
Three formal assessments — baseline, mid-point, endline (~30–45 min each).
- 04
Consent to usage-data capture from learnOS (Stream A), with clear scope.
- 05
Optional / sampled extras — deeper interviews, classroom-artifact sharing, or written case studies (separate consent).
- 06
Total realistic burden: ~15–30 minutes/week beyond normal teaching, most of it the lightweight check-in.
The Living Research Platform
How the research runs itself — without marking its own homework.
An AI-led research engine sits behind the Fellowship. It decides what to ask each teacher, ingests their usage and answers, scores everything against the 15-cell matrix, and keeps a running narrative of the evidence up to date as data arrives. Humans own every truth claim that leaves the system.
01
Instrument Generation
Decides what to ask which teacher this week, generated from their actual learnOS activity and the framework cells most in need of evidence.
02
Ingestion
Pulls in Stream A usage, Stream B reflections and Stream C assessments — consented, secure, structured.
03
Coding & Mapping
Codes every input against the 5×3 matrix with inter-rater checks between AI coders and human reviewers.
04
Living Analysis
Re-runs progression, mechanism and segmentation analyses continuously as data arrives — not on a quarterly schedule.
05
Narrative Generation
Maintains a current internal narrative of what the evidence is saying, with provenance back to source data points.
06
Programme Steering
Flags stalled competencies, dormant Fellows and weak evidence cells back to the agent and the programme team.
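The steering stage can be pictured as simple threshold checks over each Fellow's recent history. The sketch below is illustrative only: the thresholds (21 days without activity, 90 days without progression) and the function name `steering_flags` are assumptions, not programme rules.

```python
from datetime import date


def steering_flags(last_active, last_progress, today,
                   dormant_days=21, stalled_days=90):
    """Return steering flags for one Fellow.

    last_active: date of the Fellow's most recent learnOS activity.
    last_progress: dict mapping a dimension to the date it last progressed.
    Thresholds are hypothetical defaults.
    """
    flags = []
    if (today - last_active).days > dormant_days:
        flags.append("dormant")
    for dimension, when in sorted(last_progress.items()):
        if (today - when).days > stalled_days:
            flags.append(f"stalled: {dimension}")
    return flags


flags = steering_flags(
    last_active=date(2025, 3, 1),
    last_progress={
        "AI pedagogy": date(2025, 1, 10),
        "Ethics of AI": date(2024, 11, 1),
    },
    today=date(2025, 3, 31),
)
print(flags)  # ['dormant', 'stalled: Ethics of AI']
```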
Tier 1 — live intelligence
Internal, continuously updated.
A real-time view for the programme team and academic partner — used to steer the programme and spot where evidence is thin.
Tier 2 — published findings
Point-in-time, human-validated.
Nothing reaches the world without human review, pre-registered analysis plans and the partner’s sign-off. The AI runs the research; humans own every truth claim.
Rigour architecture
Eight safeguards that stop an AI-run study becoming a self-confirming machine.
The defining risk of an AI that generates questions, codes answers and writes findings is that it quietly confirms whatever it already expects. The validation layer is not optional — it is the product.
Pre-registration
Hypotheses, analysis plans and coding rubrics are registered before data is seen.
Human validation gates
No finding is published without a human reviewer signing off the evidence and the framing.
Inter-rater reliability
AI coding is regularly checked against trained human coders on a sampled basis; drift is corrected.
Hold-outs & replication
Cohorts and time windows are held back to test whether findings replicate on data that did not produce them.
Adversarial analysis
A devil’s-advocate pass actively looks for the strongest alternative explanation before any claim is made.
Provenance & audit trail
Every claim traces back to the source data points, prompts and model versions that produced it.
Demand-characteristic controls
Question framings are neutralised against social desirability and leading prompts.
Transparency by default
Methods, limits and changes are published openly — the credibility comes from rigour and candour, not big numbers.
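For the inter-rater reliability safeguard, one common way to quantify agreement between an AI coder and a human coder is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The text does not name a statistic, so the choice of kappa here is an assumption, and the sample labels are made up for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(labels_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:  # both raters used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Illustrative labels only: level codes assigned to six reflections.
ai_codes = ["Acquire", "Acquire", "Deepen", "Deepen", "Create", "Acquire"]
human_codes = ["Acquire", "Deepen", "Deepen", "Deepen", "Create", "Acquire"]

print(round(cohens_kappa(ai_codes, human_codes), 3))  # 0.739
```

A falling kappa across successive samples would be one signal of the coding drift the safeguard is meant to catch.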
Study design, rigour & validity
Longitudinal, mixed-methods, cohort-based — with rolling/staggered cohorts.
Because cohorts start at different times, later cohorts act as a rough comparison for earlier ones: a stepped-wedge-style quasi-comparison, the closest we get to a counterfactual with volunteer teachers. Sampling is purposive and stratified across region, subject, school type, resource level and connectivity.
Triangulation rule
A competency is scored as progressed only when at least two streams agree: for example, reported in reflection (B) and evidenced in usage (A), or evidenced in usage (A) and demonstrated in artifact or assessment (C).
Counterfactual honesty
No randomised control with volunteer teachers. We mitigate with pre/post within-Fellow change, baseline-as-control, staggered-cohort comparison, and benchmarking against the framework’s expected developmental sequence.
Evidence-based assessment
Self-assessment is paired with scenario tasks and review of the teacher’s own learnOS artifacts, so progression is validated against demonstrated capability — not opinion alone.
Stated limitations
Self-selection bias, the reflexivity of measuring professional learning while delivering it, the limits of consent-based, self-reported data, and the limits of student-outcome attribution: all reported openly in every output.
Data, ethics & governance
Trust is not a feature. It’s a precondition.
Informed consent
Provided in each participant’s language, before any data is collected. Consent is stated clearly and can be withdrawn at any time.
Transparent data use
Clear statement of what is collected, how it’s used, how long it’s kept, and Fellows’ rights over it. Fellows choose how, or whether, they are attributed in quotes and case studies.
Independent oversight
Ethics oversight is a named role, not an afterthought — confirmed before the pilot cohort opens.
Secure, school-based handling
Data minimisation; secure, school-based handling; no identifiable student data beyond protocol and consent; research store separate from identity where possible.
Analysis
The full work-up.
The headline intellectual output is the mechanism analysis — the evidenced links between specific learnOS use and specific competency development, with the contextual factors that enable or block it.
Per-Fellow
A competency trajectory across all 15 cells over time; an impact profile (workload, confidence, practice change); a qualitative narrative.
Cohort / aggregate
Progression distributions per competency block; which modules and uses drive which competencies; time-to-progression; retention and its drivers.
Segmented
By region, subject, resource level, connectivity, baseline level — what works for whom, where, and under what conditions.
Mechanism analysis
The evidenced links between specific learnOS use and specific competency development, with enabling and blocking contextual factors.
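As one example of the aggregate measures listed above, time-to-progression for a competency block can be summarised as the median number of weeks Fellows took to reach it. The record format and the function name `median_weeks_to_progression` are assumptions for illustration. Note that this simple version ignores Fellows who have not yet progressed, which biases the estimate downwards; a survival-analysis approach would handle those censored cases.

```python
from statistics import median


def median_weeks_to_progression(records, cell):
    """Median weeks from baseline to progression in one cell.

    records maps a Fellow id to {cell: week_number_when_progressed}.
    Fellows without an entry for the cell are skipped (not yet progressed).
    """
    weeks = [cells[cell] for cells in records.values() if cell in cells]
    return median(weeks) if weeks else None


# Illustrative data only.
cell = ("AI pedagogy", "Deepen")
records = {
    "fellow-1": {cell: 6},
    "fellow-2": {cell: 10},
    "fellow-3": {cell: 14},
    "fellow-4": {},  # has not progressed in this cell yet
}

print(median_weeks_to_progression(records, cell))  # 10
```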
Research outputs
Open. Cited. Available to the field.
All published openly, with methods and limitations, on a cadence that doubles as programme momentum.
- 01
Baseline report
Early in the programme
The starting state and the design — landscape view of where Fellows begin across regions.
- 02
Interim impact assessment
Mid-programme
First evidence of competency progression and emerging mechanisms — which uses of which modules are driving which competencies.
- 03
Flagship impact study & white paper
At conclusion
The full evidence base and findings — the largest real-world study of AI-competency development in under-resourced classrooms.
- 04
Regional landscape analyses
Throughout
Per-region cuts: what works for whom, where, and under what conditions.
- 05
Per-Fellow competency records
Continuous
Each Fellow’s evidenced trajectory across the 15-cell matrix — the basis of their recognition.
Research partners
Become a co-author of the world’s evidence base.
Apply to become a Fellow and contribute to the largest real-world study of AI in teaching.
