Research programme — methodology & design

We observe how teachers actually use learnOS — and model that use, week by week, against UNESCO's competency framework.

The central idea in one line: the first large-scale, real-world evidence on how AI competency develops in practice — anchored on observed behaviour, validated by triangulation, published openly.

What makes this credible

Six principles the whole design is built on.

Measure behaviour, not just opinion.

Most 'AI in education' claims rest on self-report. We anchor on what teachers actually do in learnOS, then use self-report to explain it — closing the well-documented gap between perceived and demonstrated competency.

The framework is the instrument.

We don't invent a success metric. UNESCO's 5×3 competency matrix is the measurement model. Every data point maps to a competency block.

The platform is the evidence source.

Because Fellows work inside learnOS, the platform itself captures rich, structured evidence of practice — usage, artifacts, learning insights — that traditional teacher studies can't access.

Low burden, high signal.

The weekly mechanism is usage-aware: questions are generated from what the teacher actually did, not a generic survey. Less recall bias, less fatigue, higher data quality.

Honest about limits.

We triangulate, state confounds plainly, and don't overclaim student-outcome causation. Credibility comes from rigour and candour, not big numbers.

One instrument, two outputs.

The competency trajectory the research builds for each Fellow is exactly what earns their recognition. Research and credential are the same measurement.

The measurement model

UNESCO's 5×3 competency matrix — the instrument.

Five dimensions × three progression levels = 15 competency blocks. Every Fellow has, in effect, a 15-cell scorecard. The research tracks movement across it over time.

The five dimensions

1. Human-centred mindset: keeping human agency, judgement and responsibility central; AI as support, not substitute.
2. Ethics of AI: privacy, data protection, bias, accountability, safe and responsible use.
3. AI foundations & applications: understanding AI's mechanisms and limits; selecting and customising tools appropriately.
4. AI pedagogy: integrating AI into lesson design, instruction and assessment to improve teaching and learning.
5. AI for professional learning: using AI for one's own ongoing development and reflective practice.

The three progression levels

Acquire

Foundational — evaluate, select and use AI tools appropriately.

Deepen

Intermediate — design meaningful pedagogical strategies that integrate AI; critically assess its impact.

Create

Advanced — innovate teaching, design new approaches, lead and contribute to systemic change.

15 competency blocks. One scorecard per Fellow.

The same trajectory that earns Fellow recognition is the trajectory the research reports on. One instrument, two outputs.
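
To make the 15-cell scorecard concrete, here is a minimal Python sketch. The function names (`empty_scorecard`, `mark_progressed`, `highest_level`) and the choice to store each cell as a yes/no flag are assumptions made for illustration; they are not part of the UNESCO framework or of learnOS.

```python
from itertools import product
from typing import Optional

# Dimension and level names as listed in this section.
DIMENSIONS = (
    "Human-centred mindset",
    "Ethics of AI",
    "AI foundations & applications",
    "AI pedagogy",
    "AI for professional learning",
)
LEVELS = ("Acquire", "Deepen", "Create")  # ordered from foundational to advanced


def empty_scorecard() -> dict:
    """Return a scorecard with all 15 (dimension, level) cells unmarked."""
    return {cell: False for cell in product(DIMENSIONS, LEVELS)}


def mark_progressed(scorecard: dict, dimension: str, level: str) -> None:
    """Mark one cell as progressed; reject names outside the 5x3 matrix."""
    if (dimension, level) not in scorecard:
        raise KeyError(f"Unknown cell: {dimension!r} / {level!r}")
    scorecard[(dimension, level)] = True


def highest_level(scorecard: dict, dimension: str) -> Optional[str]:
    """Return the most advanced level marked for a dimension, or None."""
    reached = [level for level in LEVELS if scorecard[(dimension, level)]]
    return reached[-1] if reached else None
```

A Fellow's trajectory would then be a sequence of such scorecards over time, one per reporting point.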

Theory of change

The causal chain we're testing — link by link.

We don't just measure 'did competency go up.' We test the chain so we can say what works, why, and for whom.

INPUT: learnOS Teacher Edition (seeded to national curriculum) + AI coaching.
ACTIVITY: Teacher uses learnOS modules in real classrooms over 12–18 months.
MECHANISM: Specific kinds of use develop specific UNESCO competency blocks.
OUTPUT: Measurable progression across the 5×3 matrix (Acquire → Deepen → Create).
OUTCOME: Changed practice: lighter workload, higher confidence, better differentiation and assessment.
IMPACT: Improved teaching quality and, as a secondary outcome with caveats, student learning and engagement.

The interesting findings live in the MECHANISM row: which uses of which modules drive which competencies, for which teachers, in which contexts.

The evidence base

Three triangulated data streams.

We never rely on one source. Three streams, cross-checked against each other — a competency only scores as progressed when at least two of them agree.

Stream A

Behavioural / usage data

Objective · continuous

Captured from learnOS with consent: which modules a teacher uses, how often and, crucially, with what sophistication. Depth of use is itself a competency signal.

Stream B

Reflective self-report

Weekly usage-aware check-in

Short, personalised, generated from that week's actual usage. Captures the why, the experienced impact, and the reasoning behind choices — the things behaviour alone can't show.

Stream C

Structured competency assessment

Baseline · mid · endline

Mapped to the 5×3 matrix, combining self-assessment with evidence-based items — scenario tasks and review of the teacher's own learnOS artifacts — to validate self-report against demonstrated capability.
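
The cross-checking rule for the three streams (a competency scores as progressed only when at least two of them agree) can be sketched in a few lines of Python. This sketch assumes each stream has already been reduced to a yes/no judgement for a given competency block; how that judgement is reached is not specified here, and the names `CellEvidence` and `is_progressed` are hypothetical.

```python
from dataclasses import dataclass

MIN_AGREEING_STREAMS = 2  # "at least two of them agree"


@dataclass(frozen=True)
class CellEvidence:
    """Per-stream yes/no judgement for one competency block (assumed format)."""
    usage: bool        # Stream A: behavioural / usage data
    self_report: bool  # Stream B: weekly reflective check-in
    assessment: bool   # Stream C: structured competency assessment


def is_progressed(evidence: CellEvidence,
                  minimum: int = MIN_AGREEING_STREAMS) -> bool:
    """True when enough streams independently indicate progression."""
    agreeing = sum((evidence.usage, evidence.self_report, evidence.assessment))
    return agreeing >= minimum
```

For example, `is_progressed(CellEvidence(usage=True, self_report=True, assessment=False))` is true, while a block supported by self-report alone is not counted.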

How learnOS use maps to the framework

The heart of the design.

learnOS's real modules map cleanly onto the framework — and depth of use maps onto progression level.

Each UNESCO dimension is listed with what evidences it in learnOS and its Acquire → Deepen → Create signal.

AI foundations & applications
Evidence in learnOS: Using the AI hub; selecting/configuring tools; understanding what learnOS can and can't do.
Progression signal: Acquire: uses defaults · Deepen: customises for purpose · Create: combines tools in novel ways.

AI pedagogy
Evidence in learnOS: OmniPrep (lesson design), AuraPractice (tiered practice & assessment), VisuTeach (resources), SparkClass (delivery & live insight).
Progression signal: Acquire: generates basic outputs · Deepen: designs differentiated, evidence-led pedagogy · Create: invents new instructional/assessment approaches.

Ethics of AI
Evidence in learnOS: The Pre-Grading assistant's ~70/30 human–machine split; mandatory teacher confirmation before any AI output is used ('AI drafts, teacher gatekeeps'); reviewing and correcting AI output; data-security awareness in DocuGuard.
Progression signal: Acquire: follows safe-use steps · Deepen: critically reviews & corrects AI · Create: shapes ethical rules of use.

Human-centred mindset
Evidence in learnOS: The teacher's 'final say' on every output; the Pre-Grading philosophy, in which AI handles the groundwork and the teacher owns the deeper judgement and the human encouragement only a person can give; supply-side tiering without labelling students.
Progression signal: Acquire: aware of agency · Deepen: exercises judgement over AI · Create: champions human-centred practice to peers.

AI for professional learning
Evidence in learnOS: The accumulation loop (building a personal resource base); the weekly reflections themselves; progression coaching.
Progression signal: Acquire: reflects on own use · Deepen: systematically improves practice · Create: mentors peers, contributes knowledge.
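
As an illustration of how depth of use could translate into a progression signal, the sketch below takes the "AI foundations & applications" row and maps observed usage events to the highest level they signal. The event labels (`used_default_settings` and so on) are hypothetical stand-ins; learnOS's actual event names are not given here. The result is a Stream A signal only, not a final score.

```python
from typing import Iterable, Optional

LEVEL_ORDER = ("Acquire", "Deepen", "Create")

# Hypothetical usage-event labels for "AI foundations & applications",
# paraphrasing: uses defaults / customises for purpose / combines tools.
SIGNAL_TO_LEVEL = {
    "used_default_settings": "Acquire",
    "customised_tool_for_purpose": "Deepen",
    "combined_tools_in_novel_way": "Create",
}


def highest_signalled_level(events: Iterable[str]) -> Optional[str]:
    """Return the most advanced level signalled by any event, or None."""
    best = None
    for event in events:
        level = SIGNAL_TO_LEVEL.get(event)
        if level is None:
            continue  # event carries no signal for this dimension
        if best is None or LEVEL_ORDER.index(level) > LEVEL_ORDER.index(best):
            best = level
    return best
```

The other four dimensions would each need their own event-to-level table along the same lines.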

AI pedagogy is the dense centre.

Most of learnOS's teacher tools live here, so it carries the richest data — and it's the dimension ministries care most about.

A reflexivity we name honestly.

The coaching and weekly reflection are themselves professional-learning practice — the programme is both intervention and measurement for that dimension. We flag this openly in the analysis.

The core mechanism

The weekly usage-aware reflective check-in.

Each week, the agent reviews the Fellow's actual learnOS activity and generates a short, personalised set of reflective questions tied to what they actually did — not a generic survey. ~5–10 minutes, in their language, text or voice.

Worked example

A teacher used OmniPrep to build three lessons and AuraPractice once this week. The check-in might ask:

"You built three lessons with OmniPrep this week. Did you change much of what it produced — and why?"
→ AI pedagogy + Human-centred mindset
"You used AuraPractice once. Did the tiered practice change how your weaker students engaged?"
→ AI pedagogy (Deepen) + impact-on-learning signal
"Was there a moment you decided NOT to use the AI's suggestion?"
→ Ethics of AI + Human-centred mindset
"How much time did this save you, honestly — and where did it cost you time?"
→ Workload impact (balanced, non-leading)
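
The worked example can be approximated with a simple template-based sketch in Python. This is a deliberate simplification: the programme describes an agent that generates questions, whereas the code below only fills fixed templates from weekly usage counts. The `TEMPLATES` table, `CheckInQuestion` and `build_check_in` are hypothetical names, and the question wording is adapted from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckInQuestion:
    text: str
    maps_to: tuple  # framework dimensions or signals the answer informs


# Hypothetical per-module templates, adapted from the worked example.
TEMPLATES = {
    "OmniPrep": (
        "You built {n} lesson(s) with OmniPrep this week. "
        "Did you change much of what it produced, and why?",
        ("AI pedagogy", "Human-centred mindset"),
    ),
    "AuraPractice": (
        "You used AuraPractice {n} time(s) this week. "
        "Did the tiered practice change how your weaker students engaged?",
        ("AI pedagogy (Deepen)", "impact-on-learning signal"),
    ),
}

# Balanced workload question, asked every week regardless of usage.
CLOSING_QUESTION = CheckInQuestion(
    "How much time did this save you, honestly, and where did it cost you time?",
    ("Workload impact",),
)


def build_check_in(weekly_usage: dict, max_questions: int = 4) -> list:
    """Build a short check-in from module usage counts, most-used first."""
    ranked = sorted(weekly_usage.items(), key=lambda item: item[1], reverse=True)
    questions = []
    for module, count in ranked:
        if count > 0 and module in TEMPLATES:
            text, maps_to = TEMPLATES[module]
            questions.append(CheckInQuestion(text.format(n=count), maps_to))
    # Reserve the last slot for the workload question.
    return questions[: max_questions - 1] + [CLOSING_QUESTION]
```

Calling `build_check_in({"OmniPrep": 3, "AuraPractice": 1})` yields the OmniPrep question, the AuraPractice question and the workload question, in that order.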

Why this is strong methodologically

Reduces recall bias — asks about real, recent, specific actions.
Reduces survey fatigue — feels like a relevant conversation, not a form.
Adaptive — probes deeper as the teacher progresses; can target stalled cells.
Doubles as coaching — the reflection itself develops the professional-learning and human-centred dimensions (noted reflexively in analysis).
Neutral, non-leading framings ('honestly', 'where did it cost you') protect against demand characteristics.

A periodic deeper dive (e.g. monthly) runs a longer reflection or, for a sampled subset, a human/agent interview and a classroom-artifact review — feeding Stream C and the qualitative case library.

Assessment & recognition

From measurement to credential — without changing the instrument.

01 · Baseline · Onboarding
Establish each Fellow's starting position on the 5×3 matrix (self-assessment + scenario items). This is the counterfactual anchor: each Fellow is partly their own control (pre/post).

02 · Continuous · Weekly + monthly
Usage data and reflections build an evolving picture between formal assessments.

03 · Mid-point & endline · Structured assessment
Evidence-based assessment confirms progression with demonstrated capability, not just self-report.

04 · Progression scoring · Triangulated
A competency block is marked progressed only on triangulated evidence, meaning at least two of Streams A, B and C agree. Each Fellow's trajectory across the 15 cells is their competency record.

05 · Recognition loop · Same instrument
That competency record determines Fellow recognition. The teacher earns recognition by demonstrably progressing against the UNESCO framework: a fair credential and clean research data.

Measuring impact

Three levels — with honesty about how confidently we can claim each.

Teacher-level

High confidence

Workload/time saved (self-reported + inferred), confidence and self-efficacy (validated short scales at baseline/mid/end), and change in practice — evidenced by learnOS artifacts over time.

Teaching-level

Medium confidence

Observable change in what teaching looks like: richer differentiation, more evidence-led assessment, higher-quality resources. The artifact trail in learnOS lets us assess change in practice directly, not only by self-report.

Learning-level

Secondary · exploratory · stated caveats

Student engagement and outcomes. Captured via teacher-reported engagement and, where schools consent, class-level learning insights as a proxy for mastery. Reported as indicative, not causal — attributing student outcomes at individual-teacher scale is not robustly possible without controlled conditions.

Over-claiming on student outcomes would discredit the whole study. Under-claiming honestly strengthens it. A future controlled sub-study at school-deployment scale is a natural Phase 2 research question.

Participant protocol

What we ask of teachers — designed to be light, honest, and fit around real teaching.

Participation is voluntary. Fellows may pause or withdraw at any time — exit is captured as data, never penalised.

01 · Use learnOS Teacher Edition in real teaching at a target cadence (e.g. weekly).
02 · Weekly reflective check-in: ~5–10 minutes, usage-aware, in their language.
03 · Three formal assessments: baseline, mid-point, endline (~30–45 min each).
04 · Consent to usage-data capture from learnOS (Stream A), with clear scope.
05 · Optional / sampled extras: deeper interviews, classroom-artifact sharing, or written case studies (separate consent).
06 · Total realistic burden: ~15–30 minutes/week beyond normal teaching, most of it the lightweight check-in.

The Living Research Platform

How the research runs itself — without marking its own homework.

An AI-led research engine sits behind the Fellowship. It decides what to ask each teacher, ingests their usage and answers, scores everything against the 15-cell matrix, and keeps a continuously updated narrative of what the evidence shows. Humans own every truth claim that leaves the system.

Instrument Generation

Decides what to ask which teacher this week, generated from their actual learnOS activity and the framework cells most in need of evidence.

Ingestion

Pulls in Stream A usage, Stream B reflections and Stream C assessments — consented, secure, structured.

Coding & Mapping

Codes every input against the 5×3 matrix with inter-rater checks between AI coders and human reviewers.

Living Analysis

Re-runs progression, mechanism and segmentation analyses continuously as data arrives — not on a quarterly schedule.

Narrative Generation

Maintains a current internal narrative of what the evidence is saying, with provenance back to source data points.

Programme Steering

Flags stalled competencies, dormant Fellows and weak evidence cells back to the agent and the programme team.
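
The Programme Steering step can be pictured with a small Python sketch that flags dormant Fellows and weak evidence cells. Both thresholds (`DORMANT_AFTER`, `MIN_EVIDENCE_ITEMS`) and the input shapes are assumptions chosen for illustration; the programme's real cut-offs are not stated here.

```python
from datetime import date, timedelta

DORMANT_AFTER = timedelta(weeks=3)  # hypothetical inactivity threshold
MIN_EVIDENCE_ITEMS = 2              # hypothetical minimum data points per cell


def dormant_fellows(last_active: dict, today: date) -> list:
    """Return IDs of Fellows with no recorded activity within the threshold."""
    return sorted(
        fellow_id
        for fellow_id, last_seen in last_active.items()
        if today - last_seen > DORMANT_AFTER
    )


def weak_evidence_cells(evidence_counts: dict) -> list:
    """Return (dimension, level) cells with too few supporting data points."""
    return sorted(
        cell
        for cell, count in evidence_counts.items()
        if count < MIN_EVIDENCE_ITEMS
    )
```

Flags like these would go back to the agent (to target next week's questions) and to the programme team (to follow up with Fellows).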

Tier 1 — live intelligence

Internal, continuously updated.

A real-time view for the programme team and academic partner — used to steer the programme and spot where evidence is thin.

Tier 2 — published findings

Point-in-time, human-validated.

Nothing reaches the world without human review, pre-registered analysis plans and the partner's sign-off. The AI runs the research; humans own every truth claim.

Rigour architecture

Eight safeguards that stop an AI-run study becoming a self-confirming machine.

The defining risk of an AI that generates questions, codes answers and writes findings is that it quietly confirms whatever it already expects. The validation layer is not optional — it is the product.

Pre-registration

Hypotheses, analysis plans and coding rubrics are registered before data is seen.

Human validation gates

No finding is published without a human reviewer signing off the evidence and the framing.

Inter-rater reliability

AI coding is regularly checked against trained human coders on a sampled basis; drift is corrected.

Hold-outs & replication

Cohorts and time windows are held back to test whether findings replicate on unseen data rather than merely appear once.

Adversarial analysis

A devil's-advocate pass actively looks for the strongest alternative explanation before any claim is made.

Provenance & audit trail

Every claim traces back to the source data points, prompts and model versions that produced it.

Demand-characteristic controls

Question framings are neutralised against social desirability and leading prompts.

Transparency by default

Methods, limits and changes are published openly — the credibility comes from rigour and candour, not big numbers.
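
For the inter-rater reliability safeguard, one common way to compare AI coding with human coding on a sampled set is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The programme does not name a specific statistic, so the choice of kappa is an assumption; the sketch below is a plain-Python version for two coders assigning one label per item.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(coder_a: Sequence[str], coder_b: Sequence[str]) -> float:
    """Cohen's kappa for two coders labelling the same items in the same order."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("Both coders must label the same non-empty set of items")
    n = len(coder_a)

    # Observed agreement: share of items given identical labels.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement: product of each coder's label frequencies, summed.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

A falling kappa across successive samples would be one signal of the coding drift that the safeguard is meant to catch.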

Study design, rigour & validity

Longitudinal, mixed-methods, cohort-based — with rolling/staggered cohorts.

Because cohorts start at staggered times, later cohorts act as a rough comparison for earlier ones. This stepped-wedge-style quasi-comparison is the closest we get to a counterfactual with volunteer teachers. Sampling is purposive and stratified across region, subject, school type, resource level and connectivity.

Triangulation rule

A competency is scored as progressed only when at least two streams agree: for example, reported in reflection (B) and evidenced in usage (A), or evidenced in usage (A) and demonstrated in artifact or assessment (C).

Counterfactual honesty

No randomised control with volunteer teachers. We mitigate with pre/post within-Fellow change, baseline-as-control, staggered-cohort comparison, and benchmarking against the framework's expected developmental sequence.

Evidence-based assessment

Self-assessment is paired with scenario tasks and review of the teacher's own learnOS artifacts, so progression is validated against demonstrated capability — not opinion alone.

Stated limitations

Self-selection bias, the reflexivity of measuring professional learning while delivering it, the limits of consented self-report, and the limits of attributing student outcomes are all reported openly in every output.

Data, ethics & governance

Trust is not a feature. It's a precondition.

Informed consent

Provided in each participant's language before any data is collected. Consent is clearly worded and can be withdrawn at any time.

Transparent data use

Clear statement of what is collected, how it's used, how long it's kept, and Fellows' rights over it. Attribution-choice for quotes and cases.

Independent oversight

Ethics oversight is a named role, not an afterthought — confirmed before the pilot cohort opens.

Secure, school-based handling

Data minimisation; secure, school-based handling; no identifiable student data beyond protocol and consent; research store separate from identity where possible.

Analysis

The full work-up.

The headline intellectual output is the mechanism analysis — the evidenced links between specific learnOS use and specific competency development, with the contextual factors that enable or block it.

Per-Fellow

A competency trajectory across all 15 cells over time; an impact profile (workload, confidence, practice change); a qualitative narrative.

Cohort / aggregate

Progression distributions per competency block; which modules and uses drive which competencies; time-to-progression; retention and its drivers.

Segmented

By region, subject, resource level, connectivity, baseline level — what works for whom, where, and under what conditions.

Mechanism analysis

The evidenced links between specific learnOS use and specific competency development, with enabling and blocking contextual factors.
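
As a rough illustration of the cohort-level work-up, the Python sketch below summarises, per competency block, the share of Fellows who have progressed and the median weeks taken by those who did. The record format `(fellow_id, cell, weeks_to_progress)` is an assumption, with `None` meaning "not yet progressed". Taking the median over progressed Fellows only ignores those still in progress, so a fuller analysis would likely need a method that handles such censored cases.

```python
from collections import defaultdict
from statistics import median


def summarise_by_cell(records):
    """Summarise progression per competency block.

    records: iterable of (fellow_id, cell, weeks_to_progress), where
    weeks_to_progress is None if the Fellow has not progressed in that cell.
    """
    weeks_by_cell = defaultdict(list)
    for _fellow_id, cell, weeks in records:
        weeks_by_cell[cell].append(weeks)

    summary = {}
    for cell, values in weeks_by_cell.items():
        progressed = [w for w in values if w is not None]
        summary[cell] = {
            "share_progressed": len(progressed) / len(values),
            "median_weeks": median(progressed) if progressed else None,
        }
    return summary
```

The same summary could be computed per region, subject or baseline level to produce the segmented view.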

Research outputs

Open. Cited. Available to the field.

All published openly, with methods and limitations, on a cadence that doubles as programme momentum.

01 · Baseline report · Early in the programme
The starting state and the design: a landscape view of where Fellows begin across regions.

02 · Interim impact assessment · Mid-programme
First evidence of competency progression and emerging mechanisms: which uses of which modules are driving which competencies.

03 · Flagship impact study & white paper · At conclusion
The full evidence base and findings: the largest real-world study of AI-competency development in under-resourced classrooms.

04 · Regional landscape analyses · Throughout
Per-region cuts: what works for whom, where, and under what conditions.

05 · Per-Fellow competency records · Continuous
Each Fellow's evidenced trajectory across the 15-cell matrix, the basis of their recognition.

Research partners

Academic and institutional co-authors and credibility partners will be named here as partnerships are confirmed. [e.g. IRCAI or named university — to be populated.]

Become a co-author of the world's evidence base.

Apply to become a Fellow and contribute to the largest real-world study of AI in teaching.
