Case study response

Student Outcome Intelligence Platform

An Azure and Microsoft Fabric reference design that turns fragmented university data into explainable advisor action.

Case. A Norwegian university with 35,000 students across 8 faculties needs to flag risk of non-continuation early enough for advisors to intervene. The signals exist in SIS, LMS, ERP, and campus systems, but they rarely line up — different identifiers, different update rhythms, and current-state tables that quietly erase history. The answer below is a governed Azure data spine, point-in-time features, calibrated risk bands tied to advisor capacity, and an audited advisor queue.

Brief requirement | How the platform answers it
35,000 students ⋅ 8 faculties ⋅ 4 systems | Azure data spine with canonical identity, type-2 history, and Purview lineage.
Predict non-continuation early | Weekly point-in-time scoring during the term, before grades reveal risk.
No leakage | event_time ≤ t and available_at ≤ t enforced on every feature row.
Capacity-aware queue | τ_red and τ_amber set at quantiles tied to advisor headcount.
Explainable, fair, auditable | Reason codes per prediction, group fairness audit, and immutable access log.

Walkthrough

Five steps from raw signals to advisor action

  1. Acquire
    • SIS, LMS, ERP, campus signals
    • Data Factory + Event Hubs ingestion
    • ADLS Gen2 bronze landing
    Slides 3 – 6
  2. Govern
    • Canonical student identity
    • Type-2 mutable history
    • Purview lineage + contracts
    Slides 7 – 8
  3. Model
    • Calibrated risk score
    • Capacity-anchored bands
    • Fairness + validation gates
    Slides 9 – 11
  4. Deploy
    • Azure ML registry release
    • Power BI advisor queue
    • Row-level security + reason codes
    Slide 12
  5. Operate
    • Drift + freshness monitors
    • Immutable audit log
    • Roadmap + trade-off review
    Slides 13 – 15
Case summary

The brief is a platform design problem, not a model-only task

Scope diagram showing 35,000 students across 8 faculties, four independent source systems, and the early-warning window for advisors
Thesis

The hard part is trustworthy history, not the model

Industry expectation
Data ~20% ⋅ Model ~80%
Production reality
Trustworthy data spine (identity, history, point-in-time features, lineage) ~80% ⋅ Model ~20%

Build the foundation first

  • Canonical student identity across all systems.
  • Type-2 history for mutable student status and programme data.
  • Data quality contracts, Purview lineage, and schema-drift detection.
  • Point-in-time features that only use data known at scoring time.
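
A minimal sketch of what type-2 history means in practice, assuming an in-memory list of rows. The `StatusRow` fields, the `OPEN_END` sentinel, and the student identifier are hypothetical names for illustration, not the platform's schema. A status change closes the current row and opens a new one, so an as-of lookup can still answer what was known on an earlier date.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional

OPEN_END = date(9999, 12, 31)  # hypothetical sentinel meaning "still current"


@dataclass(frozen=True)
class StatusRow:
    student_id: str
    status: str
    valid_from: date
    valid_to: date = OPEN_END


def apply_change(history: List[StatusRow], student_id: str,
                 new_status: str, changed_on: date) -> List[StatusRow]:
    """Close the student's current row and append a new one; never overwrite."""
    out: List[StatusRow] = []
    for row in history:
        if row.student_id == student_id and row.valid_to == OPEN_END:
            if row.status == new_status:
                return list(history)  # nothing changed, keep history untouched
            row = replace(row, valid_to=changed_on)  # close the old version
        out.append(row)
    out.append(StatusRow(student_id, new_status, changed_on))
    return out


def status_as_of(history: List[StatusRow], student_id: str,
                 t: date) -> Optional[str]:
    """Return the status that was valid at date t (valid_from <= t < valid_to)."""
    for row in history:
        if row.student_id == student_id and row.valid_from <= t < row.valid_to:
            return row.status
    return None


history = apply_change([], "STU-EXAMPLE", "enrolled", date(2024, 8, 15))
history = apply_change(history, "STU-EXAMPLE", "on_leave", date(2024, 10, 1))
print(status_as_of(history, "STU-EXAMPLE", date(2024, 9, 1)))
```

A current-state table would hold only the latest row, so a feature built for 1 September would silently see the October status.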

Then make risk useful

  • Weekly scoring during the semester.
  • Risk bands selected by advisor capacity and released with the Azure ML model version.
  • Concrete reason codes, not black-box flags.
  • Fairness monitoring and complete access audit.
If the foundation is wrong, every model retrained on top of it inherits the same blind spots — calibration, fairness, and explainability all collapse on bad history.
Assumptions

Scope the target so the model can be defended

Scope

Outcome

Predict non-continuation next term, excluding graduation, exchange completion, and approved leave.

Target
Term-end non-continuation
Excludes
Graduation ⋅ exchange ⋅ approved leave
Horizon
Next semester

Rule: Term-end attrition only; exclusions defined upfront.

Use

Decision

Advisor prioritization only. No automated punitive, academic, or financial decision.

Action
Advisor outreach & support
Never
Sanction ⋅ finance ⋅ grading
Override
Advisor can dismiss with reason

Rule: Humans act on the queue; the model never acts on a student directly.

Cadence

Timing

Score in weeks 4, 6, 8, and 10 so support can happen before grades reveal the issue.

Score weeks
4 ⋅ 6 ⋅ 8 ⋅ 10
Lead time
4–8 weeks before grades
Refresh
Weekly batch

Rule: Weekly windows during term, ahead of grade events.
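
The score weeks can be turned into calendar dates with a few lines. This is a sketch under two assumptions that the brief does not state: the term start date is a made-up example, and "week 4" is read as the first day of the fourth teaching week.

```python
from datetime import date, timedelta
from typing import List, Sequence

SCORE_WEEKS = (4, 6, 8, 10)  # score weeks listed in the cadence card


def scoring_dates(term_start: date,
                  weeks: Sequence[int] = SCORE_WEEKS) -> List[date]:
    """First day of each scoring week, counting term_start as day 1 of week 1."""
    return [term_start + timedelta(weeks=w - 1) for w in weeks]


# Hypothetical term start, for illustration only.
for d in scoring_dates(date(2025, 8, 18)):
    print(d.isoformat())
```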

Trust

Privacy

DPIA, purpose limitation, data minimization, student transparency, and strict access control.

Basis
DPIA + purpose limitation
Access
Faculty advisors only, RLS-scoped
Retention
Programme rules + audit log

Rule: Every access logged; no silent reads.

Equity

Fairness

Use demographics for auditing and monitoring, not advisor-facing reason codes.

Audit cuts
Faculty ⋅ gender ⋅ age band
Use
Monitoring only, not features
Cadence
Every model release

Rule: Demographics audit the model; they never rank a student.

Operations

Capacity

Optimize recall and precision at the number of cases advisors can actually work.

Anchor
Advisor headcount per faculty
Bands
τ_red ⋅ τ_amber at quantiles
Re-tune
Per term, with model release

Rule: τ_red and τ_amber follow advisor headcount.

Architecture

Azure lakehouse plus governed ML workflow

Student Outcome Intelligence Platform reference architecture diagram
Data model ⋅ ingestion

Bronze to Silver: raw payloads to cleansed tables

Bronze to Silver ingestion lineage: four source systems land raw events in Bronze Delta tables via Azure Data Factory and Event Hubs, then are cleansed and conformed into Silver tables using dbt staging models, Great Expectations contracts, and Type-2 SCD merges
Data model ⋅ features & scoring

Silver to Gold: features and scored predictions

Silver to Gold lineage: four Silver source tables join into a Gold point-in-time feature table via dbt marts, then scored by Azure ML and persisted to the advisor-facing risk_prediction table with MLflow tracking
Leakage control

Every row has an as-of date

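$$\mathrm{usable}(\mathrm{record}, t) = \mathbb{1}\left[\,\mathrm{event\_time} \le t \;\land\; \mathrm{available\_at} \le t \;\land\; \mathrm{valid\_from} \le t < \mathrm{valid\_to}\,\right]$$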
Point-in-time admission rule for source records
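
The admission rule translates directly into a predicate. The sketch below assumes each source record carries the four timestamps named in the rule; the `SourceRecord` class and the example dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SourceRecord:
    event_time: datetime    # when the event happened in the source system
    available_at: datetime  # when the platform could first have seen it
    valid_from: datetime    # start of this version's validity (type-2)
    valid_to: datetime      # end of this version's validity, exclusive


def usable(record: SourceRecord, t: datetime) -> bool:
    """True only if the record was both real and known at scoring time t."""
    return (
        record.event_time <= t
        and record.available_at <= t
        and record.valid_from <= t < record.valid_to
    )


# An event from 1 October that only landed in the lake on 10 October
# must not feed a score computed on 5 October.
late_arrival = SourceRecord(
    event_time=datetime(2024, 10, 1),
    available_at=datetime(2024, 10, 10),
    valid_from=datetime(2024, 10, 1),
    valid_to=datetime(9999, 12, 31),
)
print(usable(late_arrival, datetime(2024, 10, 5)))
print(usable(late_arrival, datetime(2024, 10, 12)))
```

Checking `available_at` separately from `event_time` is what stops backfilled or late-loaded records from leaking into historical training rows.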
Formula spine

The math the platform has to defend

Risk

$$r_{i,t} = P\left(Y_{i,t+h} = 1 \mid X_{i,t}\right)$$

Probability of next-term non-continuation using only features available at scoring date t.

Capacity threshold

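$$\tau_{\mathrm{red}} = \mathrm{quantile}\left(r,\ 1 - C_{\mathrm{red}} / N\right)$$
$$\tau_{\mathrm{amber}} = \mathrm{quantile}\left(r,\ 1 - (C_{\mathrm{red}} + C_{\mathrm{amber}}) / N\right)$$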
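
A sketch of the capacity thresholds, assuming one simple quantile convention: τ_red is the score of the C_red-th highest student and τ_amber the score of the (C_red + C_amber)-th highest. Other quantile interpolation rules would give slightly different cut points, and tied scores can push a band over capacity. The scores and capacities are made-up illustration values.

```python
from typing import List, Tuple


def capacity_thresholds(scores: List[float], c_red: int,
                        c_amber: int) -> Tuple[float, float]:
    """Pick tau_red and tau_amber so the bands hold c_red and c_amber cases."""
    n = len(scores)
    if c_red <= 0 or c_amber <= 0 or c_red + c_amber > n:
        raise ValueError("capacities must be positive and fit within N")
    ranked = sorted(scores, reverse=True)
    tau_red = ranked[c_red - 1]              # empirical (1 - C_red / N) quantile
    tau_amber = ranked[c_red + c_amber - 1]  # empirical (1 - (C_red + C_amber) / N)
    return tau_red, tau_amber


def band(score: float, tau_red: float, tau_amber: float) -> str:
    """Map a risk score to the advisor-facing band."""
    if score >= tau_red:
        return "RED"
    if score >= tau_amber:
        return "AMBER"
    return "GREEN"


scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.72, 0.81, 0.93]
tau_red, tau_amber = capacity_thresholds(scores, c_red=2, c_amber=3)
print(tau_red, tau_amber)
print([band(s, tau_red, tau_amber) for s in scores])
```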

Interpretable baseline

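$$\mathrm{logit}(r_{i,t}) = \beta_0 + \sum_k \beta_k \cdot x_{k,i,t}$$

$$x_k \in \{\mathrm{prior\_gpa},\ \mathrm{missing\_assignments},\ \mathrm{days\_since\_lms},\ \mathrm{payment\_overdue\_days},\ \mathrm{campus\_drop},\ \mathrm{academic\_context}\}$$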
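
A sketch of how the interpretable baseline yields both a score and reason codes. The intercept and coefficients are invented for illustration and are not fitted values. Treating `campus_drop` and `academic_context` as 0/1 indicators is also an assumption. Reason codes here are simply the largest positive terms β_k · x_k, which is one possible convention among several.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical coefficients, for illustration only.
BETA_0 = -2.0
BETA: Dict[str, float] = {
    "prior_gpa": -0.6,
    "missing_assignments": 0.5,
    "days_since_lms": 0.08,
    "payment_overdue_days": 0.02,
    "campus_drop": 0.9,
    "academic_context": 0.3,
}


def risk_score(x: Dict[str, float]) -> float:
    """Logistic baseline: sigmoid of the linear predictor."""
    z = BETA_0 + sum(BETA[k] * x[k] for k in BETA)
    return 1.0 / (1.0 + math.exp(-z))


def reason_codes(x: Dict[str, float], top_n: int = 3) -> List[Tuple[str, float]]:
    """Features that push the score up the most, largest contribution first."""
    contributions = [(k, BETA[k] * x[k]) for k in BETA]
    raising = [c for c in contributions if c[1] > 0]
    return sorted(raising, key=lambda c: c[1], reverse=True)[:top_n]


student = {
    "prior_gpa": 2.0,
    "missing_assignments": 3,
    "days_since_lms": 9,
    "payment_overdue_days": 30,
    "campus_drop": 1,
    "academic_context": 0,
}
print(round(risk_score(student), 3))
print([name for name, _ in reason_codes(student)])
```

Because every contribution is a single product, the reasons shown to an advisor can be recomputed exactly from the stored feature values and model version.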

Fairness gap

$$\mathrm{gap} = \max_g(\mathrm{metric}_g) - \min_g(\mathrm{metric}_g)$$

Audit recall, FPR, calibration, and flag rate by group g.
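
A sketch of the gap computation for one metric, recall, assuming the audit table has one row per student with a group label, the actual outcome, and whether the student was flagged. The row layout and the groups "A" and "B" are hypothetical. The same `gap` helper applies unchanged to FPR, ECE, or flag rate once those are computed per group.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, bool, bool]  # (group, actual_non_continuation, flagged)


def recall_by_group(rows: Iterable[Row]) -> Dict[str, float]:
    """Share of actual non-continuers in each group who were flagged."""
    flagged_pos: Dict[str, int] = defaultdict(int)
    positives: Dict[str, int] = defaultdict(int)
    for group, actual, flagged in rows:
        if actual:
            positives[group] += 1
            if flagged:
                flagged_pos[group] += 1
    return {g: flagged_pos[g] / positives[g] for g in positives}


def gap(metric_by_group: Dict[str, float]) -> float:
    """Largest minus smallest value of the metric across groups."""
    values = list(metric_by_group.values())
    return max(values) - min(values)


rows = [
    ("A", True, True), ("A", True, True), ("A", True, True), ("A", True, False),
    ("B", True, True), ("B", True, True), ("B", True, False), ("B", True, False),
    ("A", False, False), ("B", False, True),
]
per_group = recall_by_group(rows)
print(per_group, gap(per_group))
```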

Model & validation

Start interpretable, calibrated, and capacity-aware

Capacity-anchored thresholds on the score distribution

Validation metrics

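$$\mathrm{Precision@}C = TP_{\mathrm{top}\ C} / C$$
$$\mathrm{Recall@}C = TP_{\mathrm{top}\ C} / P_{\mathrm{actual}}$$
$$\mathrm{ECE} = \sum_b (n_b / N) \cdot \left|\mathrm{mean}(Y_b) - \mathrm{mean}(r_b)\right|$$
$$\mathrm{lead\_time}_i = \mathrm{outcome\_date}_i - \mathrm{first\_red\_or\_amber\_date}_i$$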
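
The first three metrics can be sketched in plain Python. The equal-width binning for ECE and the choice of ten bins are assumptions, since the formula above does not fix a binning scheme, and the scores and outcomes are toy values.

```python
from typing import List, Tuple


def precision_recall_at_c(scores: List[float], outcomes: List[int],
                          c: int) -> Tuple[float, float]:
    """Precision and recall among the c highest-scored students."""
    ranked = sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)
    tp_top_c = sum(y for _, y in ranked[:c])
    p_actual = sum(outcomes)
    recall = tp_top_c / p_actual if p_actual else 0.0
    return tp_top_c / c, recall


def expected_calibration_error(scores: List[float], outcomes: List[int],
                               n_bins: int = 10) -> float:
    """Weighted mean of |observed rate - mean score| over equal-width bins."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for r, y in zip(scores, outcomes):
        idx = min(int(r * n_bins), n_bins - 1)  # keep r == 1.0 in the last bin
        bins[idx].append((r, y))
    n = len(scores)
    ece = 0.0
    for members in bins:
        if members:
            mean_r = sum(r for r, _ in members) / len(members)
            mean_y = sum(y for _, y in members) / len(members)
            ece += (len(members) / n) * abs(mean_y - mean_r)
    return ece


scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
outcomes = [1, 1, 0, 1, 0, 0]
print(precision_recall_at_c(scores, outcomes, c=3))
print(expected_calibration_error(scores, outcomes))
```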

Confusion matrix at τred

 | Predicted no-risk | Predicted at-risk
Actual no-risk | TN 78% | FP 5%
Actual at-risk | FN 8% | TP 9%

Threshold τ_red chosen so the at-risk count fits advisor capacity. Recall and lead time are reported alongside; precision is informative, not the optimization target.
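
As a worked reading of the matrix, the sketch below derives the headline rates from the four cells. It assumes the percentages are shares of all scored students, which is consistent with them summing to 100, and it treats the figures as illustrative rather than measured results.

```python
# Cell shares in percent of all scored students, as read from the matrix.
TN, FP, FN, TP = 78, 5, 8, 9
N_STUDENTS = 35_000  # student count from the brief

precision = TP / (TP + FP)   # flagged students who do not continue
recall = TP / (TP + FN)      # non-continuers who are flagged
fpr = FP / (FP + TN)         # continuing students who are flagged
flag_share_pct = TP + FP     # share of all students in the red band
flagged_students = N_STUDENTS * flag_share_pct // 100

print(f"precision={precision:.3f} recall={recall:.3f} fpr={fpr:.3f}")
print(f"flagged: {flag_share_pct}% of students = {flagged_students}")
```

Under these numbers roughly half of actual non-continuers reach the red band, which is why recall and lead time are reported next to the capacity-driven threshold.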

Advisor experience

Make the prediction actionable

GREEN

STU-100913 | Economics

Positive triggers: LMS engagement +12% week-on-week | 0 missed assignments | balance current | prior GPA 3.6

No intervention recommended; continue normal monitoring.

Advisor: ADV-119
AMBER

STU-100087 | Nursing

Mixed signals: campus activity down 5 days | LMS views down 61% | prior GPA 2.24

Soft outreach: study-skills referral and check-in within 2 weeks.

Advisor: ADV-141
RED

STU-101442 | Computer Science

Primary triggers: 3 missed assignments | 9 days since LMS login | NOK 8,900 overdue balance

Priority outreach + finance referral; advisor reviews intervention status weekly.

Advisor: ADV-128
The advisor sees reasons, freshness, support options, and intervention history. The student is never reduced to a score alone.
Fairness

Engagement patterns differ for legitimate reasons

$$\mathrm{gap} = \max_g(\mathrm{metric}_g) - \min_g(\mathrm{metric}_g), \quad \mathrm{metric} \in \{\text{recall},\ \text{FPR},\ \text{ECE},\ \text{flag rate}\}$$

🎯 Recall | Catch rate by group | Share of actual non-continuers landing in the advisor queue. Low values for a group mean the queue under-serves them.
🚩 FPR | False-flag rate by group | Share of safe students flagged. High values mean a group bears advisor attention they didn’t need.
⚖️ ECE | Calibration gap by group | Distance between predicted and observed risk per bin. Drift here breaks the meaning of the score.
📊 Flag rate | Queue share by group | Fraction of a group routed red or amber. Compared to base rate to detect over- or under-flagging.
Deployment plan

Thirty-week rollout in five phases, with audit answered in phase one

Why

Score, band, reason codes, feature snapshot hash, feature values, model version, and threshold policy stored per prediction.

Who

Every advisor access logged with viewer, role, purpose, timestamp, fields returned, and student record.

Control

Row-level permissions, encrypted data, pseudonymous modeling, retention rules, and model release approval.
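
A sketch of the two audit records described under Why and Who, assuming JSON-serializable feature values. All field names, the model version string, and the threshold policy label are hypothetical. The snapshot hash is a SHA-256 over a canonical JSON encoding, so the same feature values always hash to the same digest regardless of key order.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Dict, List


def feature_snapshot_hash(features: Dict[str, Any]) -> str:
    """SHA-256 of the features in a canonical (sorted-key) JSON form."""
    canonical = json.dumps(features, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def prediction_record(student_id: str, score: float, band: str,
                      reasons: List[str], features: Dict[str, Any],
                      model_version: str, threshold_policy: str) -> Dict[str, Any]:
    """Everything needed to explain later why a student was banded this way."""
    return {
        "student_id": student_id,
        "score": score,
        "band": band,
        "reason_codes": list(reasons),
        "feature_values": dict(features),
        "feature_snapshot_hash": feature_snapshot_hash(features),
        "model_version": model_version,
        "threshold_policy": threshold_policy,
    }


def access_event(viewer: str, role: str, purpose: str, student_id: str,
                 fields_returned: List[str]) -> Dict[str, Any]:
    """One log entry per read of a student record."""
    return {
        "viewer": viewer,
        "role": role,
        "purpose": purpose,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "fields_returned": list(fields_returned),
        "student_id": student_id,
    }


record = prediction_record(
    "STU-EXAMPLE", 0.63, "RED", ["missing_assignments"],
    {"missing_assignments": 3, "days_since_lms": 9},
    model_version="example-v1", threshold_policy="example-capacity-policy",
)
print(record["feature_snapshot_hash"])
```

Immutability itself is a storage property (append-only tables, write-once retention) and is outside what this sketch shows.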

Thirty-week delivery plan with phase swimlanes and milestones
Trade-offs

Key decisions I would defend

Cadence

Start daily batch with event capture

Advisor intervention does not require millisecond latency. Data Factory batch plus Event Hubs capture preserves detail without overpromising real-time action.

Accepted: Hours of staleness in exchange for governable lineage.

Exposure

Store raw and expose aggregates

Raw data supports audit and backfill, while aggregated features reduce privacy exposure in the advisor workflow.

Accepted: Two storage tiers and a feature contract instead of one flat surface.
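
A sketch of the raw-to-aggregate step for one feature, `days_since_lms`. It assumes raw LMS logins are available as a list of dates and, for brevity, filters only on the event date; a fuller version would also apply the `available_at` check from the leakage rule. The dates are made up.

```python
from datetime import date
from typing import Iterable, Optional


def days_since_lms(login_dates: Iterable[date], t: date) -> Optional[int]:
    """Days between scoring date t and the latest login known by t."""
    known = [d for d in login_dates if d <= t]
    if not known:
        return None  # no login on record yet; left for the feature contract to handle
    return (t - max(known)).days


raw_logins = [date(2024, 9, 1), date(2024, 9, 20), date(2024, 10, 5)]
print(days_since_lms(raw_logins, date(2024, 9, 29)))
```

The advisor workflow sees only the aggregate, while the raw login rows stay in the lower tier for audit and backfill.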

Modeling

Prefer explainability over small accuracy gains

An unsupported black-box score is less valuable than a slightly weaker model advisors and regulators can understand.

Accepted: A few AUC points to keep reason codes honest.

Ethics

Use protected attributes for audit

You cannot prove fairness while refusing to measure outcomes across groups.

Accepted: Storing demographics under audit, never as advisor reason codes.