Case study response

Student Outcome Intelligence Platform

An Azure and Microsoft Fabric reference design that turns fragmented university data into explainable advisor action.

Case. A Norwegian university with 35,000 students across 8 faculties needs to flag risk of non-continuation early enough for advisors to intervene. The signals exist in SIS, LMS, ERP, and campus systems, but they rarely line up — different identifiers, different update rhythms, and current-state tables that quietly erase history. The answer below is a governed Azure data spine, point-in-time features, calibrated risk bands tied to advisor capacity, and an audited advisor queue.

Brief requirement	How the platform answers it
35,000 students ⋅ 8 faculties ⋅ 4 systems	Azure data spine with canonical identity, type-2 history, and Purview lineage.
Predict non-continuation early	Weekly point-in-time scoring during the term, before grades reveal risk.
No leakage	event_time ≤ t and available_at ≤ t enforced on every feature row.
Capacity-aware queue	τ_red and τ_amber set at quantiles tied to advisor headcount.
Explainable, fair, auditable	Reason codes per prediction, group fairness audit, and immutable access log.

Walkthrough

Five steps from raw signals to advisor action

Acquire
- SIS, LMS, ERP, campus signals
- Data Factory + Event Hubs ingestion
- ADLS Gen2 bronze landing
Govern
- Canonical student identity
- Type-2 mutable history
- Purview lineage + contracts
Model
- Calibrated risk score
- Capacity-anchored bands
- Fairness + validation gates
Deploy
- Azure ML registry release
- Power BI advisor queue
- Row-level security + reason codes
Operate
- Drift + freshness monitors
- Immutable audit log
- Roadmap + trade-off review

Case summary

The brief is a platform design problem, not a model-only task

Scope diagram showing 35,000 students across 8 faculties, four independent source systems, and the early-warning window for advisors

Thesis

The hard part is trustworthy history, not the model

Industry expectation

Data ~20% ⋅ Model ~80%

Production reality

Trustworthy data spine (identity, history, point-in-time features, lineage) ~80% ⋅ Model ~20%

Build the foundation first

Canonical student identity across all systems.
Type-2 history for mutable student status and programme data.
Data quality contracts, Purview lineage, and schema-drift detection.
Point-in-time features that only use data known at scoring time.

Then make risk useful

Weekly scoring during the semester.
Risk bands selected by advisor capacity and released with the Azure ML model version.
Concrete reason codes, not black-box flags.
Fairness monitoring and complete access audit.

If the foundation is wrong, every model retrained on top of it inherits the same blind spots — calibration, fairness, and explainability all collapse on bad history.

Assumptions

Scope the target so the model can be defended

Scope

Outcome

Predict non-continuation next term, excluding graduation, exchange completion, and approved leave.

Target: Term-end non-continuation
Excludes: Graduation ⋅ exchange ⋅ approved leave
Horizon: Next semester

Rule: Term-end attrition only; exclusions defined upfront.
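A minimal sketch of how the target could be labelled, assuming hypothetical next-term status codes (`enrolled`, `withdrawn`, `not_registered`, `graduated`, `exchange_completed`, `approved_leave`); the real SIS vocabulary will differ. Excluded outcomes return `None` so they drop out of training instead of being counted as negatives.

```python
from typing import Optional

# Hypothetical next-term status codes; the real SIS vocabulary is an assumption.
CONTINUED = {"enrolled"}
NON_CONTINUED = {"withdrawn", "not_registered"}
EXCLUDED = {"graduated", "exchange_completed", "approved_leave"}


def continuation_label(next_term_status: str) -> Optional[int]:
    """Map a next-term status to the target: 1 = non-continuation, 0 = continued.

    Excluded outcomes return None so they are dropped from training rather
    than being counted as negatives.
    """
    if next_term_status in EXCLUDED:
        return None
    if next_term_status in CONTINUED:
        return 0
    if next_term_status in NON_CONTINUED:
        return 1
    # Fail loudly instead of silently guessing a label for an unmapped status.
    raise ValueError(f"unmapped status: {next_term_status!r}")


statuses = ["enrolled", "withdrawn", "graduated", "approved_leave", "not_registered"]
labels = [continuation_label(s) for s in statuses]
training_rows = [(s, y) for s, y in zip(statuses, labels) if y is not None]
```

Raising on unknown codes keeps a new SIS status from quietly becoming a positive or negative label.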

Use

Decision

Advisor prioritization only. No automated punitive, academic, or financial decision.

Action: Advisor outreach & support
Never: Sanction ⋅ finance ⋅ grading
Override: Advisor can dismiss with reason

Rule: Humans act on the queue; the model never acts directly.

Cadence

Timing

Score in weeks 4, 6, 8, and 10 so support can happen before grades reveal the issue.

Score weeks: 4 ⋅ 6 ⋅ 8 ⋅ 10
Lead time: 4–8 weeks before grades
Refresh: Weekly batch

Rule: Weekly windows during term, ahead of grade events.

Trust

Privacy

DPIA, purpose limitation, data minimization, student transparency, and strict access control.

Basis: DPIA + purpose limitation
Access: Faculty advisors only, RLS-scoped
Retention: Programme rules + audit log

Rule: Every access is logged; no silent reads.

Equity

Fairness

Use demographics for auditing and monitoring, not advisor-facing reason codes.

Audit cuts: Faculty ⋅ gender ⋅ age band
Use: Monitoring only, not features
Cadence: Every model release

Rule: Demographics audit the model; they never rank a student.

Operations

Capacity

Optimize recall and precision at the number of cases advisors can actually work.

Anchor: Advisor headcount per faculty
Bands: τ_red ⋅ τ_amber at quantiles
Re-tune: Per term, with model release

Rule: τ_red and τ_amber follow advisor headcount.

Architecture

Azure lakehouse plus governed ML workflow

Data model ⋅ ingestion

Bronze to Silver: raw payloads to cleansed tables

Bronze to Silver ingestion lineage: four source systems land raw events in Bronze Delta tables via Azure Data Factory and Event Hubs, then are cleansed and conformed into Silver tables using dbt staging models, Great Expectations contracts, and Type-2 SCD merges
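The Type-2 merge step can be sketched in plain Python, assuming a hypothetical `ProgrammeVersion` row with `valid_from`/`valid_to` columns and `datetime.max` as the open end. On the platform this would more likely be a Delta Lake `MERGE` or a dbt snapshot; the sketch only shows the rule: close the current version when a tracked attribute changes, and ignore unchanged replays.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import List

# Open-ended sentinel for the current version of a record (an assumption).
OPEN_END = datetime.max


@dataclass(frozen=True)
class ProgrammeVersion:
    student_key: str
    programme: str
    status: str
    valid_from: datetime
    valid_to: datetime = OPEN_END


def scd2_merge(history: List[ProgrammeVersion], student_key: str, programme: str,
               status: str, changed_at: datetime) -> List[ProgrammeVersion]:
    """Apply one source snapshot row as a Type-2 change.

    If the tracked attributes differ from the current version, close it at
    changed_at and append a new open version. Unchanged rows are no-ops, so
    replaying the same snapshot is idempotent.
    """
    out = list(history)
    for i, row in enumerate(out):
        if row.student_key == student_key and row.valid_to == OPEN_END:
            if (row.programme, row.status) == (programme, status):
                return out  # nothing changed: keep history as-is
            out[i] = replace(row, valid_to=changed_at)  # close the old version
            break
    out.append(ProgrammeVersion(student_key, programme, status, changed_at))
    return out


h: List[ProgrammeVersion] = []
h = scd2_merge(h, "S1", "Nursing", "active", datetime(2024, 8, 15))
h = scd2_merge(h, "S1", "Nursing", "active", datetime(2024, 9, 1))      # replay: no-op
h = scd2_merge(h, "S1", "Nursing", "part_time", datetime(2024, 10, 3))  # real change
```

A current-state table would hold only the `part_time` row; the Type-2 history keeps both versions and the date the status changed.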

Data model ⋅ features & scoring

Silver to Gold: features and scored predictions

Silver to Gold lineage: four Silver source tables join into a Gold point-in-time feature table via dbt marts, then scored by Azure ML and persisted to the advisor-facing risk_prediction table with MLflow tracking

Leakage control

Every row has an as-of date

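$$\mathrm{usable}(\mathrm{record}, t) = \mathbb{1}\left[\text{event\_time} \le t \;\wedge\; \text{available\_at} \le t \;\wedge\; \text{valid\_from} \le t < \text{valid\_to}\right]$$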

Point-in-time admission rule for source records
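The admission rule translates directly into a filter. The sketch below assumes a hypothetical `SourceRecord` carrying the three time columns from the formula; the example includes a grade that happened before the scoring date but landed after it, which the rule keeps out of the feature row.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List

OPEN_END = datetime.max  # current Type-2 version has no end yet


@dataclass(frozen=True)
class SourceRecord:
    student_key: str
    kind: str
    event_time: datetime    # when it happened in the source system
    available_at: datetime  # when it landed in the platform
    valid_from: datetime    # Type-2 validity start
    valid_to: datetime      # Type-2 validity end (exclusive)


def usable(record: SourceRecord, t: datetime) -> bool:
    """1[event_time <= t and available_at <= t and valid_from <= t < valid_to]."""
    return (
        record.event_time <= t
        and record.available_at <= t
        and record.valid_from <= t < record.valid_to
    )


def as_of(records: Iterable[SourceRecord], t: datetime) -> List[SourceRecord]:
    """Keep only the records a scorer could have known at time t."""
    return [r for r in records if usable(r, t)]


t = datetime(2024, 10, 10)
records = [
    SourceRecord("S1", "lms_login", datetime(2024, 9, 30), datetime(2024, 9, 30),
                 datetime(2024, 9, 30), OPEN_END),
    # Happened before t but was only ingested after t: must be excluded.
    SourceRecord("S1", "grade_posted", datetime(2024, 10, 1), datetime(2024, 10, 20),
                 datetime(2024, 10, 1), OPEN_END),
    # Event in the future relative to t: excluded.
    SourceRecord("S1", "lms_login", datetime(2024, 10, 12), datetime(2024, 10, 12),
                 datetime(2024, 10, 12), OPEN_END),
]
visible = as_of(records, t)
```

Checking `event_time` alone would have admitted the late grade; `available_at` is what blocks it.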

Formula spine

The math the platform has to defend

Risk

$$r_{i,t} = P\left(Y_{i,t+h} = 1 \mid X_{i,t}\right)$$

Probability that student i does not continue into the next term (horizon h), using only features available at scoring date t.

Capacity threshold

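$$\tau_{\text{red}} = \operatorname{quantile}\left(r,\; 1 - \frac{C_{\text{red}}}{N}\right)$$

$$\tau_{\text{amber}} = \operatorname{quantile}\left(r,\; 1 - \frac{C_{\text{red}} + C_{\text{amber}}}{N}\right)$$

C_red and C_amber are the red and amber cases advisors can work per scoring cycle; N is the number of students scored.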
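A sketch of capacity-anchored banding in plain Python, assuming scores for one faculty and hypothetical capacities `c_red` and `c_amber`. Instead of an interpolated quantile it takes the C-th highest score, which is the empirical (1 − C/N) quantile; tied scores at a threshold can push a band slightly over capacity.

```python
from typing import List, Tuple


def capacity_thresholds(scores: List[float], c_red: int, c_amber: int) -> Tuple[float, float]:
    """Return (tau_red, tau_amber) so about c_red students are RED and c_amber AMBER.

    The C-th highest score is used as the empirical (1 - C/N) quantile; ties at
    a threshold can push a band slightly over capacity.
    """
    n = len(scores)
    if c_red < 1 or c_amber < 0 or c_red + c_amber > n:
        raise ValueError("need 1 <= c_red and c_red + c_amber <= N")
    ranked = sorted(scores, reverse=True)
    tau_red = ranked[c_red - 1]
    tau_amber = ranked[c_red + c_amber - 1]
    return tau_red, tau_amber


def band(score: float, tau_red: float, tau_amber: float) -> str:
    """Map a calibrated score to the advisor-facing band."""
    if score >= tau_red:
        return "RED"
    if score >= tau_amber:
        return "AMBER"
    return "GREEN"


# Toy scores for one faculty with capacity for 2 red and 3 amber cases.
scores = [0.91, 0.15, 0.62, 0.05, 0.44, 0.33, 0.78, 0.12, 0.27, 0.08]
tau_red, tau_amber = capacity_thresholds(scores, c_red=2, c_amber=3)
bands = [band(s, tau_red, tau_amber) for s in scores]
```

Running it separately for each faculty keeps the bands tied to that faculty's advisor headcount.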

Interpretable baseline

$$\operatorname{logit}(r_{i,t}) = \beta_0 + \sum_k \beta_k\, x_{k,i,t}$$

x_k ∈ {prior_gpa, missing_assignments, days_since_lms, payment_overdue_days, campus_drop, academic_context}
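To show why a linear baseline yields honest reason codes, here is a sketch with hypothetical coefficients and reference values (a real release would load them from the registered model version). Each feature's contribution β_k·(x_k − reference_k) is additive on the logit scale, so ranking contributions gives the reason codes directly. `campus_drop` and `academic_context` are left out to keep it short.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical coefficients and reference values (e.g. faculty medians); a real
# release would load these from the registered model version.
BETA_0 = -2.0
BETA: Dict[str, float] = {
    "prior_gpa": -0.8,
    "missing_assignments": 0.45,
    "days_since_lms": 0.12,
    "payment_overdue_days": 0.03,
}
REFERENCE: Dict[str, float] = {
    "prior_gpa": 3.0,
    "missing_assignments": 0.0,
    "days_since_lms": 2.0,
    "payment_overdue_days": 0.0,
}


def risk(x: Dict[str, float]) -> float:
    """r = sigmoid(beta_0 + sum_k beta_k * x_k), the inverse of the logit."""
    z = BETA_0 + sum(BETA[k] * x[k] for k in BETA)
    return 1.0 / (1.0 + math.exp(-z))


def reason_codes(x: Dict[str, float], top_n: int = 3) -> List[Tuple[str, float]]:
    """Features that push this student's logit above the reference, largest first."""
    contrib = {k: BETA[k] * (x[k] - REFERENCE[k]) for k in BETA}
    ranked = sorted(contrib.items(), key=lambda kv: kv[1], reverse=True)
    return [(k, round(c, 3)) for k, c in ranked[:top_n] if c > 0]


student = {"prior_gpa": 2.5, "missing_assignments": 3, "days_since_lms": 9,
           "payment_overdue_days": 30}
r = risk(student)
codes = reason_codes(student)
```

For this toy student the score is about 0.34 and the reason codes come out as missing assignments, overdue payment, and LMS inactivity, the same kind of wording the advisor cards use.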

Fairness gap

$$\mathrm{gap} = \max_g(\mathrm{metric}_g) - \min_g(\mathrm{metric}_g)$$

Audit recall, FPR, calibration, and flag rate by group g.

Model & validation

Start interpretable, calibrated, and capacity-aware

Capacity-anchored thresholds on the score distribution

Validation metrics

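$$\text{Precision@}C = \frac{TP_{\text{top } C}}{C}$$

$$\text{Recall@}C = \frac{TP_{\text{top } C}}{P_{\text{actual}}}$$

$$\mathrm{ECE} = \sum_b \frac{n_b}{N}\left|\operatorname{mean}(Y_b) - \operatorname{mean}(r_b)\right|$$

$$\text{lead\_time}_i = \text{outcome\_date}_i - \text{first\_red\_or\_amber\_date}_i$$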
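The validation metrics need no ML library. This sketch assumes 0/1 labels, equal-width score bins for ECE, and lead time in calendar days; the numbers in the example are toy values, not results.

```python
from datetime import date
from typing import List, Sequence, Tuple


def precision_recall_at_c(scores: Sequence[float], labels: Sequence[int], c: int) -> Tuple[float, float]:
    """Precision@C = TP_topC / C and Recall@C = TP_topC / P_actual."""
    p_actual = sum(labels)
    if c < 1 or p_actual == 0:
        raise ValueError("need c >= 1 and at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp_top_c = sum(y for _, y in ranked[:c])
    return tp_top_c / c, tp_top_c / p_actual


def ece(scores: Sequence[float], labels: Sequence[int], n_bins: int = 10) -> float:
    """ECE = sum_b (n_b / N) * |mean(Y_b) - mean(r_b)| over equal-width score bins."""
    n = len(scores)
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for r, y in zip(scores, labels):
        b = min(int(r * n_bins), n_bins - 1)  # a score of exactly 1.0 goes to the top bin
        bins[b].append((r, y))
    total = 0.0
    for members in bins:
        if members:
            mean_y = sum(y for _, y in members) / len(members)
            mean_r = sum(r for r, _ in members) / len(members)
            total += len(members) / n * abs(mean_y - mean_r)
    return total


def lead_time_days(outcome_date: date, first_flag_date: date) -> int:
    """Days between the first RED/AMBER flag and the outcome."""
    return (outcome_date - first_flag_date).days


scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 1, 0, 0]
prec, rec = precision_recall_at_c(scores, labels, c=2)
e = ece(scores, labels, n_bins=2)
lead = lead_time_days(date(2025, 1, 15), date(2024, 9, 20))
```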

Confusion matrix at τ_red

	Predicted no-risk	Predicted at-risk
Actual no-risk	TN 78%	FP 5%
Actual at-risk	FN 8%	TP 9%

Threshold τ_red is chosen so the at-risk count fits advisor capacity. At a fixed C, Precision@C and Recall@C both rise and fall with TP_top C, so recall and lead time are reported as the headline numbers; precision is informative, not the optimization target.
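The percentages in the matrix are enough to derive the other headline rates. The sketch treats them as shares of the scored population and, purely for illustration, assumes the same shares hold across all 35,000 students to size the queue.

```python
# Confusion-matrix shares at tau_red, in percent of scored students.
tn, fp, fn, tp = 78, 5, 8, 9

flag_rate = (tp + fp) / 100   # share routed to the advisor queue
base_rate = (tp + fn) / 100   # share who actually do not continue
precision = tp / (tp + fp)    # flagged students who really were at risk
recall = tp / (tp + fn)       # at-risk students who reached the queue
fpr = fp / (fp + tn)          # continuing students flagged anyway

# Illustration only: assumes the same shares hold across all 35,000 students.
queue_size = round(flag_rate * 35_000)
```

With these shares about 14% of students are flagged against a 17% base rate; precision is roughly 64% (9/14), recall roughly 53% (9/17), FPR about 6% (5/83), and the queue would hold about 4,900 students.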

Advisor experience

Make the prediction actionable

GREEN

STU-100913 | Economics

Positive triggers: LMS engagement +12% week-on-week | 0 missed assignments | balance current | prior GPA 3.6

No intervention recommended; continue normal monitoring.

Advisor: ADV-119

AMBER

STU-100087 | Nursing

Mixed signals: campus activity down 5 days | LMS views down 61% | prior GPA 2.24

Soft outreach: study-skills referral and check-in within 2 weeks.

Advisor: ADV-141

RED

STU-101442 | Computer Science

Primary triggers: 3 missed assignments | 9 days since LMS login | NOK 8,900 overdue balance

Priority outreach + finance referral; advisor reviews intervention status weekly.

Advisor: ADV-128

The advisor sees reasons, freshness, support options, and intervention history. The student is never reduced to a score alone.

Fairness

Engagement patterns differ for legitimate reasons

$$\mathrm{gap} = \max_g(\mathrm{metric}_g) - \min_g(\mathrm{metric}_g), \quad \mathrm{metric} \in \{\text{recall}, \text{FPR}, \text{ECE}, \text{flag rate}\}$$

🎯 Recall (catch rate by group): share of actual non-continuers landing in the advisor queue. Low values for a group mean the queue under-serves them.

🚩 FPR (false-flag rate by group): share of continuing students who are flagged anyway. High values mean a group absorbs advisor attention it did not need.

⚖️ ECE (calibration gap by group): distance between predicted and observed risk per score bin. Drift here breaks the meaning of the score.

📊 Flag rate (queue share by group): fraction of a group routed red or amber, compared with that group's base rate to detect over- or under-flagging.
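A minimal sketch of the per-group audit, assuming each row is a hypothetical `(group, flagged, actual)` triple with 0/1 values and illustrative age-band names. It covers recall, FPR and flag rate (plus the base rate for comparison); per-group ECE would reuse the validation binning and is omitted here.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Row = Tuple[str, int, int]  # (group, flagged, actual), flags are 0/1


def group_metrics(rows: Iterable[Row]) -> Dict[str, Dict[str, float]]:
    """Recall, FPR, flag rate and base rate for each audit group."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for group, flagged, actual in rows:
        if actual:
            counts[group]["tp" if flagged else "fn"] += 1
        else:
            counts[group]["fp" if flagged else "tn"] += 1
    metrics = {}
    for group, c in counts.items():
        n = sum(c.values())
        pos, neg = c["tp"] + c["fn"], c["fp"] + c["tn"]
        metrics[group] = {
            "recall": c["tp"] / pos if pos else math.nan,
            "fpr": c["fp"] / neg if neg else math.nan,
            "flag_rate": (c["tp"] + c["fp"]) / n,
            "base_rate": pos / n,
        }
    return metrics


def gap(metrics: Dict[str, Dict[str, float]], name: str) -> float:
    """max_g(metric_g) - min_g(metric_g), skipping groups where the metric is undefined."""
    values = [m[name] for m in metrics.values() if not math.isnan(m[name])]
    return max(values) - min(values)


# Toy audit rows for two hypothetical age bands.
rows = [
    ("age_18_21", 1, 1), ("age_18_21", 0, 1), ("age_18_21", 1, 0),
    ("age_18_21", 0, 0), ("age_18_21", 0, 0),
    ("age_22_plus", 1, 1), ("age_22_plus", 1, 1), ("age_22_plus", 0, 1),
    ("age_22_plus", 0, 0), ("age_22_plus", 0, 0), ("age_22_plus", 0, 0),
]
m = group_metrics(rows)
```

Group labels enter only this audit function, never the scoring path, which matches the rule that demographics audit the model rather than rank students.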

Deployment plan

Thirty-week rollout in five phases, with audit answered in phase one

Why

Score, band, reason codes, feature snapshot hash, feature values, model version, and threshold policy stored per prediction.
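One way to make the per-prediction record verifiable is to hash a canonical serialisation of the feature values, so an auditor can recompute the hash later. The `PredictionRecord` fields mirror the list above; the field names, the toy score, the model version string and the policy label are hypothetical.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Dict, Tuple


def snapshot_hash(features: Dict[str, float]) -> str:
    """SHA-256 of canonical JSON (sorted keys, no whitespace), so key order never matters."""
    canonical = json.dumps(features, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class PredictionRecord:
    student_key: str
    scored_at: str                 # ISO-8601 scoring time t
    score: float
    band: str
    reason_codes: Tuple[str, ...]
    features: Dict[str, float]     # feature values exactly as scored
    feature_snapshot_hash: str
    model_version: str             # hypothetical registry name:version
    threshold_policy: str          # hypothetical id of the tau_red/tau_amber policy


features = {"missing_assignments": 3, "days_since_lms": 9, "payment_overdue_days": 30}
record = PredictionRecord(
    student_key="STU-101442",
    scored_at="2024-10-07T06:00:00+00:00",
    score=0.71,
    band="RED",
    reason_codes=("missing_assignments", "days_since_lms", "payment_overdue_days"),
    features=features,
    feature_snapshot_hash=snapshot_hash(features),
    model_version="attrition-baseline:3",
    threshold_policy="capacity-2024-autumn",
)

# An auditor recomputes the hash from the stored values to confirm nothing changed.
assert snapshot_hash(record.features) == record.feature_snapshot_hash
row_json = json.dumps(asdict(record), sort_keys=True)
```

If the hash is recomputed in another language, float formatting in the canonical JSON has to match exactly, so pinning the serialisation is part of the contract.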

Who

Every advisor access logged with viewer, role, purpose, timestamp, fields returned, and student record.
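A sketch of a tamper-evident access log: each entry stores the hash of the previous one, so editing, reordering or removing an earlier entry breaks verification. Field names follow the list above and are hypothetical. A hash chain only makes tampering detectable, and dropping the newest entry goes unnoticed unless the latest hash is anchored elsewhere; in Azure the log would more likely also sit on storage with an immutability (WORM) policy.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, List, Optional

GENESIS = "0" * 64  # prev_hash of the first entry


def _digest(body: Dict) -> str:
    payload = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_access(log: List[Dict], viewer: str, role: str, purpose: str,
                  student_key: str, fields: List[str], ts: Optional[str] = None) -> Dict:
    """Append one access event chained to the previous entry's hash."""
    entry = {
        "viewer": viewer,
        "role": role,
        "purpose": purpose,
        "student_key": student_key,
        "fields": sorted(fields),
        "timestamp": ts or datetime.now(timezone.utc).isoformat(),
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    entry["hash"] = _digest(entry)  # hash covers every field except itself
    log.append(entry)
    return entry


def verify_chain(log: List[Dict]) -> bool:
    """Recompute every hash; an edited, reordered or removed non-final entry fails."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


log: List[Dict] = []
append_access(log, "ADV-128", "advisor", "outreach", "STU-101442",
              ["band", "reason_codes"], ts="2024-10-07T09:15:00+00:00")
append_access(log, "ADV-128", "advisor", "follow_up", "STU-101442",
              ["band", "intervention_history"], ts="2024-10-14T10:02:00+00:00")
```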

Control

Row-level permissions, encrypted data, pseudonymous modeling, retention rules, and model release approval.

Thirty-week delivery plan with phase swimlanes and milestones

Trade-offs

Key decisions I would defend

Cadence

Start with daily batch ingestion and event capture

Advisor intervention does not require millisecond latency. Data Factory batch plus Event Hubs capture preserves detail without overpromising real-time action.

Accepted: Hours of staleness in exchange for governable lineage.

Exposure

Store raw and expose aggregates

Raw data supports audit and backfill, while aggregated features reduce privacy exposure in the advisor workflow.

Accepted: Two storage tiers and a feature contract instead of one flat surface.
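A sketch of the raw-to-aggregate boundary, assuming hypothetical raw LMS events of the form `(student_key, event_type, timestamp)`: only the aggregates `days_since_lms` and `missing_assignments` leave the raw tier. For brevity it filters on the event timestamp only; the full admission rule would also check `available_at`.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple

RawEvent = Tuple[str, str, datetime]  # (student_key, event_type, timestamp)


def aggregate_features(events: List[RawEvent], t: datetime) -> Dict[str, Dict[str, Optional[int]]]:
    """Collapse raw events into per-student aggregates using only events at or before t."""
    last_login: Dict[str, datetime] = {}
    missed: Dict[str, int] = {}
    for student, kind, ts in events:
        if ts > t:
            continue  # not yet known at scoring time
        if kind == "login":
            if student not in last_login or ts > last_login[student]:
                last_login[student] = ts
        elif kind == "assignment_missed":
            missed[student] = missed.get(student, 0) + 1
    students = set(last_login) | set(missed)
    return {
        s: {
            # None when the student has no login on record before t.
            "days_since_lms": (t - last_login[s]).days if s in last_login else None,
            "missing_assignments": missed.get(s, 0),
        }
        for s in students
    }


t = datetime(2024, 10, 10)
events = [
    ("S1", "login", datetime(2024, 10, 1, 8, 0)),
    ("S1", "assignment_missed", datetime(2024, 10, 3)),
    ("S1", "assignment_missed", datetime(2024, 10, 8)),
    ("S1", "login", datetime(2024, 10, 12)),  # after t: ignored
    ("S2", "login", datetime(2024, 10, 9, 20, 0)),
]
features = aggregate_features(events, t)
```

The advisor workflow sees two integers per student; the raw clickstream stays in the audited tier for backfill.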

Modeling

Prefer explainability over small accuracy gains

An unsupported black-box score is less valuable than a slightly weaker model advisors and regulators can understand.

Accepted: A few AUC points to keep reason codes honest.

Ethics

Use protected attributes for audit

You cannot prove fairness while refusing to measure outcomes across groups.

Accepted: Storing demographics under audit, never as advisor reason codes.
