Case study response

Student Outcome Intelligence Platform

An Azure and Microsoft Fabric reference design that turns fragmented university data into explainable advisor action.

Foundation: Unify student history

SIS, LMS, ERP, and campus activity data resolved into ADLS Gen2, Fabric OneLake, and governed history tables.

Intelligence: Score risk early

Fabric feature products, Azure ML model release, calibrated risk bands, and transparent reason codes.

Action: Guide advisor outreach

A Power BI/Fabric advisor queue with row-level security, intervention context, freshness, and audit evidence.

Case summary

The brief is a platform design problem, not a model-only task

University context

  • 35,000 students across 8 faculties.
  • Different programmes, study modes, and engagement patterns.
  • Advisors need early signals before official records show risk.

Data and governance challenge

  • SIS, LMS, ERP, and campus systems do not share identifiers or update rhythms.
  • Student status changes over time, so current-state tables overwrite the history the platform needs.
  • The platform must be accurate, explainable, fair, auditable, and durable after launch.
Thesis

The hard part is trustworthy history, not the model

Build the foundation first

  • Canonical student identity across all systems.
  • Type-2 history for mutable student status and programme data (see the sketch after this list).
  • Data quality contracts, Purview lineage, and schema-drift detection.
  • Point-in-time features that only use data known at scoring time.
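A minimal sketch, assuming pandas and the student_status_history layout described later in this design, of the type-2 update: the open row is closed and a new effective-dated row is appended. Column names follow the data model; everything else is illustrative.

```python
import pandas as pd

# Sentinel used for the open-ended "current" version of a row.
OPEN_END = pd.Timestamp("9999-12-31")

def apply_status_change(history: pd.DataFrame, student_id: str,
                        new_status: str, change_date: pd.Timestamp) -> pd.DataFrame:
    """Type-2 update: close the student's open row and append a new effective-dated row."""
    is_open = (history["canonical_student_id"] == student_id) & (history["valid_to"] == OPEN_END)
    history = history.copy()
    history.loc[is_open, "valid_to"] = change_date            # previous version ends here
    new_row = {
        "canonical_student_id": student_id,
        "status": new_status,
        "valid_from": change_date,                            # new version starts here
        "valid_to": OPEN_END,                                  # open until the next change
        "change_recorded_at": pd.Timestamp.now(tz="UTC"),      # when the platform learned of it
    }
    return pd.concat([history, pd.DataFrame([new_row])], ignore_index=True)
```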

Then make risk useful

  • Weekly scoring during the semester.
  • Risk-band thresholds chosen to match advisor capacity and released with the Azure ML model version.
  • Concrete reason codes, not black-box flags.
  • Fairness monitoring and complete access audit.
Assumptions

Scope the target so the model can be defended

Outcome

Predict non-continuation next term, excluding graduation, exchange completion, and approved leave.

Decision

Advisor prioritization only. No automated punitive, academic, or financial decisions.

Timing

Score in weeks 4, 6, 8, and 10 so support can happen before grades reveal the issue.

Privacy

DPIA, purpose limitation, data minimization, student transparency, and strict access control.

Fairness

Use demographics for auditing and monitoring, not advisor-facing reason codes.

Capacity

Optimize recall and precision at the number of cases advisors can actually work.

Source systems

Each feed needs a contract, not just an extract

System | Signal | Integration decision | Main control
SIS | Status, programme, faculty, grades, leave, withdrawal, graduation | Data Factory/Fabric extract into ADLS Gen2 bronze | Type-2 history; never train from current-state status alone
LMS | Logins, course views, submissions, forums, video, quizzes | Event Hubs or incremental pulls normalized to daily student-course facts | Normalize by course, week, programme, and study mode
ERP | Financial aid, payments, scholarships, balances, overdue days | Daily vendor snapshot with Key Vault-backed credential | Effective-dated finance snapshots and source freshness checks
Campus | Library, WiFi, building entry | Event Hubs Capture or aggregated privacy-preserving daily extract | Do not treat low campus presence as uniformly risky
Every record carries source_system, source_record_id, event_time, available_at, ingested_at, pipeline_run_id, schema version, and Purview asset reference.
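A minimal sketch, assuming a PySpark notebook in Fabric, of stamping these lineage columns onto a bronze batch before it is written; the function and its parameters are illustrative, while the column names follow the list above.

```python
from pyspark.sql import DataFrame, functions as F

def stamp_lineage(batch: DataFrame, source_system: str, pipeline_run_id: str,
                  schema_version: str, purview_asset_id: str) -> DataFrame:
    """Attach the ingestion-audit columns every bronze record must carry."""
    return (
        batch.withColumn("source_system", F.lit(source_system))
             .withColumn("ingested_at", F.current_timestamp())       # load time in the platform
             .withColumn("pipeline_run_id", F.lit(pipeline_run_id))  # links back to the pipeline run
             .withColumn("schema_version", F.lit(schema_version))
             .withColumn("purview_asset_id", F.lit(purview_asset_id))
    )

# Example: bronze_lms = stamp_lineage(raw_lms, "LMS", run_id, "v3", lms_asset_id)
```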
Architecture

Azure lakehouse plus governed ML workflow

[Diagram: Student Outcome Intelligence Platform reference architecture]
Azure service map

Service choices are tied to the case risks

Data Factory + Event Hubs

One ingestion control plane for files, APIs, database extracts, and high-frequency activity streams.

ADLS Gen2 + OneLake

Raw replayable evidence, medallion lakehouse layers, Delta tables, and SQL access for feature products.

Microsoft Purview

Catalog, lineage, sensitivity labels, retention evidence, and data-product ownership.

Azure Machine Learning

Training pipelines, registry, model cards, calibration checks, Responsible AI review, and endpoint release.

Power BI / Fabric app

Advisor queue, risk bands, score trends, reason codes, and Entra-backed row-level security.

Monitor + Key Vault

Pipeline SLA alerts, schema failures, score drift, endpoint telemetry, managed identities, and secrets.

Reference basis

The Azure choices map to Microsoft platform patterns

Fabric + OneLake

One tenant-wide analytical lakehouse foundation for university data products.

Medallion layers

Bronze raw evidence, silver trusted history, and gold feature/advisor products.

Event Hubs Capture

High-frequency LMS and campus events retained in storage for replay and batch scoring.

Purview lineage

Pipeline runs and data products remain traceable for student or regulator questions.

Responsible AI

Azure ML review artifacts sit beside the registered model and threshold release.

Power BI RLS

Advisor queues are scoped through model-level row filters and Entra groups.

Data model

Columns that make the solution work

Table | Example columns | Purpose
identity_map | canonical_student_id, source_system, source_person_id, valid_from, valid_to, match_confidence | Joins independent systems without trusting inconsistent IDs.
ingestion_audit | pipeline_run_id, source_uri, schema_version, record_count, watermark, purview_asset_id | Links features and predictions back to Azure pipeline evidence.
student_status_history | status, faculty, programme, valid_from, valid_to, change_recorded_at | Stores historical truth when students switch, leave, withdraw, or return.
lms_activity_daily | login_count, course_views, submissions, late_submission_count, video_minutes | Captures digital engagement and assessment progress.
finance_snapshots | tuition_paid, outstanding_balance_nok, payment_overdue_days, aid_status | Signals financial friction while preserving source timestamp.
feature_student_week | as_of_date, lms_logins_14d, missing_assignments_to_date, campus_days_14d, payment_overdue_days | Model input generated as a point-in-time snapshot.
risk_prediction | model_version, risk_score, risk_band, top_reasons, feature_snapshot_hash | Reproducible advisor-facing output.
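A minimal sketch, assuming pandas, of how identity_map attaches canonical_student_id to a source extract; the confidence threshold and the quarantine behaviour are illustrative design choices, not fixed requirements.

```python
import pandas as pd

def resolve_identity(source_df: pd.DataFrame, identity_map: pd.DataFrame,
                     source_system: str, min_confidence: float = 0.9):
    """Join a source extract to the canonical identity map; return (resolved, quarantined)."""
    matches = identity_map.loc[
        (identity_map["source_system"] == source_system)
        & (identity_map["match_confidence"] >= min_confidence),   # illustrative threshold
        ["source_person_id", "canonical_student_id"],
    ]
    joined = source_df.merge(matches, on="source_person_id", how="left")
    unresolved = joined["canonical_student_id"].isna()
    # Unmatched records go to stewardship review instead of being silently dropped.
    return joined.loc[~unresolved], joined.loc[unresolved]
```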
Leakage control

Every row has an as-of date

Feature rule

Use a source record only when all of the following hold:

  • event_time <= as_of_date
  • available_at <= as_of_date
  • valid_from <= as_of_date
  • valid_to > as_of_date
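A minimal sketch of that rule, assuming PySpark over the silver Delta tables; for event-style tables without effective dating, the valid_from/valid_to conditions are simply omitted.

```python
from pyspark.sql import DataFrame, functions as F

def point_in_time_slice(records: DataFrame, as_of_date: str) -> DataFrame:
    """Keep only rows that were both true and already known on the as-of date."""
    return records.where(
        (F.col("event_time") <= F.lit(as_of_date))       # the event had happened
        & (F.col("available_at") <= F.lit(as_of_date))   # and had landed in the platform
        & (F.col("valid_from") <= F.lit(as_of_date))     # the version was in effect
        & (F.col("valid_to") > F.lit(as_of_date))        # and had not yet been superseded
    )

# Example: weekly features are built only from point_in_time_slice(silver_table, scoring_date).
```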

Why it matters

  • Late-arriving records cannot sneak into historical training.
  • Future withdrawals cannot leak through current SIS status.
  • Programme changes are joined as they were known then.
  • Predictions can be reproduced for a student or regulator.
Model

Start interpretable, calibrated, and capacity-aware

Baseline model

  • Calibrated logistic regression or explainable gradient boosted trees (sketched after this list).
  • Train on historical term snapshots from the same weeks used in production.
  • Exclude protected attributes from default model inputs.
  • Version model, features, thresholds, and training data in the Azure ML release package.
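A minimal sketch of the baseline named in the list above, assuming scikit-learn inside an Azure ML training job; the feature list and label column are illustrative, drawn from feature_student_week, with protected attributes excluded.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative features from feature_student_week; no protected attributes.
FEATURES = ["lms_logins_14d", "missing_assignments_to_date",
            "campus_days_14d", "payment_overdue_days"]
LABEL = "non_continuation_next_term"   # illustrative label column name

def train_baseline(train_df):
    """Fit a scaled logistic regression and calibrate its predicted probabilities."""
    base = make_pipeline(StandardScaler(),
                         LogisticRegression(max_iter=1000, class_weight="balanced"))
    model = CalibratedClassifierCV(base, method="isotonic", cv=5)   # calibrated risk scores
    model.fit(train_df[FEATURES], train_df[LABEL])
    return model
```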

Evaluation

  • Recall and precision at advisor capacity (see the sketch after this list).
  • Calibration by risk band and by faculty.
  • Responsible AI cohort review before each release.
  • Lead time before official dropout signal.
  • Stability and drift across terms.
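A minimal sketch, assuming NumPy, of recall and precision evaluated at a fixed advisor capacity rather than at an arbitrary probability threshold; the capacity figure is illustrative.

```python
import numpy as np

def metrics_at_capacity(y_true: np.ndarray, risk_scores: np.ndarray, capacity: int = 300):
    """Precision and recall when only the top-`capacity` students can be worked."""
    order = np.argsort(risk_scores)[::-1]            # highest predicted risk first
    flagged = np.zeros(len(y_true), dtype=bool)
    flagged[order[:capacity]] = True
    true_positives = np.sum(flagged & (y_true == 1))
    precision = true_positives / max(flagged.sum(), 1)
    recall = true_positives / max((y_true == 1).sum(), 1)
    return precision, recall
```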
Advisor experience

Make the prediction actionable

RED

STU-101442 | Computer Science

Reasons: 3 missed assignments | 9 days since LMS login | NOK 8,900 overdue balance

Advisor: ADV-128

AMBER

STU-100087 | Nursing

Reasons: campus activity down 5 days | prior GPA 2.24 | LMS views down 61 percent

Advisor: ADV-141

GREEN

STU-100913 | Economics

No current intervention recommended; continue normal monitoring.

Advisor: ADV-119

The advisor sees reasons, freshness, support options, and intervention history. The student is never reduced to a score alone.
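A minimal sketch, assuming the linear baseline, of how advisor-facing reason codes could be derived from per-student deviations that push the risk score up; the templates, coefficients, and cohort means are all illustrative, and sensitive attributes are never eligible.

```python
# Illustrative templates; only non-sensitive engagement and finance features are eligible.
REASON_TEMPLATES = {
    "missing_assignments_to_date": "{value:.0f} missed assignments",
    "lms_logins_14d": "low LMS activity in the last 14 days",
    "payment_overdue_days": "payment overdue for {value:.0f} days",
}

def top_reasons(feature_row: dict, coefficients: dict, cohort_means: dict, k: int = 3) -> list:
    """Rank eligible features by how far their deviation from the cohort pushes risk upward."""
    push = {
        name: coefficients[name] * (feature_row[name] - cohort_means[name])
        for name in REASON_TEMPLATES if name in coefficients
    }
    ranked = sorted(push, key=push.get, reverse=True)
    return [REASON_TEMPLATES[name].format(value=feature_row[name])
            for name in ranked[:k] if push[name] > 0]   # only risk-increasing deviations
```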
Fairness

Engagement patterns differ for legitimate reasons

Design choices

  • Compare engagement within programme, study mode, and term week.
  • Do not treat low campus presence as uniformly risky.
  • Keep sensitive attributes out of reason codes.
  • Review advisor feedback for bias, not just model scores.

Monitoring

  • Flag rate, recall, false-positive rate, and calibration by group (see the sketch after this list).
  • Reason-code distribution by group.
  • Intervention offer and acceptance rates.
  • Term-by-term drift in population and outcomes.
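A minimal sketch, assuming pandas, of the per-group monitoring listed above; the grouping column, band names, and outcome column are illustrative.

```python
import pandas as pd

def fairness_report(scored: pd.DataFrame, group_col: str = "study_mode") -> pd.DataFrame:
    """Flag rate, recall, and false-positive rate per group, computed once outcomes are known."""
    def per_group(g: pd.DataFrame) -> pd.Series:
        flagged = g["risk_band"].isin(["RED", "AMBER"])      # illustrative flagging rule
        positive = g["non_continued"] == 1                   # observed outcome after the term
        return pd.Series({
            "flag_rate": flagged.mean(),
            "recall": (flagged & positive).sum() / max(positive.sum(), 1),
            "false_positive_rate": (flagged & ~positive).sum() / max((~positive).sum(), 1),
        })
    return scored.groupby(group_col).apply(per_group)
```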
Governance

Answer "why was I flagged and who saw it?"

Why

Store score, band, reason codes, feature snapshot hash, feature values, model version, and threshold policy.

Who

Audit every advisor access with viewer, role, purpose, timestamp, fields returned, and student record.
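A minimal sketch of that audit entry, assuming a Python service layer in front of the advisor queue; the field names mirror the prose above, and the sink is an illustrative append-only audit table.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AccessAuditEvent:
    """One row written every time an advisor views a student's risk record."""
    viewer_id: str
    viewer_role: str
    purpose: str
    canonical_student_id: str
    fields_returned: list
    accessed_at: str

def log_access(viewer_id, viewer_role, purpose, student_id, fields, audit_sink):
    """Record who saw which fields of which student, and why, before returning the data."""
    event = AccessAuditEvent(viewer_id, viewer_role, purpose, student_id, list(fields),
                             datetime.now(timezone.utc).isoformat())
    audit_sink.append(asdict(event))    # illustrative sink: an append-only audit table
    return event
```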

Control

Row-level permissions, encrypted data, pseudonymous modeling, retention rules, and model release approval.

DPIA | Purview lineage | Azure ML model card | Feature registry | Schema tests | Rollback plan
Brief coverage

How the answer maps to the prompt

Data foundation

  • Four independent systems: source-specific contracts through Data Factory/Fabric and Event Hubs.
  • Inconsistent identifiers: canonical identity map with match confidence.
  • Changing student status: type-2 history and effective-dated joins.
  • Schema changes: quarantine, contract tests, profiling, and promotion gates.
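A minimal sketch, assuming pandas, of a contract test that gates bronze-to-silver promotion when a source schema drifts; the expected-schema dictionary is illustrative.

```python
import pandas as pd

# Illustrative contract for the daily ERP finance snapshot.
FINANCE_CONTRACT = {
    "source_person_id": "object",
    "outstanding_balance_nok": "float64",
    "payment_overdue_days": "int64",
}

def contract_violations(batch: pd.DataFrame, contract: dict) -> list:
    """Return a list of violations; an empty list means the batch may be promoted."""
    missing = [f"missing column: {col}" for col in contract if col not in batch.columns]
    drifted = [
        f"dtype drift on {col}: expected {dtype}, got {batch[col].dtype}"
        for col, dtype in contract.items()
        if col in batch.columns and str(batch[col].dtype) != dtype
    ]
    return missing + drifted   # a non-empty result routes the batch to quarantine
```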

Decision safety

  • Early warning: weekly in-semester scoring before official dropout records.
  • No leakage: event_time and available_at must both be at or before the scoring date.
  • Fairness: calibration, FPR, recall, flag-rate, and reason-code checks by group.
  • Auditability: explanation, model version, threshold policy, and access log.
Roadmap

Pilot first, then scale

Phase | Work | Exit criteria
0-4 weeks | DPIA, outcome definition, Azure landing zone, access model, data contracts | Approved scope and governance
4-10 weeks | ADLS/Fabric setup, ingest SIS/LMS/ERP/campus, identity map, history tables | Trusted integrated data layer
10-16 weeks | Fabric point-in-time features, labels, validation, first Azure ML model | Auditable offline model
16-22 weeks | Power BI/Fabric advisor pilot in two faculties, RLS, feedback loop | Useful interventions and safe workflow
22-30 weeks | Responsible AI review, Azure Monitor, Purview lineage, model registry, hardening | Scale decision
Trade-offs

Key decisions I would defend

Start daily batch with event capture

Advisor intervention does not require millisecond latency. Data Factory batch plus Event Hubs capture preserves detail without overpromising real-time action.

Store raw and expose aggregates

Raw data supports audit and backfill, while aggregated features reduce privacy exposure in the advisor workflow.

Prefer explainability over small accuracy gains

An unsupported black-box score is less valuable than a slightly weaker model advisors and regulators can understand.

Use protected attributes for audit

You cannot prove fairness while refusing to measure outcomes across groups.

Demo artifacts

Dummy data and prototype outputs

Run locally

python .\student_outcome_platform_demo.py --out outputs --students 1200 --seed 42

Generated files

  • source_samples/*.csv
  • feature_student_week_and_predictions.csv
  • advisor_risk_list.csv
  • fairness_report.csv
  • access_audit_sample.csv
  • risk_distribution.png