Student Outcome Intelligence Platform

Concise case-study answer: build an Azure-based student-risk platform, not just a dropout model. The core deliverable is a point-in-time, auditable advisor queue that turns SIS, LMS, ERP, and campus signals into explainable support actions.

Case Summary

The case is a 20-30 minute design presentation for a large Norwegian university with 35,000 students across 8 faculties. The university wants to identify students at risk of dropping out early enough for academic advisors to intervene, before the issue appears in official grade, withdrawal, or status records.

The hard problem is that the signals are split across four systems that were built independently:

Student Information System: identity, enrolment, programme, faculty, grades, graduation, leave, withdrawal, and status history.
Learning Management System: logins, course access, submissions, forums, video watch time, quizzes, and other high-frequency learning behavior.
ERP: financial aid, tuition payment status, scholarships, balances, and overdue-payment signals.
Physical campus systems: library access, WiFi presence, and building-entry activity.

The platform must handle inconsistent identifiers, changing student status, late-arriving records, schema changes, diverse engagement patterns, personal data, explainability, fairness, and auditability. The proposed solution is an Azure and Microsoft Fabric data spine that makes these records trustworthy, builds point-in-time features, scores dropout risk, and delivers an advisor queue with reason codes and access audit.

Executive Answer

Create a Microsoft Azure and Fabric platform that resolves student identity, stores mutable SIS records as history, builds leakage-safe feature snapshots, predicts non-continuation risk during the term, and gives advisors risk bands plus reason codes.

Use Azure Data Factory or Fabric Data Factory for scheduled extracts, Azure Event Hubs for high-frequency LMS and campus events, ADLS Gen2 and OneLake for the medallion lakehouse, Microsoft Purview for catalog and lineage, Azure Machine Learning for model lifecycle, and Power BI or a Fabric app for advisor delivery.

Launch as daily ingestion with weekly scoring. Use streaming only where faster data changes an advisor decision; otherwise capture events into the lake and govern them like any other source.

Assumptions And Scope

Area	Assumption	Design consequence
University size	35,000 students across 8 faculties, with mixed full-time, part-time, international, and remote/hybrid study patterns.	Segment validation by faculty, programme, study mode, and term week instead of using one unqualified engagement baseline.
Decision use	The score supports advisor prioritization only.	No automated academic, disciplinary, or financial action is allowed from the score.
Outcome	Risk means next eligible term non-continuation, excluding graduation, exchange completion, approved leave, and administrative data corrections.	Labels need an exclusion table and a closed observation window before training.
Timing	Advisors need enough lead time to intervene before final grades or official withdrawal records.	Score in-semester on a weekly cadence after early engagement signals exist.
Data law and ethics	The platform handles personal data and may be challenged by students or regulators.	Build DPIA, purpose limitation, minimization, lineage, access audit, model cards, and appeal/explanation support from the start.
Cloud target	The university can deploy to an Azure tenant with Norwegian/EU residency controls and Microsoft 365 identity already in use.	Prefer Microsoft Fabric, ADLS Gen2, Azure ML, Purview, Entra ID, Key Vault, Monitor, and Power BI rather than a cloud-neutral stack.
Preparation time	The interview asks for a 20-30 minute presentation, not a full implementation plan.	Focus on defensible architecture choices, trade-offs, formulas, controls, and pilot path.

Formula Sheet

Target

Y_i,t+h = 1

Student i does not continue in the next eligible term, excluding graduation, approved leave, and exchange completion.

Risk Score

r_i,t = P(Y_i,t+h = 1 | X_i,t)

Risk is the probability of next-term non-continuation using only features available at scoring date t.

Interpretable Baseline

logit(r_i,t) = β₀

+β₁ prior_gpa_i,t

+β₂ missing_assignments_i,t

+β₃ days_since_lms_i,t

+β₄ payment_overdue_days_i,t

+β₅ campus_drop_i,t

+β_6..k academic_context_i,t

Advisor Capacity Threshold

τ_red = quantile(r_i,t, 1 - C_red / N)

τ_amber = quantile(r_i,t, 1 - (C_red + C_amber) / N)

Thresholds are tied to real advisor capacity rather than a detached score cutoff.

Risk Band

Redr_i,t ≥ τ_red

Amberτ_amber ≤ r_i,t < τ_red

Greenotherwise

Reference Architecture

Reference architecture diagram

Layer	Azure decision	Purpose	Output
Sources	SIS, LMS, ERP, campus systems	Source-owned files, APIs, database extracts, and events	Raw operational evidence
Ingestion	Azure Data Factory / Fabric Data Factory, Azure Event Hubs, managed identities	Scheduled extraction, event capture, schema checks, freshness logging	Landing records with source metadata
Raw lake	ADLS Gen2, Event Hubs Capture, immutable folders	Preserve original records exactly as received	Replayable bronze evidence
Curated lakehouse	Microsoft Fabric OneLake Lakehouses with Delta tables	Identity resolution, type-2 history, quality contracts, deduplication	Trusted silver tables
Feature products	Fabric SQL analytics endpoint, notebooks, data warehouse where useful	Student-term facts, point-in-time weekly features, closed outcome labels	Gold training and scoring tables
ML lifecycle	Azure Machine Learning registry, pipelines, Responsible AI dashboard	Training, validation, calibration, fairness analysis, model cards, threshold release	Versioned model and threshold package
Advisor delivery	Power BI / Fabric app, Teams notification, row-level security	Risk bands, reason codes, score trend, intervention workflow	Audited support queue
Control plane	Microsoft Purview, Entra ID, Key Vault, Azure Monitor, Log Analytics, Azure Policy	Catalog, lineage, access, secrets, alerts, drift, deployment governance	Evidence for operations and audit

Azure Service Blueprint

Need from the brief	Azure service choice	Why it fits
Four systems with different files, APIs, events, and update frequencies	Azure Data Factory / Fabric Data Factory plus Azure Event Hubs	Handles scheduled copy, metadata activities, API/file extraction, and high-frequency streams without forcing one ingestion pattern.
Raw evidence that can be replayed	ADLS Gen2 with hierarchical namespace and lifecycle policies	Provides scalable, low-cost lake storage with directory semantics, file-level security, and retention tiers.
Governed analytics layer	Microsoft Fabric OneLake Lakehouses using bronze, silver, and gold layers	Keeps lakehouse layers separate, uses Delta tables for curated data, and exposes SQL/Power BI access for analytics.
Lineage and data discovery	Microsoft Purview Data Map and Unified Catalog	Captures metadata across disparate sources and gives data stewards a searchable governance layer.
Model training, release, and fairness review	Azure Machine Learning with Responsible AI dashboard	Supports model versioning and gives reviewers error analysis, fairness metrics, feature importance, and cohort analysis.
Advisor-facing restricted view	Power BI/Fabric app with Entra groups and row-level security	Lets advisors see only assigned students or faculties while preserving semantic-model governance.
Secrets, encryption, and access control	Microsoft Entra ID, managed identities, Key Vault, private endpoints where required	Reduces password handling and centralizes role-based access, credentials, and keys.
Production monitoring	Azure Monitor, Log Analytics, Application Insights, alert rules	Tracks pipeline health, source freshness, schema failures, endpoint latency, score drift, and access events.

Microsoft Reference Basis

Design point	Microsoft reference used	How it informs this answer
Unified analytics foundation	Microsoft Fabric overview	Supports the OneLake/Fabric lakehouse choice for a university-wide data spine.
Bronze, silver, and gold data layers	Medallion lakehouse architecture	Supports preserving raw records, promoting validated history, and publishing gold feature products.
Stream capture for LMS and campus signals	Azure Event Hubs Capture	Supports capturing high-frequency events into Blob Storage or ADLS for long-term retention and batch processing.
Pipeline lineage and auditability	Connect Azure Data Factory to Microsoft Purview	Supports linking ingestion and transformation evidence into Purview lineage.
Responsible model review	Azure ML Responsible AI dashboard	Supports attaching interpretability, error analysis, and review artifacts to registered models.
Advisor-scoped access	Power BI row-level security	Supports Entra-backed row-level restrictions in the advisor queue.

Source System Integration

Source	Typical content	Ingestion pattern	Key risk	Control
SIS	Enrolment status, programme, faculty, grades, personal details, graduation, withdrawal, leave history.	Azure Data Factory connector, database extract, or secure file landing into ADLS Gen2 bronze.	Current-state fields overwrite historical truth.	Type-2 status and programme history with `valid_from`, `valid_to`, and `recorded_at`.
LMS	Logins, course access, assignment submissions, forum activity, video watch time, quiz attempts.	Event Hubs for activity streams where available; otherwise Fabric/Data Factory incremental pulls normalized to daily facts.	Noisy behavior differs by course design and study mode.	Course/week normalization, source freshness checks, and missingness indicators.
ERP	Financial aid, tuition payment status, scholarships, balances, overdue days.	Daily vendor snapshot through Data Factory with managed identity or Key Vault-backed credential.	Different identifiers and delayed payment updates.	Identity map, effective-dated snapshots, and `available_at` timestamps.
Campus	Library access, WiFi presence, building entry events.	Event Hubs Capture to ADLS Gen2, or daily aggregated privacy-preserving file extract.	Physical presence is not equally meaningful for remote, commuter, placement, or part-time students.	Aggregate to low-granularity engagement features and calibrate by study mode/programme.

Each feed lands with source_system, source_record_id, event_time, available_at, ingested_at, schema version, source URI, pipeline run ID, and Purview asset reference. Breaking schema changes are quarantined; non-breaking changes are profiled before promotion to trusted tables.

Data Model

Table	Key columns	Role
`identity_map`	`canonical_student_id`, `source_system`, `source_person_id`, `valid_from`, `valid_to`, `match_confidence`	Joins fragmented systems safely.
`student_status_history`	`status`, `faculty`, `programme`, `valid_from`, `valid_to`, `recorded_at`	Prevents current-state leakage.
`ingestion_audit`	`pipeline_run_id`, `source_system`, `source_uri`, `schema_version`, `record_count`, `watermark`, `purview_asset_id`	Links data products back to Azure pipeline evidence.
`fact_enrollment_term`	`term_id`, `credits_registered`, `prior_gpa`, `academic_standing`	Academic baseline.
`fact_lms_activity_daily`	`activity_date`, `login_count`, `course_views`, `assignment_due`, `assignment_submitted`	Digital engagement.
`fact_financial_snapshot`	`status_date`, `outstanding_balance_nok`, `payment_overdue_days`, `aid_status`	Financial friction.
`fact_campus_activity_daily`	`event_date`, `building_entry_count`, `library_entry_count`, `wifi_minutes`	Aggregated physical engagement.
`feature_student_week`	`as_of_date`, `term_week`, feature columns, `feature_snapshot_hash`	Model input.
`risk_prediction`	`prediction_id`, `model_version`, `risk_score`, `risk_band`, `top_reasons`	Advisor output.
`access_audit`	`viewer_user_id`, `purpose`, `timestamp`, `prediction_id`, `fields_returned`	Accountability.

Point-In-Time Features

Every row is built as of t = as_of_date.

Leakage Rule

usable(record, t) = 1[event_time ≤ t and available_at ≤ t]

14-Day LMS Activity

lms_logins_14d(i,t) = ∑ login_count_i,d for d in [t - 13, t]

Days Since LMS

days_since_lms(i,t) = t - max(activity_date_i where login_count_i,d > 0)

Assessment Gap

missing_assignments(i,t) = max(due_to_date_i,t - submitted_to_date_i,t, 0)

Submission Rate

submission_rate(i,t) = submitted_to_date_i,t / max(due_to_date_i,t, 1)

Campus Drop

campus_drop(i,t) = campus_days_first_14d_i - campus_days_14d_i,t

Financial Signal

overdue_flag(i,t) = 1[payment_overdue_days_i,t > 0]

Keep labels separate until the outcome window closes. Store each feature snapshot so predictions can be reproduced exactly.

Model And Thresholding

Start in Azure Machine Learning with calibrated logistic regression or explainable gradient boosted trees. Optimize for useful advisor prioritization, not only aggregate accuracy.

Precision At Capacity

Precision@C = true_positives_in_top_C / C

Recall At Capacity

Recall@C = true_positives_in_top_C / all_actual_positives

Lead Time

lead_time_i = outcome_date_i - first_red_or_amber_score_date_i

Calibration Bin Error

cal_error_b = | mean(Y_i in bin b) - mean(r_i,t in bin b) |

Expected Calibration Error

ECE = ∑_b (n_b / N) cal_error_b

Use thresholds tied to advisor capacity, for example top 5 percent red and next 15 percent amber. Register each model and threshold package together in Azure ML, then review thresholds each term before release.

Fairness And Calibration

Protected attributes are retained for audit, not shown as advisor reason codes.

Flag Rate By Group

flag_rate_g = count(r_i,t ≥ τ_amber and group_i = g) / count(group_i = g)

False Positive Rate

FPR_g = FP_g / (FP_g + TN_g)

False Negative Rate

FNR_g = FN_g / (FN_g + TP_g)

Recall Parity Gap

gap_recall = max_g(recall_g) - min_g(recall_g)

Calibration Gap

gap_calibration = max_g(ECE_g) - min_g(ECE_g)

Audit by gender, age band, international status, first-generation status, faculty, programme, and study mode. Use Azure ML Responsible AI outputs for cohort analysis and model explanations, then store the signed-off fairness report beside the model card. Also monitor reason-code distributions to catch proxy discrimination.

Advisor Workflow

The advisor view should show only decision-support evidence:

Student, programme, risk band, score range, and score trend.
Top three actionable reasons with source freshness.
Recommended support route: academic check-in, financial-aid referral, study-skills support, or wellbeing referral.
Contact status, notes, and intervention outcome.
"Not relevant" feedback, captured for monitoring.

Do not automate punitive, academic, or financial decisions from the score.

Controls And Operations

Control	Production rule
Freshness	Azure Monitor alert when a Data Factory/Fabric pipeline misses SLA or an Event Hubs stream falls behind.
Completeness	Compare expected vs received enrollment and event counts by faculty and term in validation tables.
Identity	Track unmatched IDs and low-confidence matches; route exceptions to data stewardship.
Schema drift	Block breaking column, type, or semantic changes before silver promotion.
Score drift	Monitor score distribution, feature drift, and reason-code mix by term, faculty, programme, and study mode.
Governance	Register assets and lineage in Purview; version model, features, data, thresholds, reason-code logic, and approvals.
Security	Pseudonymous modeling, restricted PII, encryption, private access where needed, Entra groups, Key Vault secrets, and Power BI/Fabric row-level advisor access.
Audit	Store advisor access, data-product lineage, model release approval, and threshold approval in immutable audit tables.

Brief Coverage Map

Brief requirement	Where the solution addresses it
Four independent systems with different formats, update frequencies, APIs, files, and events.	Azure Data Factory/Fabric Data Factory, Event Hubs, source contracts, bronze ingestion metadata, schema drift controls, and per-source integration patterns.
Inconsistent identifiers across systems.	Canonical `identity_map` with source IDs, validity windows, and match confidence.
Student status changes over time.	Type-2 `student_status_history`, effective-dated finance snapshots, and point-in-time joins.
Predict before official records show the issue.	Weekly in-semester scoring, engagement/finance/campus features, and lead-time evaluation.
Train only on information available at prediction time.	`event_time <= t` and `available_at <= t` rule, feature snapshots, and closed label windows.
Diverse population and different engagement patterns.	Segmented baselines, protected-attribute audit, calibration checks, and reason-code distribution monitoring.
Explain why a student was flagged red.	Risk band, score range, top actionable reasons, source freshness, feature snapshot hash, and model version.
Show who accessed student data.	`access_audit` table with viewer, role, purpose, timestamp, fields returned, and prediction ID.
Keep working after launch.	Azure Monitor, Log Analytics, Purview, freshness, completeness, identity, schema drift, score drift, governance, security, rollback, and monitoring controls.
Make assumptions and trade-offs explicit.	Assumptions table and trade-off table, including batch vs streaming, interpretability, global/local model, and protected-attribute use.

Trade-Offs

Decision	Recommendation	Why
Batch vs streaming	Data Factory/Fabric daily batch plus Event Hubs capture for high-frequency streams	Good enough for advisor intervention while preserving the option for near-real-time signals later.
Model complexity	Interpretable first	Easier to defend, calibrate, and explain.
Global vs local model	One global model with context features	More stable at launch; local models need more data.
Raw vs aggregate exposure	Keep raw restricted, expose aggregates	Preserves audit while reducing privacy risk.
Protected attributes	Use for audit only	Needed to detect bias; not appropriate as reason codes.
Fabric vs custom app first	Power BI/Fabric advisor view first, custom workflow only if needed	Faster to govern with Entra groups, RLS, semantic models, and audit before building bespoke software.

Roadmap And Demo

Weeks 0-4: DPIA, lawful basis, Azure landing zone, residency decision, outcome definition, access model, data contracts.
Weeks 4-10: configure ADLS Gen2/Fabric workspaces, ingest SIS/LMS/ERP/campus, identity map, type-2 history.
Weeks 10-16: point-in-time Fabric feature products, labels, validation, first Azure ML model.
Weeks 16-22: Power BI/Fabric advisor pilot with two faculties, RLS, feedback capture, intervention outcomes.
Weeks 22-30: Responsible AI review, calibration, Azure Monitor alerts, Purview lineage, model registry, operational hardening.

Run the prototype:

python .\student_outcome_platform_demo.py --out outputs --students 1200 --seed 42

Key outputs: outputs/demo_dashboard.html, outputs/advisor_risk_list.csv, outputs/feature_student_week_and_predictions.csv, outputs/fairness_report.csv, outputs/model_validation_metrics.csv, and outputs/access_audit_sample.csv.