# Student Outcome Intelligence Platform

Concise case-study answer: build an Azure-based student-risk platform, not just a dropout model. The deliverable is a point-in-time, auditable advisor queue that turns SIS, LMS, ERP, and campus signals into explainable support actions.

This long-form solution mirrors the presentation deck chapter-for-chapter, with deeper background, alternatives considered, and the reasoning behind each choice. Every diagram, table, and formula in the slides reappears here, surrounded by the prose context that did not fit on the slide. The six chapters track the deck's bottom progress bar — Brief & thesis, Architecture, Data model, Model & math, Action & ethics, Deployment — and each subsection title matches a slide title in that chapter.

## Brief And Thesis

This chapter sets up the problem and the design stance. It restates what the university actually asked for, names the structural reasons the work is harder than a single ML model, declares the thesis (most of the value lives in the trustworthy data spine, not in model choice), and locks in the assumptions that make the target predictable, auditable, and safe to act on. Everything in the later chapters is a direct consequence of the four moves made here.

### Five Steps From Raw Signals To Advisor Action

Before the architecture, it is worth stating the end-to-end shape of the platform in five steps so each later chapter has a clear place in the pipeline.

| Step | What happens | Where it lives in this document |
| --- | --- | --- |
| Acquire | SIS, LMS, ERP, and campus signals are pulled by Azure Data Factory and Event Hubs into ADLS Gen2 bronze landings with full source metadata. | Architecture chapter. |
| Govern | Canonical student identity, type-2 history of mutable status, and Purview lineage and data contracts make the silver layer trustworthy. | Data model chapter. |
| Model | A calibrated, interpretable risk score with capacity-anchored bands and fairness checks runs in Azure ML. | Model & math chapter. |
| Deploy | Azure ML registry releases the model; Power BI / Fabric delivers the advisor queue with row-level security and reason codes. | Action & ethics chapter. |
| Operate | Drift, freshness, and fairness monitors plus an immutable audit log keep the score defensible across releases. | Deployment chapter. |

The five steps are deliberately sequential. Skipping an earlier step in the chain, for example modelling before identity is canonical, does not save time; it moves the cost into incident response after launch.

### The Brief Is A Platform Design Problem, Not A Model-Only Task

The university has 35,000 students across 8 faculties and wants to identify students at risk of dropping out before grades, withdrawal, or status records make the problem visible. The signals already exist, but they are split across four independent systems with different owners, schemas, and update rhythms.

| Source | Signal | Design implication |
| --- | --- | --- |
| Student Information System | Identity, enrolment, programme, grades, status, graduation, leave, withdrawal history. | Preserve effective-dated student history; never rely only on current state. |
| Learning Management System | Logins, course access, submissions, forums, video watch time, quizzes. | Normalize high-frequency behavior by course, week, and study mode. |
| ERP | Financial aid, tuition status, scholarships, balances, overdue payments. | Join through canonical identity and keep `available_at` timestamps. |
| Campus systems | Library access, WiFi presence, building entry. | Aggregate to privacy-aware engagement signals and calibrate by study mode. |

The deliverable described in the brief is not a binary classifier. It is a decision support system that has to be reproducible, explainable, fair, and operable inside a real advisor workflow with finite capacity. A model that cannot show its working, or that quietly leaks future information into training, is not just lower quality; it is unusable in a regulated context. That is why the slide title in the deck deliberately frames this as a platform design problem.

### The Hard Part Is Trustworthy History, Not The Model

In the deck this slide reframes effort allocation: the industry expectation is roughly 20 percent data, 80 percent model; the production reality on a problem like this is the inverse. Identity resolution, type-2 history, point-in-time features, lineage, and contracts are where most of the design risk lives. The model itself is a small head sitting on top of that spine.

| Friction | What goes wrong | Design response |
| --- | --- | --- |
| Inconsistent IDs | The same student is `123` in SIS, `s.last@univ` in LMS, and `0001234` in ERP. Naive joins lose people. | Canonical `identity_map` with `match_confidence` and validity windows. |
| Different update rhythms | LMS streams hourly, finance posts weekly, status changes are entered by hand. Joining on "today" mixes truths from different clocks. | `event_time` and `available_at` on every row, plus weekly point-in-time snapshots. |
| Current-state source tables | SIS overwrites status when a student withdraws, so last term's "active" student now reads "withdrawn". History is silently destroyed. | Type-2 `student_status_history` materialized in silver before any feature is computed. |
| Advisor capacity is fixed | A score with no operating budget produces a queue no one works. | Bands sized by quantile of the score distribution against advisor headcount, not a fixed cutoff. |
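
The first row of the table can be sketched as a lookup against `identity_map`. This is a minimal illustration, not the platform's implementation: the `resolve` helper, the `STU-0001` canonical ID, the dates, and the 0.9 confidence floor are all hypothetical; only the three source identifiers and the column names come from the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class IdentityRow:
    canonical_student_id: str
    source_system: str
    source_person_id: str
    valid_from: date
    valid_to: Optional[date]  # None means the mapping is still open
    match_confidence: float


# Hypothetical rows: the same student under three source identifiers.
IDENTITY_MAP = [
    IdentityRow("STU-0001", "SIS", "123", date(2023, 8, 1), None, 1.00),
    IdentityRow("STU-0001", "LMS", "s.last@univ", date(2023, 8, 1), None, 0.97),
    IdentityRow("STU-0001", "ERP", "0001234", date(2023, 8, 1), None, 0.99),
]


def resolve(source_system: str, source_person_id: str, as_of: date,
            min_confidence: float = 0.9) -> Optional[str]:
    """Return the canonical ID valid on `as_of`, or None if no trusted match."""
    for row in IDENTITY_MAP:
        if (row.source_system == source_system
                and row.source_person_id == source_person_id
                and row.valid_from <= as_of
                and (row.valid_to is None or as_of < row.valid_to)
                and row.match_confidence >= min_confidence):
            return row.canonical_student_id
    return None
```

Returning `None` instead of guessing is the point of the sketch: an unmatched or low-confidence record is held back for review rather than silently dropped by a join.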

The thesis has a sharp consequence: if the foundation is wrong, every model retrained on top of it inherits the same blind spots. Calibration drifts because the historical labels are wrong. Fairness audits show "no problem" because the protected-attribute history was already overwritten. Explainability collapses because a feature that *looked* like prior-term GPA was actually next-term GPA leaking back through a late update. The early chapters of the deck and this document are therefore disproportionately about boring infrastructure — that is where the leverage is.

### One-line Answer

Create a governed Microsoft Azure and Fabric data spine that resolves identity, stores mutable student records as history, builds leakage-safe weekly feature snapshots, predicts next-term non-continuation risk, and delivers risk bands plus reason codes to advisors.

### Scope The Target So The Model Can Be Defended

Six assumptions box the problem so that what gets built is small enough to defend and large enough to be useful. Each one shows up later in the design as a specific control.

| Area | Assumption | Design consequence |
| --- | --- | --- |
| Outcome | Risk means next eligible term non-continuation, excluding graduation, exchange, approved leave, and administrative corrections. | Labels need exclusions and a closed observation window. |
| Decision use | The score is advisor decision support only. | No automated punitive, academic, financial, or disciplinary action. |
| Timing | Advisors need enough lead time before final grades or official withdrawal. | Score weekly during the term after early engagement signals exist. |
| Privacy | The platform handles personal data and may be challenged. | Build DPIA, minimization, lineage, access audit, model cards, and explanation support from day one. |
| Fairness | Demographics are needed to detect bias but should not rank students. | Use protected attributes for audit only; never as advisor reason codes. |
| Capacity | Advisor headcount per faculty is the binding operational constraint. | Bands are sized by quantile against advisor capacity, not a detached score cutoff. |

Two of these are worth lingering on, because they generate most downstream behaviour. **Outcome scope** is what the deck assumption card calls "term-end attrition only — exclusions defined upfront." Without that exclusion list the model will learn that exchange semesters look like dropout, and the queue will fill with students who are abroad on programme. **Capacity-anchored bands** is the operational rule that the model serves a finite advisor team — the threshold lives downstream of staffing, not upstream of it. If next term's capacity drops, the same model still produces a workable queue by sliding τ_red right; if a fixed score cutoff is used instead, the queue silently overshoots the team.

The remaining four assumptions are guardrails against well-known failure modes. Treating the score as advisor decision support means a wrong prediction never directly harms a student, because a human is always in the loop. Weekly cadence concentrates the score on the window where intervention is still useful (mid-term), not after grades are filed. The privacy posture is conservative on purpose: a DPIA, minimization, and immutable access audit are cheaper to build at week 0 than to retrofit at week 30. Using protected attributes for audit only is the explicit answer to a common confusion: you cannot prove fairness without measuring outcomes by group, but those same attributes have no business ranking a student.

## Architecture

This chapter describes the Azure shape of the system: which services do which job, where data lives at each stage, what governs and audits the flow, and why a lakehouse with bronze, silver, and gold layers is the right backbone for an advisor decision-support tool that has to remain explainable and reproducible. The single architecture diagram from the deck reappears here, expanded into a service-by-service decision log so that each component can be defended individually.

### Azure Lakehouse Plus Governed ML Workflow

![Reference architecture diagram](architecture_reference.svg)

| Layer | Azure decision | Purpose | Output |
| --- | --- | --- | --- |
| Sources | SIS, LMS, ERP, campus systems | Source-owned files, APIs, database extracts, and events. | Raw operational evidence. |
| Ingestion | Azure Data Factory / Fabric Data Factory, Azure Event Hubs, managed identities. | Scheduled extraction, event capture, schema checks, freshness logging. | Landing records with source metadata. |
| Raw lake | ADLS Gen2, Event Hubs Capture, immutable folders. | Preserve original records exactly as received. | Replayable bronze evidence. |
| Curated lakehouse | Fabric OneLake Lakehouses with Delta tables. | Identity resolution, type-2 history, quality contracts, deduplication. | Trusted silver tables. |
| Feature products | Fabric SQL endpoint, notebooks, warehouse where useful. | Student-term facts, point-in-time weekly features, closed labels. | Gold training and scoring tables. |
| ML lifecycle | Azure ML registry, pipelines, Responsible AI dashboard. | Training, validation, calibration, fairness, model cards, thresholds. | Versioned model release package. |
| Advisor delivery | Power BI / Fabric app, Teams notification, row-level security. | Risk bands, reason codes, score trend, intervention workflow. | Audited support queue. |
| Control plane | Purview, Entra ID, Key Vault, Monitor, Log Analytics, Azure Policy. | Catalog, lineage, access, secrets, alerts, drift, approvals. | Evidence for operations and audit. |

The architecture is intentionally a lakehouse, not a classical data warehouse. The reason is the mix of structured records (SIS, ERP) and high-frequency, semi-structured event streams (LMS clicks, campus access). A warehouse-first design forces the streaming side into nightly extracts and loses event-level signal. A pure data-lake design makes governance, contracts, and SQL access painful. OneLake with Delta tables and a SQL endpoint sits between these, keeping bronze immutability for audit and silver/gold relationality for analytics.

A second deliberate choice is splitting the ingestion plane between Azure Data Factory and Azure Event Hubs. Data Factory handles the slow, scheduled, well-typed feeds (SIS daily extracts, ERP weekly postings); Event Hubs handles the high-frequency campus and LMS events. Both write to ADLS Gen2 in raw form before any transformation. That split is what allows a single feed to slow down (e.g., the SIS export runs late on a Sunday) without dragging the rest of the pipeline with it, and it lets the team retire one ingestion path without touching the other.

### Service Choices

| Need | Azure service | Why |
| --- | --- | --- |
| Mixed files, APIs, events, update frequencies. | Data Factory / Fabric Data Factory plus Event Hubs. | Supports batch extracts and high-frequency streams without forcing one ingestion pattern. |
| Replayable raw evidence. | ADLS Gen2 with lifecycle policies. | Keeps immutable source records for audit, backfill, and investigation. |
| Governed analytics layer. | Fabric OneLake lakehouses with bronze, silver, gold layers. | Separates raw evidence, trusted history, and publishable feature products. |
| Lineage and discovery. | Microsoft Purview. | Makes source, transformation, and data-product lineage searchable. |
| Model release and review. | Azure ML plus Responsible AI dashboard. | Versions model, thresholds, fairness review, and explanation artifacts together. |
| Advisor access. | Power BI / Fabric app with Entra groups and RLS. | Lets advisors see only assigned students or faculties. |

Every feed lands with `source_system`, `source_record_id`, `event_time`, `available_at`, `ingested_at`, schema version, source URI, pipeline run ID, and Purview asset reference. Breaking schema changes are quarantined before silver promotion. This metadata is what makes a prediction reproducible six months later — without it, "what did the system know on the day this score was generated" becomes impossible to answer, which is the same as having no audit at all.
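
As a rough sketch, the landing metadata can be modelled as one immutable record per landed row, with a check that decides whether the row may be promoted. The field names follow the list above; the `quarantine_reasons` helper and its two rules (known schema version, timezone-aware timestamps) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LandingMetadata:
    source_system: str
    source_record_id: str
    event_time: datetime      # when the event happened
    available_at: datetime    # when the record became visible to the platform
    ingested_at: datetime     # when the pipeline landed it
    schema_version: str
    source_uri: str
    pipeline_run_id: str
    purview_asset_ref: str


def quarantine_reasons(meta: LandingMetadata, accepted_versions: set[str]) -> list[str]:
    """Return reasons to hold a record back from silver; an empty list means promote."""
    reasons = []
    if meta.schema_version not in accepted_versions:
        reasons.append("unknown schema version")
    for name in ("event_time", "available_at", "ingested_at"):
        if getattr(meta, name).tzinfo is None:
            reasons.append(f"{name} is not timezone-aware")
    return reasons
```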

A few alternatives were rejected on purpose. Synapse dedicated SQL pools were considered for the gold layer but added a second compute engine, a second governance surface, and a hard split between batch and lakehouse data; Fabric SQL endpoint over the same Delta tables removes that split. Databricks was considered for ingestion and feature engineering but would have duplicated tooling that Fabric already provides natively, and the Purview lineage story is cleaner end-to-end inside Azure-first services. Push-only event ingestion from each source was considered and rejected because not all source systems can push reliably; a scheduled Data Factory pull can re-extract the missed window once a source recovers, so an outage delays records rather than losing them.

## Data Model

This chapter is the operational heart of the platform: how raw payloads from four very different source systems become a trustworthy, point-in-time feature table that a model can be trained on without leakage. It walks through the bronze→silver lineage, the silver→gold feature products, the catalogue of tables that hold history, and the explicit admission rule that keeps future and late-arriving records out of the snapshot. If any single chapter has to be right for the rest of the platform to work, it is this one.

### Bronze To Silver: Raw Payloads To Cleansed Tables

![Bronze to Silver lineage](bronze_silver_lineage.svg)

The bronze layer is intentionally close to dumb. Files and events land in ADLS Gen2 partitioned by source, ingestion date, and pipeline run, with the original payload preserved. Nothing in bronze is corrected, deduplicated, or joined. The reason is that audit, backfill, and investigation all depend on being able to replay the exact bytes the platform received. When a number on a Power BI tile is questioned six months later, the bronze layer is what proves what the source actually said on that day.

The promotion to silver is where most of the data engineering work lives. Each source gets a staging model that handles three jobs: cleansing (typing, null handling, encoding), conforming (mapping into the canonical column names and units used downstream), and contract enforcement (Great Expectations or equivalent rules block obviously broken loads from reaching silver). For mutable entities, especially student status, programme, and faculty, silver materialises a Type-2 slowly changing dimension with `valid_from`, `valid_to`, and `recorded_at`. For event streams, silver collapses obvious duplicates and applies watermarking so a late-arriving event does not silently rewrite history.
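
The Type-2 behaviour can be sketched in a few lines. This is an in-memory illustration under simplifying assumptions (one attribute, half-open validity windows, hypothetical helper names), not the silver-layer code itself: a change closes the open row and appends a new one instead of overwriting.

```python
from dataclasses import dataclass, replace
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class StatusRow:
    canonical_student_id: str
    status: str
    valid_from: date
    valid_to: Optional[date]   # None marks the currently open row
    recorded_at: datetime


def apply_status_change(history: list[StatusRow], student_id: str, new_status: str,
                        effective: date, recorded_at: datetime) -> list[StatusRow]:
    """Close the student's open row at `effective` and append the new status."""
    updated = []
    for row in history:
        is_open = row.canonical_student_id == student_id and row.valid_to is None
        if is_open and row.status == new_status:
            return history  # nothing changed, so history stays untouched
        updated.append(replace(row, valid_to=effective) if is_open else row)
    updated.append(StatusRow(student_id, new_status, effective, None, recorded_at))
    return updated


def status_as_of(history: list[StatusRow], student_id: str, as_of: date) -> Optional[str]:
    """Read the status that was in force on `as_of`."""
    for row in history:
        if (row.canonical_student_id == student_id and row.valid_from <= as_of
                and (row.valid_to is None or as_of < row.valid_to)):
            return row.status
    return None
```

A student who withdraws in February still reads as active for any `as_of` date in the previous term, which is exactly what a current-state source table cannot provide. A fuller version would also filter on `recorded_at` so that a backdated correction stays invisible to snapshots built before it was entered.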

The end state of silver is a small set of trusted, conformed tables that everything downstream is allowed to read. Anything that needs raw bronze data has to do so through an explicit, audited path — by design, an analyst building a feature in the gold layer should never need to touch bronze.

### Silver To Gold: Features And Scored Predictions

![Silver to Gold lineage](data_model_lineage.svg)

| Table | Key columns | Role |
| --- | --- | --- |
| `identity_map` | `canonical_student_id`, `source_system`, `source_person_id`, `valid_from`, `valid_to`, `match_confidence` | Joins fragmented systems safely. |
| `student_status_history` | `status`, `faculty`, `programme`, `valid_from`, `valid_to`, `recorded_at` | Prevents current-state leakage. |
| `ingestion_audit` | `pipeline_run_id`, `source_system`, `schema_version`, `record_count`, `watermark`, `purview_asset_id` | Links data products back to pipeline evidence. |
| `fact_enrollment_term` | `term_id`, `credits_registered`, `prior_gpa`, `academic_standing` | Academic baseline. |
| `fact_lms_activity_daily` | `activity_date`, `login_count`, `course_views`, `assignment_due`, `assignment_submitted` | Digital engagement. |
| `fact_financial_snapshot` | `status_date`, `outstanding_balance_nok`, `payment_overdue_days`, `aid_status` | Financial friction. |
| `fact_campus_activity_daily` | `event_date`, `building_entry_count`, `library_entry_count`, `wifi_minutes` | Aggregated physical engagement. |
| `feature_student_week` | `as_of_date`, `term_week`, feature columns, `feature_snapshot_hash` | Model input. |
| `risk_prediction` | `prediction_id`, `model_version`, `risk_score`, `risk_band`, `top_reasons` | Advisor output. |
| `access_audit` | `viewer_user_id`, `purpose`, `timestamp`, `prediction_id`, `fields_returned` | Accountability. |

`feature_student_week` is the table that the model actually consumes, and it deserves a closer look. Each row is keyed on `(canonical_student_id, as_of_date)` and is built by joining the silver fact tables through `identity_map`, filtered by the point-in-time admission rule below. The `feature_snapshot_hash` column is the cheapest reproducibility tool in the design: it is a stable hash of the feature vector, and it is carried into the prediction record so that a future audit can confirm that the prediction it reproduces from the data spine matches the prediction the advisor saw.
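
One way to obtain a stable hash is to serialize the feature vector canonically before hashing it. The sketch below assumes SHA-256 over sorted-key JSON; the source does not name a hash function or serialization, so both are illustrative choices.

```python
import hashlib
import json


def feature_snapshot_hash(features: dict) -> str:
    """Hash a feature vector so that key order does not change the result."""
    canonical = json.dumps(features, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical feature values for one student-week.
row = {"lms_logins_14d": 3, "missing_assignments": 2, "overdue_flag": 1}
snapshot_hash = feature_snapshot_hash(row)
```

In practice floating-point features would need a fixed rounding rule before hashing, otherwise a harmless recomputation difference in the last decimal place changes the hash.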

`risk_prediction` is the publication boundary. Once a row lands here, it is immutable. New predictions for the same student-week create a new row keyed by `prediction_id` and `model_version`. This append-only stance is what lets the platform answer questions like "which model version produced the score this advisor saw, and why is today's score different" without ambiguity.

### Every Row Has An As-Of Date

Every row in `feature_student_week` is built as of `t = as_of_date`. The point-in-time admission rule is the single most important control in the platform — it is what turns a pile of source records into a trustworthy training set.

<div class="equation-stack">
  <section class="equation-panel">
    <h3>Historical Availability</h3>
    <div class="math-expr"><span class="math-fn">usable</span>(record,&nbsp;t)<span class="math-op">=</span>1[<span class="math-var-up">event_time</span><span class="math-op">&le;</span>t<span class="math-op">&and;</span><span class="math-var-up">available_at</span><span class="math-op">&le;</span>t]</div>
    <p>Future records and late-arriving records are blocked during both training and scoring.</p>
  </section>
  <section class="equation-panel">
    <h3>Digital Engagement</h3>
    <div class="math-expr"><span class="math-var-up">lms_logins_14d</span>(i,t)<span class="math-op">=</span>&sum;<span class="math-var-up">login_count</span><sub>i,d</sub><span class="math-op">for</span>d<span class="math-op">&isin;</span>[t<span class="math-op">&minus;</span>13,&nbsp;t]</div>
    <p>Recent LMS activity is measured only from dates already visible at the scoring date.</p>
  </section>
  <section class="equation-panel">
    <h3>Assessment Gap</h3>
    <div class="math-expr"><span class="math-var-up">missing_assignments</span>(i,t)<span class="math-op">=</span><span class="math-fn">max</span>(<span class="math-var-up">due_to_date</span><sub>i,t</sub><span class="math-op">&minus;</span><span class="math-var-up">submitted_to_date</span><sub>i,t</sub>,&nbsp;0)</div>
    <p>The feature compares due work and submitted work as of the snapshot date.</p>
  </section>
  <section class="equation-panel">
    <h3>Financial Signal</h3>
    <div class="math-expr"><span class="math-var-up">overdue_flag</span>(i,t)<span class="math-op">=</span>1[<span class="math-var-up">payment_overdue_days</span><sub>i,t</sub><span class="math-op">&gt;</span>0]</div>
    <p>Finance data is joined through effective-dated snapshots and canonical identity.</p>
  </section>
</div>
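
The three feature formulas above translate almost directly into code. The sketch below assumes the inputs have already passed the availability rule and that daily logins arrive as a date-keyed mapping; the function signatures are illustrative.

```python
from datetime import date, timedelta


def lms_logins_14d(daily_logins: dict[date, int], t: date) -> int:
    """Sum login_count over the 14 days from t-13 to t inclusive."""
    return sum(daily_logins.get(t - timedelta(days=k), 0) for k in range(14))


def missing_assignments(due_to_date: int, submitted_to_date: int) -> int:
    """Work due by t but not submitted by t, floored at zero."""
    return max(due_to_date - submitted_to_date, 0)


def overdue_flag(payment_overdue_days: int) -> int:
    """1 if any payment is overdue as of t, else 0."""
    return 1 if payment_overdue_days > 0 else 0
```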

<figure class="diagram-figure" aria-labelledby="leakageDiagramTitle">
  <figcaption id="leakageDiagramTitle">Point-in-time admission rule for source records<span class="figure-tag">Diagram 1</span></figcaption>
  <div class="figure-scroll">
  <svg viewBox="0 0 1000 380" role="img" aria-label="Timeline showing which source records may enter a feature snapshot built at scoring date t. Records whose event time and availability time both fall before t are admitted; late-arriving records and future records are blocked.">
    <defs>
      <pattern id="lkBlocked" width="6" height="6" patternUnits="userSpaceOnUse" patternTransform="rotate(45)"><line x1="0" y1="0" x2="0" y2="6" stroke="#8a3a2c" stroke-width="1.6" opacity=".55"/></pattern>
    </defs>
    <rect x="0" y="0" width="1000" height="380" fill="#ffffff"/>
    <rect x="240" y="60" width="384" height="240" fill="#f5f7f9" opacity=".6"/>
    <rect x="624" y="60" width="192" height="240" fill="url(#lkBlocked)" opacity=".22"/>
    <text x="246" y="78" font-family="Segoe UI, Arial, sans-serif" font-size="11" font-weight="850" fill="#0d6e58" letter-spacing="1.1">ADMISSIBLE &#8901; event_time &#8804; t  AND  available_at &#8804; t</text>
    <text x="630" y="78" font-family="Segoe UI, Arial, sans-serif" font-size="11" font-weight="850" fill="#8a3a2c" letter-spacing="1.1">BLOCKED &#8901; FUTURE OR LATE</text>
    <line x1="240" y1="284" x2="816" y2="284" stroke="#1a2332" stroke-width="1.4"/>
    <g font-family="Segoe UI, Arial, sans-serif" font-size="11.5" fill="#5b6675" font-weight="700">
      <line x1="240" y1="280" x2="240" y2="290" stroke="#5b6675" stroke-width="1.2"/><text x="240" y="306" text-anchor="middle">t &#8722; 4w</text>
      <line x1="336" y1="280" x2="336" y2="290" stroke="#5b6675" stroke-width="1.2"/><text x="336" y="306" text-anchor="middle">t &#8722; 3w</text>
      <line x1="432" y1="280" x2="432" y2="290" stroke="#5b6675" stroke-width="1.2"/><text x="432" y="306" text-anchor="middle">t &#8722; 2w</text>
      <line x1="528" y1="280" x2="528" y2="290" stroke="#5b6675" stroke-width="1.2"/><text x="528" y="306" text-anchor="middle">t &#8722; 1w</text>
      <line x1="720" y1="280" x2="720" y2="290" stroke="#5b6675" stroke-width="1.2"/><text x="720" y="306" text-anchor="middle">t + 1w</text>
      <line x1="816" y1="280" x2="816" y2="290" stroke="#5b6675" stroke-width="1.2"/><text x="816" y="306" text-anchor="middle">t + 2w</text>
    </g>
    <line x1="624" y1="38" x2="624" y2="294" stroke="#1a2332" stroke-width="2.4" stroke-dasharray="6 5"/>
    <rect x="560" y="20" width="128" height="22" rx="11" fill="#1a2332"/>
    <text x="624" y="35" text-anchor="middle" font-family="Segoe UI, Arial, sans-serif" font-size="12" font-weight="850" fill="#ffffff">scoring date t</text>
    <g font-family="Segoe UI, Arial, sans-serif">
      <text x="48" y="124" font-size="13" font-weight="850" fill="#1a2332">Prior-term GPA</text>
      <line x1="240" y1="124" x2="330" y2="124" stroke="#0d6e58" stroke-width="3"/>
      <circle cx="240" cy="124" r="6" fill="#0d6e58"/>
      <polygon points="330,118 344,124 330,130" fill="#0d6e58"/>
      <text x="240" y="110" font-size="10" font-weight="700" fill="#5b6675">event_time</text>
      <text x="344" y="110" font-size="10" font-weight="700" fill="#5b6675">available_at</text>
      <rect x="848" y="113" width="132" height="22" rx="11" fill="#0d6e58"/>
      <text x="914" y="128" text-anchor="middle" font-size="11" font-weight="850" fill="#ffffff">USABLE</text>
      <text x="48" y="170" font-size="13" font-weight="850" fill="#1a2332">Tuition payment</text>
      <line x1="528" y1="170" x2="582" y2="170" stroke="#0d6e58" stroke-width="3"/>
      <circle cx="528" cy="170" r="6" fill="#0d6e58"/>
      <polygon points="582,164 596,170 582,176" fill="#0d6e58"/>
      <text x="528" y="156" font-size="10" font-weight="700" fill="#5b6675">event_time</text>
      <text x="596" y="156" font-size="10" font-weight="700" fill="#5b6675">available_at</text>
      <rect x="848" y="159" width="132" height="22" rx="11" fill="#0d6e58"/>
      <text x="914" y="174" text-anchor="middle" font-size="11" font-weight="850" fill="#ffffff">USABLE</text>
      <text x="48" y="216" font-size="13" font-weight="850" fill="#1a2332">Late-recorded grade</text>
      <line x1="432" y1="216" x2="676" y2="216" stroke="#a86317" stroke-width="3" stroke-dasharray="5 4"/>
      <circle cx="432" cy="216" r="6" fill="#a86317"/>
      <polygon points="676,210 690,216 676,222" fill="#a86317"/>
      <text x="432" y="202" font-size="10" font-weight="700" fill="#5b6675">event_time</text>
      <text x="690" y="202" font-size="10" font-weight="700" fill="#5b6675">available_at</text>
      <rect x="848" y="205" width="132" height="22" rx="11" fill="#a86317"/>
      <text x="914" y="220" text-anchor="middle" font-size="11" font-weight="850" fill="#ffffff">BLOCKED &#8901; LATE</text>
      <text x="48" y="262" font-size="13" font-weight="850" fill="#1a2332">Final exam grade</text>
      <line x1="720" y1="262" x2="800" y2="262" stroke="#8a3a2c" stroke-width="3" stroke-dasharray="5 4"/>
      <circle cx="720" cy="262" r="6" fill="#8a3a2c"/>
      <polygon points="800,256 814,262 800,268" fill="#8a3a2c"/>
      <text x="720" y="248" font-size="10" font-weight="700" fill="#5b6675">event_time</text>
      <text x="814" y="248" font-size="10" font-weight="700" fill="#5b6675">available_at</text>
      <rect x="848" y="251" width="132" height="22" rx="11" fill="#8a3a2c"/>
      <text x="914" y="266" text-anchor="middle" font-size="11" font-weight="850" fill="#ffffff">BLOCKED &#8901; FUTURE</text>
    </g>
    <g font-family="Segoe UI, Arial, sans-serif" font-size="11.5" font-weight="700" fill="#5b6675">
      <circle cx="60" cy="354" r="5" fill="#1a2332"/><text x="74" y="358">event_time</text>
      <polygon points="184,348 198,354 184,360" fill="#1a2332"/><text x="206" y="358">available_at</text>
      <rect x="330" y="348" width="14" height="12" fill="#0d6e58"/><text x="352" y="358">admitted</text>
      <rect x="438" y="348" width="14" height="12" fill="#a86317"/><text x="460" y="358">late-arriving (blocked)</text>
      <rect x="630" y="348" width="14" height="12" fill="#8a3a2c"/><text x="652" y="358">future (blocked)</text>
    </g>
  </svg>
  </div>
  <p class="figure-note">A record enters the feature snapshot only if the event happened at or before <em>t</em> and was visible to the platform at or before <em>t</em>. The late-recorded grade is real history, but its <code>available_at</code> falls after <em>t</em>, so the second predicate rejects it. Reproducing a prediction means rebuilding both clocks at <em>t</em>, not just the calendar one.</p>
</figure>

The two-clock rule (`event_time <= t` AND `available_at <= t`) is the difference between a leakage-safe pipeline and one that quietly cheats. The first predicate keeps future records out, which is the easy part. The second keeps out records that are real history but were not yet visible to the platform at `t`, e.g., a grade entered three weeks late by a faculty office. A naive design would admit those records into a training row dated `t` because their `event_time` is before `t`; the model would then learn from a value that could not have been known at scoring time. The two-clock rule closes that gap.
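
The rule is small enough to state as a predicate. The sketch below replays the four records from the diagram; the scoring date and the exact day offsets are hypothetical approximations of the timeline, and `classify` is an illustrative helper.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class SourceRecord:
    name: str
    event_time: date
    available_at: date


def usable(record: SourceRecord, t: date) -> bool:
    """Admit a record only if it happened and was visible at or before t."""
    return record.event_time <= t and record.available_at <= t


def classify(record: SourceRecord, t: date) -> str:
    """Label a record the way the diagram does."""
    if usable(record, t):
        return "usable"
    if record.event_time > t:
        return "blocked: future"
    return "blocked: late"


T = date(2025, 3, 31)  # hypothetical scoring date
RECORDS = [
    SourceRecord("Prior-term GPA", T - timedelta(weeks=4), T - timedelta(weeks=3)),
    SourceRecord("Tuition payment", T - timedelta(weeks=1), T - timedelta(days=2)),
    SourceRecord("Late-recorded grade", T - timedelta(weeks=2), T + timedelta(days=4)),
    SourceRecord("Final exam grade", T + timedelta(weeks=1), T + timedelta(weeks=2)),
]
```

Applying the same `usable` predicate in both the training and the scoring path is what keeps the two consistent.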

Labels are kept separate from features until the outcome window closes, and each feature snapshot is stored with its hash so predictions can be reproduced exactly. Keeping labels and features apart is the single hardest discipline to maintain in practice, because the temptation to "just add one more recent variable" to the features is constant. The design therefore enforces it through table-level separation rather than relying on reviewer attention.
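
A label builder that respects both the exclusions and the closed window might look like the sketch below. The status codes and the function signature are hypothetical; the exclusion categories are the ones listed in the scoping assumptions.

```python
from datetime import date
from typing import Optional

# Exclusions named in the scoping assumptions; the string codes are hypothetical.
EXCLUDED_OUTCOMES = {"graduated", "exchange", "approved_leave", "admin_correction"}


def non_continuation_label(next_term_status: str, window_closes: date,
                           today: date) -> Optional[int]:
    """Return 1 or 0 once the outcome window has closed, else None."""
    if today < window_closes:
        return None   # outcome not yet observable, so no label is written
    if next_term_status in EXCLUDED_OUTCOMES:
        return None   # out of scope, so the row is dropped from training
    return 0 if next_term_status == "enrolled" else 1
```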

## Model And Math

This chapter is about the model itself, but the framing is deliberate: most of the design effort already happened upstream. The model is a small head sitting on top of a trustworthy data spine, and its job is to be calibrated, defensible, and easy to recalibrate when capacity or population changes. The chapter covers the formula spine the platform has to defend, the choice of an interpretable baseline, the capacity-anchored thresholds that turn a probability into a queue, and the validation metrics that tell you whether the queue is actually working.

### The Math The Platform Has To Defend

Start with calibrated logistic regression or explainable gradient boosted trees in Azure ML. The goal is useful advisor prioritization, not only aggregate accuracy.

<div class="equation-stack">
  <section class="equation-panel">
    <h3>Risk Score</h3>
    <div class="math-expr">r<sub>i,t</sub><span class="math-op">=</span><span class="math-fn">P</span>(Y<sub>i,t+h</sub><span class="math-op">=</span>1<span class="math-op">|</span>X<sub>i,t</sub>)</div>
    <p>Risk is the probability of next-term non-continuation using only features available at scoring date <span class="math-note">t</span>.</p>
  </section>
  <section class="equation-panel">
    <h3>Interpretable Baseline</h3>
    <div class="equation-lines">
      <div class="equation-line"><span class="indent"></span><span><span class="math-fn">logit</span>(r<sub>i,t</sub>)<span class="math-op">=</span>&beta;<sub>0</sub></span></div>
      <div class="equation-line"><span class="indent">+</span><span>&beta;<sub>1</sub>&nbsp;<span class="math-var-up">prior_gpa</span><sub>i,t</sub></span></div>
      <div class="equation-line"><span class="indent">+</span><span>&beta;<sub>2</sub>&nbsp;<span class="math-var-up">missing_assignments</span><sub>i,t</sub></span></div>
      <div class="equation-line"><span class="indent">+</span><span>&beta;<sub>3</sub>&nbsp;<span class="math-var-up">days_since_lms</span><sub>i,t</sub></span></div>
      <div class="equation-line"><span class="indent">+</span><span>&beta;<sub>4</sub>&nbsp;<span class="math-var-up">payment_overdue_days</span><sub>i,t</sub></span></div>
      <div class="equation-line"><span class="indent">+</span><span>&beta;<sub>5</sub>&nbsp;<span class="math-var-up">campus_drop</span><sub>i,t</sub></span></div>
      <div class="equation-line"><span class="indent">+</span><span>&beta;<sub>6..k</sub>&nbsp;<span class="math-var-up">academic_context</span><sub>i,t</sub></span></div>
    </div>
  </section>
  <section class="equation-panel">
    <h3>Capacity Threshold</h3>
    <div class="equation-lines">
      <div class="equation-line"><span class="indent"></span><span>&tau;<sub>red</sub><span class="math-op">=</span><span class="math-fn">quantile</span>(r<sub>i,t</sub>,&nbsp;1<span class="math-op">&minus;</span>C<sub>red</sub>&#8202;/&#8202;N)</span></div>
      <div class="equation-line"><span class="indent"></span><span>&tau;<sub>amber</sub><span class="math-op">=</span><span class="math-fn">quantile</span>(r<sub>i,t</sub>,&nbsp;1<span class="math-op">&minus;</span>(C<sub>red</sub>&nbsp;+&nbsp;C<sub>amber</sub>)&#8202;/&#8202;N)</span></div>
    </div>
    <p>Thresholds are tied to real advisor capacity rather than a detached score cutoff.</p>
  </section>
</div>

Calibrated logistic regression is the recommended baseline because every coefficient maps directly to a reason code. When β₃ on `days_since_lms` is positive and material, "you have not signed in for 11 days" can appear in the advisor's UI as an honest explanation. With a tree-ensemble black box, the same statement is at best an approximation, and at worst a post-hoc justification. Gradient-boosted trees with monotonic constraints and SHAP explanations are an acceptable second choice when the baseline plateaus; deep models are not used because the marginal AUC gain does not pay for the explainability and audit cost.
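
To make the coefficient-to-reason-code link concrete, here is a minimal sketch of the additive decomposition behind the baseline. The feature names follow the equation above; the coefficient values, the intercept, and the example student are hypothetical, and the features are assumed to be standardized so that contributions are comparable.

```python
import math

# Hypothetical coefficients on standardized features; values are illustrative only.
INTERCEPT = -2.0
COEFFICIENTS = {
    "prior_gpa": -0.8,
    "missing_assignments": 0.6,
    "days_since_lms": 0.5,
    "payment_overdue_days": 0.3,
    "campus_drop": 0.2,
}


def score_with_contributions(features):
    """Return (risk, contributions); intercept plus contributions equals the logit."""
    contributions = {
        name: COEFFICIENTS[name] * features[name] for name in COEFFICIENTS
    }
    logit = INTERCEPT + sum(contributions.values())
    risk = 1.0 / (1.0 + math.exp(-logit))
    return risk, contributions


# One hypothetical student, with features expressed as z-scores.
student = {
    "prior_gpa": -0.5,
    "missing_assignments": 2.0,
    "days_since_lms": 1.5,
    "payment_overdue_days": 0.0,
    "campus_drop": 0.5,
}
risk, contributions = score_with_contributions(student)

# The feature that raises the logit most is the natural first reason code.
top_driver = max(contributions, key=contributions.get)
```

Because each contribution is just coefficient times feature value, the explanation shown to an advisor is the model itself, not an approximation fitted afterwards.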

Calibration matters as much as discrimination here. A score of 0.7 has to actually mean "70 percent of these students will not continue," because otherwise capacity-anchored bands lose their meaning and reason codes mislead advisors. Platt scaling or isotonic regression on a held-out term takes calibration from "approximately right" to "audit-defensible." Expected Calibration Error (ECE) is reported alongside Recall@C in the validation table for exactly this reason.
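
As a rough illustration of what ECE measures, the sketch below bins scores into equal-width buckets and compares the mean score with the observed non-continuation rate in each bucket. The bin count of 10 and the toy data are assumptions; the platform's actual binning scheme is not specified here.

```python
def expected_calibration_error(scores, outcomes, n_bins=10):
    """ECE = sum_b (n_b / N) * |mean(Y_b) - mean(r_b)| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for score, outcome in zip(scores, outcomes):
        index = min(int(score * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[index].append((score, outcome))

    total = len(scores)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins carry no weight
        mean_score = sum(score for score, _ in members) / len(members)
        mean_outcome = sum(outcome for _, outcome in members) / len(members)
        ece += (len(members) / total) * abs(mean_outcome - mean_score)
    return ece


# Toy data: ten students scored 0.7, seven of whom did not continue.
calibrated = expected_calibration_error([0.7] * 10, [1] * 7 + [0] * 3)

# Same scores, but only four non-continuers: the bin is off by 0.3.
overconfident = expected_calibration_error([0.7] * 10, [1] * 4 + [0] * 6)
```

In the first case the score of 0.7 means what it claims and ECE is effectively zero; in the second the model overstates risk and ECE is about 0.3.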

### Start Interpretable, Calibrated, And Capacity-Aware

<figure class="diagram-figure" aria-labelledby="capacityDiagramTitle">
  <figcaption id="capacityDiagramTitle">Capacity-anchored thresholds on the score distribution<span class="figure-tag">Diagram 2</span></figcaption>
  <div class="figure-scroll">
  <svg viewBox="0 0 1000 380" role="img" aria-label="Histogram of risk scores from 0 to 1, showing how the red and amber thresholds are placed at quantiles tied to advisor capacity. The red band is the top 220 students, the amber band is the next 380 students, the green band holds the remaining cohort.">
    <defs>
      <linearGradient id="capRed" x1="0" x2="0" y1="0" y2="1"><stop offset="0%" stop-color="#8a3a2c" stop-opacity=".18"/><stop offset="100%" stop-color="#8a3a2c" stop-opacity=".05"/></linearGradient>
      <linearGradient id="capAmber" x1="0" x2="0" y1="0" y2="1"><stop offset="0%" stop-color="#a86317" stop-opacity=".16"/><stop offset="100%" stop-color="#a86317" stop-opacity=".04"/></linearGradient>
      <linearGradient id="capGreen" x1="0" x2="0" y1="0" y2="1"><stop offset="0%" stop-color="#0d6e58" stop-opacity=".14"/><stop offset="100%" stop-color="#0d6e58" stop-opacity=".03"/></linearGradient>
    </defs>
    <rect x="0" y="0" width="1000" height="380" fill="#ffffff"/>
    <text x="60" y="22" font-family="Segoe UI, Arial, sans-serif" font-size="12" font-weight="850" fill="#0d6e58" letter-spacing="1.2">SCORE DISTRIBUTION ACROSS N STUDENTS &#8901; THRESHOLDS PLACED BY QUANTILE</text>
    <rect x="565" y="34" width="142" height="22" rx="11" fill="#a86317"/>
    <text x="636" y="49" text-anchor="middle" font-family="Segoe UI, Arial, sans-serif" font-size="11.5" font-weight="850" fill="#ffffff">&#964; amber &#8776; 0.62</text>
    <rect x="755" y="34" width="142" height="22" rx="11" fill="#8a3a2c"/>
    <text x="826" y="49" text-anchor="middle" font-family="Segoe UI, Arial, sans-serif" font-size="11.5" font-weight="850" fill="#ffffff">&#964; red &#8776; 0.82</text>
    <rect x="80"  y="70" width="555" height="210" fill="url(#capGreen)"/>
    <rect x="635" y="70" width="190" height="210" fill="url(#capAmber)"/>
    <rect x="825" y="70" width="95"  height="210" fill="url(#capRed)"/>
    <g fill="#1a2332">
      <rect x="84"  y="271" width="22" height="9"/>
      <rect x="108" y="263" width="22" height="17"/>
      <rect x="132" y="249" width="22" height="31"/>
      <rect x="156" y="227" width="22" height="53"/>
      <rect x="180" y="201" width="22" height="79"/>
      <rect x="204" y="171" width="22" height="109"/>
      <rect x="228" y="139" width="22" height="141"/>
      <rect x="252" y="113" width="22" height="167"/>
      <rect x="276" y="93"  width="22" height="187"/>
      <rect x="300" y="81"  width="22" height="199"/>
      <rect x="324" y="79"  width="22" height="201"/>
      <rect x="348" y="87"  width="22" height="193"/>
      <rect x="372" y="101" width="22" height="179"/>
      <rect x="396" y="121" width="22" height="159"/>
      <rect x="420" y="147" width="22" height="133"/>
      <rect x="444" y="173" width="22" height="107"/>
      <rect x="468" y="195" width="22" height="85"/>
      <rect x="492" y="211" width="22" height="69"/>
      <rect x="516" y="223" width="22" height="57"/>
      <rect x="540" y="233" width="22" height="47"/>
      <rect x="564" y="241" width="22" height="39"/>
      <rect x="588" y="247" width="22" height="33"/>
      <rect x="612" y="253" width="22" height="27"/>
      <rect x="636" y="257" width="22" height="23"/>
      <rect x="660" y="259" width="22" height="21"/>
      <rect x="684" y="261" width="22" height="19"/>
      <rect x="708" y="262" width="22" height="18"/>
      <rect x="732" y="263" width="22" height="17"/>
      <rect x="756" y="264" width="22" height="16"/>
      <rect x="780" y="264" width="22" height="16"/>
      <rect x="804" y="265" width="22" height="15"/>
      <rect x="828" y="266" width="22" height="14"/>
      <rect x="852" y="266" width="22" height="14"/>
      <rect x="876" y="267" width="22" height="13"/>
      <rect x="900" y="267" width="20" height="13"/>
    </g>
    <line x1="80" y1="280" x2="920" y2="280" stroke="#1a2332" stroke-width="1.6"/>
    <g font-family="Segoe UI, Arial, sans-serif" font-size="11" font-weight="700" fill="#5b6675">
      <text x="80"  y="296" text-anchor="middle">0.0</text>
      <text x="248" y="296" text-anchor="middle">0.2</text>
      <text x="416" y="296" text-anchor="middle">0.4</text>
      <text x="584" y="296" text-anchor="middle">0.6</text>
      <text x="752" y="296" text-anchor="middle">0.8</text>
      <text x="920" y="296" text-anchor="middle">1.0</text>
      <text x="500" y="316" text-anchor="middle" font-size="12.5" fill="#1a2332" font-weight="850">risk score r&#8336;,&#8348;</text>
    </g>
    <line x1="635" y1="60" x2="635" y2="296" stroke="#a86317" stroke-width="2.2" stroke-dasharray="6 4"/>
    <line x1="825" y1="60" x2="825" y2="296" stroke="#8a3a2c" stroke-width="2.4" stroke-dasharray="6 4"/>
    <g font-family="Segoe UI, Arial, sans-serif">
      <line x1="357" y1="324" x2="357" y2="338" stroke="#0d6e58" stroke-width="2"/>
      <text x="357" y="356" text-anchor="middle" font-size="12.5" font-weight="850" fill="#0d6e58">GREEN &#8901; monitor</text>
      <text x="357" y="372" text-anchor="middle" font-size="11" font-weight="700" fill="#5b6675">remaining cohort</text>
      <line x1="730" y1="324" x2="730" y2="338" stroke="#a86317" stroke-width="2"/>
      <text x="730" y="356" text-anchor="middle" font-size="12.5" font-weight="850" fill="#a86317">AMBER &#8901; outreach</text>
      <text x="730" y="372" text-anchor="middle" font-size="11" font-weight="700" fill="#5b6675">next 380 students</text>
      <line x1="872" y1="324" x2="872" y2="338" stroke="#8a3a2c" stroke-width="2"/>
      <text x="872" y="356" text-anchor="middle" font-size="12.5" font-weight="850" fill="#8a3a2c">RED &#8901; advisor</text>
      <text x="872" y="372" text-anchor="middle" font-size="11" font-weight="700" fill="#5b6675">top 220 students</text>
      <text x="60" y="356" font-size="11.5" font-weight="800" fill="#5b6675" letter-spacing=".5">N &#8776; 35,000</text>
    </g>
  </svg>
  </div>
  <p class="figure-note">Bands are sized to advisor reality, not to a detached score cutoff. If next term&#39;s capacity drops, the same model produces a smaller red band by sliding &#964;<sub>red</sub> right; because the threshold follows capacity, the queue never silently overshoots staffing.</p>
</figure>

The capacity-anchored design is what stops the queue from being noise. A common failure mode of risk models is to ship a "0.5 cutoff" by default; in this domain that produces queues of several thousand students that no advisor team can work, the queue gets ignored, and the platform quietly dies. Tying τ_red to the top C_red students by quantile keeps the queue at the size the team can act on. The cost is that absolute risk levels can drift between terms: a "red" student in a calmer term is genuinely lower-risk than a "red" student in a hard term. That drift is visible in the score itself and can be reported alongside the band.
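
The threshold rule can be sketched in a few lines. This version uses the empirical rank (the score of the last student who still fits in the band) rather than an interpolated quantile, which is one reasonable reading of the formula; the function names and the toy cohort are hypothetical. Ties at the threshold could still push a band slightly over capacity and would need a tie-break rule in practice.

```python
def capacity_thresholds(scores, red_capacity, amber_capacity):
    """Return (tau_red, tau_amber) so band sizes match advisor capacity."""
    if red_capacity < 1 or amber_capacity < 1:
        raise ValueError("capacities must be positive")
    if red_capacity + amber_capacity > len(scores):
        raise ValueError("capacity exceeds cohort size")
    ranked = sorted(scores, reverse=True)
    tau_red = ranked[red_capacity - 1]  # lowest score still in the red band
    tau_amber = ranked[red_capacity + amber_capacity - 1]  # lowest score still in amber
    return tau_red, tau_amber


def assign_band(score, tau_red, tau_amber):
    """Map a score to red, amber, or green using the capacity thresholds."""
    if score >= tau_red:
        return "red"
    if score >= tau_amber:
        return "amber"
    return "green"


# Toy cohort of 100 students with distinct scores 0.00 .. 0.99.
scores = [i / 100 for i in range(100)]
tau_red, tau_amber = capacity_thresholds(scores, red_capacity=5, amber_capacity=10)
bands = [assign_band(score, tau_red, tau_amber) for score in scores]
```

With the figures in the diagram (N ≈ 35,000, C_red = 220, C_amber = 380), the quantile levels in the formula work out to 1 − 220/35,000 ≈ 0.9937 for red and 1 − 600/35,000 ≈ 0.9829 for amber.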

| Check | Formula | Why it matters |
| --- | --- | --- |
| Precision at capacity | `Precision@C = true_positives_in_top_C / C` | Are advisor slots used well? |
| Recall at capacity | `Recall@C = true_positives_in_top_C / all_actual_positives` | How much actual risk does the queue catch? |
| Lead time | `lead_time_i = outcome_date_i - first_red_or_amber_score_date_i` | Is the signal early enough to act? |
| Calibration | `ECE = sum_b (n_b / N) * abs(mean(Y_b) - mean(r_b))` | Do predicted probabilities mean what they claim? |
| Fairness gap | `gap = max(metric_g) - min(metric_g)` | Detect material group differences in recall, FPR, FNR, ECE, and flag rate. |
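
The first three rows of the table translate directly into code. The sketch below is illustrative only: the function names, the toy scores, and the dates are hypothetical, and it assumes outcomes are coded 1 for non-continuation and 0 otherwise.

```python
from datetime import date


def precision_recall_at_capacity(scores, outcomes, capacity):
    """Return (Precision@C, Recall@C) for the top `capacity` students by score."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    true_positives_in_top_c = sum(outcome for _, outcome in ranked[:capacity])
    all_actual_positives = sum(outcomes)
    precision = true_positives_in_top_c / capacity
    recall = (
        true_positives_in_top_c / all_actual_positives if all_actual_positives else 0.0
    )
    return precision, recall


def lead_time_days(outcome_date, first_flag_date):
    """Days between the first red or amber score and the outcome."""
    return (outcome_date - first_flag_date).days


# Toy cohort: eight students, four of whom did not continue.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
outcomes = [1, 1, 0, 1, 0, 0, 1, 0]
precision, recall = precision_recall_at_capacity(scores, outcomes, capacity=3)

# Hypothetical dates: first flagged on 20 November, outcome recorded on 15 January.
lead = lead_time_days(date(2025, 1, 15), date(2024, 11, 20))
```

Here the queue of three contains two true non-continuers, so Precision@C is 2/3 and Recall@C is 2/4; the lead time is 56 days.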

#### Confusion Matrix At &#964;<sub>red</sub>

The threshold is set so the at-risk count fits advisor capacity. Recall and lead time are reported alongside; precision is informative, not the optimization target.

| | Predicted no-risk | Predicted at-risk |
| --- | --- | --- |
| **Actual no-risk** | TN &#8776; 78% | FP &#8776; 5% |
| **Actual at-risk** | FN &#8776; 8% | TP &#8776; 9% |

Cells are illustrative pilot-term shares of the 35,000-student population. The model card publishes the same matrix split by faculty, gender, age band, international status, and first-generation status. Any group whose FN rate diverges materially from the population is flagged for review before release.

The reason precision is "informative, not the optimization target" is operational. Once the queue size is fixed at advisor capacity C, Precision@C and Recall@C share the same numerator, so precision adds little once recall is reported; pushing precision higher would mean raising the threshold past the capacity cut, under-flagging, and missing students who should have been called. Recall@C and lead time tell the operationally honest story: of the students who actually did not continue, what share landed in the queue, and how many weeks ahead of the outcome did the queue first surface them?

## Action And Ethics

A score that reaches an advisor without context is just noise; a score that reaches an advisor with the wrong group hidden in its error structure is harm. This chapter covers the two halves of how the platform makes itself accountable: the advisor-facing UI that turns a probability into a decision, and the fairness audit that runs every model release before any prediction reaches a human. Both are explicit answers to the assumption that the model is decision support only, never autonomous action.

### Make The Prediction Actionable

The advisor view is intentionally narrow. It shows only what an advisor needs to make a defensible decision about a specific student in the next two weeks, and nothing else. The deck demonstration screen has three rows — green, amber, red — each with the student, programme, the top reasons, and a recommended support route.

- Student, programme, risk band, score range, and score trend.
- Top three actionable reasons with source freshness.
- Recommended support route: academic check-in, financial-aid referral, study-skills support, or wellbeing referral.
- Contact status, notes, intervention outcome, and "not relevant" feedback.
- Access evidence showing who viewed which prediction, when, and for what purpose.

Two design choices in the advisor view are worth defending explicitly. First, **reason codes are bounded to actionable signals**. The model may use 30+ features, but the advisor only sees the three that drove the score most for this student-week, and only ever from the actionable set (assignments, LMS activity, finance, campus). Demographic features — even when they carry signal — are never shown as reason codes because they are not legitimate grounds for a support call. Second, **score trend matters more than the absolute score**. A student whose risk has climbed two bands in three weeks is operationally more interesting than a student who has been red and stable; the UI surfaces the trend prominently for that reason.
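
The bounding rule for reason codes could look like the following sketch. The actionable set maps the four signal families named above onto the feature names from the baseline equation; the contribution values, the two demographic feature names, and the choice to show only risk-raising contributions are assumptions for illustration.

```python
# Actionable features, following the baseline equation's naming.
ACTIONABLE = {
    "missing_assignments",
    "days_since_lms",
    "payment_overdue_days",
    "campus_drop",
}


def advisor_reason_codes(contributions, k=3):
    """Top-k actionable features that raise the score; demographics never appear."""
    eligible = {
        name: value
        for name, value in contributions.items()
        if name in ACTIONABLE and value > 0  # assumption: only risk-raising drivers
    }
    return sorted(eligible, key=eligible.get, reverse=True)[:k]


# Hypothetical per-feature contributions for one student-week.
contributions = {
    "international_status": 0.9,  # carries signal, but is not a reason code
    "missing_assignments": 0.7,
    "days_since_lms": 0.4,
    "age_band": 0.3,
    "campus_drop": 0.2,
    "payment_overdue_days": -0.1,
}
reasons = advisor_reason_codes(contributions)
```

Even though the demographic feature has the largest contribution in this toy example, the advisor sees only assignments, LMS activity, and campus signals.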

The "not relevant" feedback channel and intervention-outcome capture are the unglamorous half of the workflow but they are what closes the loop. Advisor labels become the next training signal, both for the model (was this prediction useful?) and for fairness monitoring (is the queue under-serving a particular group?). Without that channel, the platform is write-only and gets stale fast.

### Engagement Patterns Differ For Legitimate Reasons

<figure class="diagram-figure" aria-labelledby="fairnessDiagramTitle">
  <figcaption id="fairnessDiagramTitle">Recall@C and fairness gap by audit group<span class="figure-tag">Diagram 3</span></figcaption>
  <div class="figure-scroll">
  <svg viewBox="0 0 1000 380" role="img" aria-label="Dot plot showing Recall at capacity by group. Each row is a group with a recall point and a thin confidence whisker. A vertical line marks the population recall and a bracket marks max minus min, the fairness gap.">
    <rect x="0" y="0" width="1000" height="380" fill="#ffffff"/>
    <text x="40" y="22" font-family="Segoe UI, Arial, sans-serif" font-size="12" font-weight="850" fill="#0d6e58" letter-spacing="1.2">RECALL@C BY AUDIT GROUP &#8901; PILOT TERM, N = 35,000</text>
    <text x="950" y="46" text-anchor="end" font-family="Segoe UI, Arial, sans-serif" font-size="11.5" font-weight="850" fill="#1a2332">population mean 0.68</text>
    <g fill="#f5f7f9">
      <rect x="40" y="68"  width="920" height="36"/>
      <rect x="40" y="140" width="920" height="36"/>
      <rect x="40" y="212" width="920" height="36"/>
    </g>
    <g stroke="#e6e8ec" stroke-width="1" stroke-dasharray="2 4">
      <line x1="320" y1="62" x2="320" y2="296"/>
      <line x1="450" y1="62" x2="450" y2="296"/>
      <line x1="580" y1="62" x2="580" y2="296"/>
      <line x1="710" y1="62" x2="710" y2="296"/>
      <line x1="840" y1="62" x2="840" y2="296"/>
    </g>
    <line x1="684" y1="50" x2="684" y2="296" stroke="#1a2332" stroke-width="1.6" stroke-dasharray="4 4"/>
    <g font-family="Segoe UI, Arial, sans-serif">
      <text x="52" y="92" font-size="13" font-weight="850" fill="#1a2332">Population</text>
      <line x1="684" y1="80" x2="684" y2="92" stroke="#1a2332" stroke-width="2.4"/>
      <circle cx="684" cy="86" r="7" fill="#1a2332"/>
      <text x="942" y="92" text-anchor="end" font-size="13" font-weight="850" fill="#1a2332">0.68</text>
      <text x="52" y="128" font-size="13" font-weight="800" fill="#1a2332">Faculty A &#8901; Engineering</text>
      <line x1="697" y1="122" x2="749" y2="122" stroke="#0d6e58" stroke-width="2.2"/>
      <line x1="697" y1="116" x2="697" y2="128" stroke="#0d6e58" stroke-width="2"/>
      <line x1="749" y1="116" x2="749" y2="128" stroke="#0d6e58" stroke-width="2"/>
      <circle cx="723" cy="122" r="6" fill="#0d6e58"/>
      <text x="942" y="128" text-anchor="end" font-size="13" font-weight="850" fill="#0d6e58">0.71</text>
      <text x="52" y="164" font-size="13" font-weight="800" fill="#1a2332">Faculty B &#8901; Humanities</text>
      <line x1="632" y1="158" x2="684" y2="158" stroke="#0d6e58" stroke-width="2.2"/>
      <line x1="632" y1="152" x2="632" y2="164" stroke="#0d6e58" stroke-width="2"/>
      <line x1="684" y1="152" x2="684" y2="164" stroke="#0d6e58" stroke-width="2"/>
      <circle cx="658" cy="158" r="6" fill="#0d6e58"/>
      <text x="942" y="164" text-anchor="end" font-size="13" font-weight="850" fill="#0d6e58">0.66</text>
      <text x="52" y="200" font-size="13" font-weight="800" fill="#1a2332">Female</text>
      <line x1="684" y1="194" x2="736" y2="194" stroke="#0d6e58" stroke-width="2.2"/>
      <line x1="684" y1="188" x2="684" y2="200" stroke="#0d6e58" stroke-width="2"/>
      <line x1="736" y1="188" x2="736" y2="200" stroke="#0d6e58" stroke-width="2"/>
      <circle cx="710" cy="194" r="6" fill="#0d6e58"/>
      <text x="942" y="200" text-anchor="end" font-size="13" font-weight="850" fill="#0d6e58">0.70</text>
      <text x="52" y="236" font-size="13" font-weight="800" fill="#1a2332">International students</text>
      <line x1="528" y1="230" x2="580" y2="230" stroke="#a86317" stroke-width="2.6"/>
      <line x1="528" y1="224" x2="528" y2="236" stroke="#a86317" stroke-width="2"/>
      <line x1="580" y1="224" x2="580" y2="236" stroke="#a86317" stroke-width="2"/>
      <circle cx="554" cy="230" r="7" fill="#a86317"/>
      <text x="942" y="236" text-anchor="end" font-size="13" font-weight="850" fill="#a86317">0.58</text>
      <text x="52" y="272" font-size="13" font-weight="800" fill="#1a2332">First-generation</text>
      <line x1="619" y1="266" x2="671" y2="266" stroke="#0d6e58" stroke-width="2.2"/>
      <line x1="619" y1="260" x2="619" y2="272" stroke="#0d6e58" stroke-width="2"/>
      <line x1="671" y1="260" x2="671" y2="272" stroke="#0d6e58" stroke-width="2"/>
      <circle cx="645" cy="266" r="6" fill="#0d6e58"/>
      <text x="942" y="272" text-anchor="end" font-size="13" font-weight="850" fill="#0d6e58">0.65</text>
    </g>
    <line x1="320" y1="296" x2="840" y2="296" stroke="#1a2332" stroke-width="1.4"/>
    <g font-family="Segoe UI, Arial, sans-serif" font-size="11.5" font-weight="700" fill="#5b6675">
      <text x="320" y="312" text-anchor="middle">0.40</text>
      <text x="450" y="312" text-anchor="middle">0.50</text>
      <text x="580" y="312" text-anchor="middle">0.60</text>
      <text x="710" y="312" text-anchor="middle">0.70</text>
      <text x="840" y="312" text-anchor="middle">0.80</text>
    </g>
    <g stroke="#8a3a2c" stroke-width="2.2" fill="none">
      <line x1="554" y1="332" x2="723" y2="332"/>
      <line x1="554" y1="326" x2="554" y2="338"/>
      <line x1="723" y1="326" x2="723" y2="338"/>
    </g>
    <rect x="568" y="342" width="142" height="22" rx="11" fill="#8a3a2c"/>
    <text x="639" y="357" text-anchor="middle" font-family="Segoe UI, Arial, sans-serif" font-size="12" font-weight="850" fill="#ffffff">gap = 0.13 &#8901; review</text>
    <text x="580" y="372" font-family="Segoe UI, Arial, sans-serif" font-size="11.5" font-weight="850" fill="#1a2332">Recall@C &#8901; share of true non-continuers in the advisor queue</text>
  </svg>
  </div>
  <p class="figure-note">Audit before release. The international-student row is the binding constraint &mdash; the queue catches a smaller share of their actual non-continuers, so the model card must explain why and the threshold review must decide whether a group-aware adjustment is justified before the model is approved for advisor use.</p>
</figure>

The fairness audit measures four metrics — recall, FPR, calibration error, and flag rate — across faculty, gender, age band, international status, first-generation status, and study mode. Each metric has a different failure mode: low recall in a group means the queue under-serves them; high FPR means a group bears advisor attention they did not need; calibration drift means the score has different meaning across groups; flag rate compared to base rate detects over- or under-flagging. The diagram above shows the recall view in the pilot term; the international-student row is the binding constraint and triggers the model card review.

The deck slide title, "engagement patterns differ for legitimate reasons", is deliberately not a defence of the gap. International students do have legitimately different LMS and campus patterns: they study on different rhythms, often use the campus differently, and may study privately off-platform. That is exactly why the audit exists: to detect when those legitimate patterns translate into the model under-serving the group, and to force a documented decision (group-aware threshold, recalibration, or accepting the gap) before release rather than after.

Protected attributes are retained for audit, not shown as advisor reason codes. Audit by gender, age band, international status, first-generation status, faculty, programme, and study mode happens at every model release and is recorded in the model card alongside the metrics table.
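
The per-group recall audit and the max-minus-min gap can be sketched as below. The recall values are the pilot-term figures read off Diagram 3; the record layout, the function names, and the `REVIEW_THRESHOLD` of 0.10 are hypothetical, since the document only says that material gaps are flagged.

```python
def recall_by_group(records):
    """records: iterable of (group, in_queue, did_not_continue) tuples."""
    caught, positives = {}, {}
    for group, in_queue, did_not_continue in records:
        if did_not_continue:
            positives[group] = positives.get(group, 0) + 1
            caught[group] = caught.get(group, 0) + (1 if in_queue else 0)
    return {group: caught[group] / positives[group] for group in positives}


def fairness_gap(metric_by_group):
    """gap = max(metric_g) - min(metric_g)."""
    values = list(metric_by_group.values())
    return max(values) - min(values)


# Pilot-term Recall@C values as shown in Diagram 3.
diagram_recall = {
    "Faculty A (Engineering)": 0.71,
    "Faculty B (Humanities)": 0.66,
    "Female": 0.70,
    "International students": 0.58,
    "First-generation": 0.65,
}
gap = fairness_gap(diagram_recall)
worst_group = min(diagram_recall, key=diagram_recall.get)

REVIEW_THRESHOLD = 0.10  # hypothetical materiality threshold, not from the source
needs_review = gap > REVIEW_THRESHOLD

# Tiny made-up audit extract to show the per-group computation.
toy = recall_by_group(
    [("a", True, True), ("a", False, True), ("b", True, True), ("b", True, False)]
)
```

The same `fairness_gap` helper applies unchanged to FPR, ECE, or flag rate once those are computed per group.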

## Deployment

This last chapter covers what happens after the model is good enough to release: the production controls that keep it good, the trade-offs the design accepts on purpose, the thirty-week roadmap that gets the platform from week 0 to a hardened pilot, and the open questions for the university whose answers shape thresholds and scope. It is the smallest chapter in design surface but the largest in operational lifetime — most of the platform's time is spent here, not in building the first model.

### Production Controls

| Control | Production rule |
| --- | --- |
| Freshness | Alert when a Data Factory/Fabric pipeline misses SLA or an Event Hubs stream falls behind. |
| Completeness | Compare expected vs received enrollment and event counts by faculty and term. |
| Identity | Track unmatched IDs and low-confidence matches; route exceptions to data stewardship. |
| Schema drift | Block breaking column, type, or semantic changes before silver promotion. |
| Score drift | Monitor score distribution, feature drift, and reason-code mix by term, faculty, programme, and study mode. |
| Governance | Register assets and lineage in Purview; version model, features, data, thresholds, reason-code logic, and approvals. |
| Security | Use pseudonymous modeling, restricted PII, encryption, private access where needed, Entra groups, Key Vault secrets, and Power BI/Fabric RLS. |
| Audit | Store advisor access, data lineage, model release approval, and threshold approval in immutable audit tables. |

The controls split cleanly into two groups. The first five (freshness, completeness, identity, schema drift, score drift) are about catching silent data degradation early — most production failures of risk models are not model failures but quiet upstream changes (a renamed LMS column, a new ERP code) that the model swallows without complaint. The last three (governance, security, audit) are the regulatory surface: when the platform is challenged, these controls produce the evidence trail that shows what was known, who approved it, and who saw what.

Two of the controls deserve a closer look because they are commonly skipped. **Score drift monitoring by group** is what catches fairness regressions between releases — the model card records the fairness gap at release, and weekly monitoring against that baseline is how a creeping gap becomes visible before it has done damage. **Reason-code mix monitoring** is the operational version: if a particular reason code suddenly explains 60 percent of red predictions in one faculty, that is almost certainly a data issue (the LMS in that faculty is exporting differently) rather than a real signal change.
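
A minimal sketch of reason-code mix monitoring for one faculty follows. It compares each code's share of red predictions against a baseline period and alerts on large jumps; the `max_jump` of 0.25 and the toy data are assumptions, and a production version would more likely run over the scored tables than over in-memory lists.

```python
from collections import Counter


def reason_code_shares(reason_codes):
    """Share of red predictions explained by each top reason code."""
    counts = Counter(reason_codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: count / total for code, count in counts.items()}


def mix_alerts(baseline_codes, current_codes, max_jump=0.25):
    """Codes whose share rose by more than max_jump versus the baseline."""
    baseline = reason_code_shares(baseline_codes)
    current = reason_code_shares(current_codes)
    return sorted(
        code
        for code, share in current.items()
        if share - baseline.get(code, 0.0) > max_jump
    )


# Toy data for one faculty: top reason code of each red prediction.
baseline_week = (
    ["days_since_lms"] * 3
    + ["missing_assignments"] * 3
    + ["payment_overdue_days"] * 2
    + ["campus_drop"] * 2
)
current_week = (
    ["days_since_lms"] * 6
    + ["missing_assignments"] * 2
    + ["payment_overdue_days"]
    + ["campus_drop"]
)
alerts = mix_alerts(baseline_week, current_week)
```

In the toy data `days_since_lms` jumps from 30 to 60 percent of red predictions, which is the kind of shift that should prompt a check of that faculty's LMS export before anyone trusts the queue.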

### Key Decisions I Would Defend

| Decision | Recommendation | Why | Accepted cost |
| --- | --- | --- | --- |
| Batch vs streaming | Daily ingestion plus weekly scoring; Event Hubs capture for high-frequency streams. | Good enough for advisor intervention and easier to govern. | Hours of staleness in exchange for governable lineage. |
| Model complexity | Interpretable first. | Easier to defend, calibrate, explain, and approve. | A few AUC points to keep reason codes honest. |
| Global vs local model | One global model with faculty/programme/study-mode context. | More stable at launch; local models need more data. | Slightly weaker per-faculty fit; revisit once two terms of pilot data exist. |
| Raw vs aggregate exposure | Keep raw restricted, expose advisor-useful aggregates. | Preserves audit while reducing privacy risk. | Two storage tiers and a feature contract instead of one flat surface. |
| Protected attributes | Use for audit only. | Needed to detect bias; not appropriate as reason codes. | Storing demographics under audit, never as advisor reason codes. |

Each of these trade-offs has a plausible counter-argument. Real-time scoring would let the platform react to a sudden engagement collapse the day it happens; the design rejects it because advisor intervention has a multi-day cycle anyway, and the operational and governance cost of a streaming feature store is large. A single global model is weaker than per-faculty models for the larger faculties; the design accepts that for launch because the smaller faculties simply do not have enough non-continuation events to fit a stable per-faculty model, and a heterogeneous quality story across faculties is harder to defend than a single calibrated global model. The intent is to revisit local models once two pilot terms have shipped.

### Thirty-Week Rollout In Five Phases

<figure class="diagram-figure" aria-labelledby="roadmapDiagramTitle">
  <figcaption id="roadmapDiagramTitle">Thirty-week delivery plan with phase swimlanes and milestones<span class="figure-tag">Diagram 4</span></figcaption>
  <div class="figure-scroll">
  <svg viewBox="0 0 1000 360" role="img" aria-label="Five sequential phases across 30 weeks: governance, ingestion and history, features and first model, advisor pilot, and Responsible AI review and hardening. Diamond markers above the timeline show the milestone at the end of each phase.">
    <defs>
      <linearGradient id="rmNavy"  x1="0" x2="1" y1="0" y2="0"><stop offset="0%" stop-color="#1a2332"/><stop offset="100%" stop-color="#1f4f6f"/></linearGradient>
      <linearGradient id="rmGreen" x1="0" x2="1" y1="0" y2="0"><stop offset="0%" stop-color="#0d6e58"/><stop offset="100%" stop-color="#4a9078"/></linearGradient>
      <linearGradient id="rmBlue"  x1="0" x2="1" y1="0" y2="0"><stop offset="0%" stop-color="#1f4f6f"/><stop offset="100%" stop-color="#4a85a0"/></linearGradient>
      <linearGradient id="rmAmber" x1="0" x2="1" y1="0" y2="0"><stop offset="0%" stop-color="#a86317"/><stop offset="100%" stop-color="#c5944c"/></linearGradient>
      <linearGradient id="rmRed"   x1="0" x2="1" y1="0" y2="0"><stop offset="0%" stop-color="#8a3a2c"/><stop offset="100%" stop-color="#b06a55"/></linearGradient>
    </defs>
    <rect x="0" y="0" width="1000" height="360" fill="#ffffff"/>
    <text x="40" y="20" font-family="Segoe UI, Arial, sans-serif" font-size="12" font-weight="850" fill="#0d6e58" letter-spacing="1.2">DELIVERY PHASES &#8901; W0 TO W30 &#8901; ONE GLOBAL MODEL, TWO PILOT FACULTIES</text>
    <g font-family="Segoe UI, Arial, sans-serif" font-size="11.5" font-weight="700" fill="#5b6675">
      <text x="320" y="42" text-anchor="middle">W0</text>
      <text x="400" y="42" text-anchor="middle">W4</text>
      <text x="480" y="42" text-anchor="middle">W8</text>
      <text x="560" y="42" text-anchor="middle">W12</text>
      <text x="640" y="42" text-anchor="middle">W16</text>
      <text x="720" y="42" text-anchor="middle">W20</text>
      <text x="800" y="42" text-anchor="middle">W24</text>
      <text x="880" y="42" text-anchor="middle">W28</text>
      <text x="920" y="42" text-anchor="middle">W30</text>
    </g>
    <line x1="320" y1="50" x2="920" y2="50" stroke="#1a2332" stroke-width="1.4"/>
    <g stroke="#e6e8ec" stroke-width="1" stroke-dasharray="2 4">
      <line x1="320" y1="50" x2="320" y2="288"/>
      <line x1="400" y1="50" x2="400" y2="288"/>
      <line x1="480" y1="50" x2="480" y2="288"/>
      <line x1="560" y1="50" x2="560" y2="288"/>
      <line x1="640" y1="50" x2="640" y2="288"/>
      <line x1="720" y1="50" x2="720" y2="288"/>
      <line x1="800" y1="50" x2="800" y2="288"/>
      <line x1="880" y1="50" x2="880" y2="288"/>
      <line x1="920" y1="50" x2="920" y2="288"/>
    </g>
    <g font-family="Segoe UI, Arial, sans-serif" font-size="11" fill="#1a2332" font-weight="850">
      <polygon points="400,64 408,72 400,80 392,72" fill="#1a2332"/>
      <text x="388" y="76" text-anchor="end">DPIA approved</text>
      <polygon points="520,64 528,72 520,80 512,72" fill="#0d6e58"/>
      <text x="508" y="76" text-anchor="end" fill="#0d6e58">data spine GA</text>
      <polygon points="640,64 648,72 640,80 632,72" fill="#1f4f6f"/>
      <text x="628" y="76" text-anchor="end" fill="#1f4f6f">model v1</text>
      <polygon points="760,64 768,72 760,80 752,72" fill="#a86317"/>
      <text x="748" y="76" text-anchor="end" fill="#a86317">pilot live</text>
      <polygon points="920,64 928,72 920,80 912,72" fill="#8a3a2c"/>
      <text x="908" y="76" text-anchor="end" fill="#8a3a2c">RAI sign-off</text>
    </g>
    <g font-family="Segoe UI, Arial, sans-serif" font-size="13" font-weight="850" fill="#1a2332">
      <text x="40" y="116">1. Governance &amp; foundations</text>
      <text x="40" y="156">2. Ingestion &amp; history</text>
      <text x="40" y="196">3. Features &amp; first model</text>
      <text x="40" y="236">4. Advisor pilot</text>
      <text x="40" y="276">5. RAI review &amp; hardening</text>
    </g>
    <rect x="320" y="100" width="80"  height="22" rx="4" fill="url(#rmNavy)"/>
    <text x="360" y="116" text-anchor="middle" font-family="Segoe UI, Arial, sans-serif" font-size="11" font-weight="850" fill="#ffffff">foundations</text>
    <rect x="400" y="140" width="120" height="22" rx="4" fill="url(#rmGreen)"/>
    <text x="460" y="156" text-anchor="middle" font-family="Segoe UI, Arial, sans-serif" font-size="11" font-weight="850" fill="#ffffff">ingest &#8901; identity</text>
    <rect x="520" y="180" width="120" height="22" rx="4" fill="url(#rmBlue)"/>
    <text x="580" y="196" text-anchor="middle" font-family="Segoe UI, Arial, sans-serif" font-size="11" font-weight="850" fill="#ffffff">point-in-time features</text>
    <rect x="640" y="220" width="120" height="22" rx="4" fill="url(#rmAmber)"/>
    <text x="700" y="236" text-anchor="middle" font-family="Segoe UI, Arial, sans-serif" font-size="11" font-weight="850" fill="#ffffff">two-faculty pilot</text>
    <rect x="760" y="260" width="160" height="22" rx="4" fill="url(#rmRed)"/>
    <text x="840" y="276" text-anchor="middle" font-family="Segoe UI, Arial, sans-serif" font-size="11" font-weight="850" fill="#ffffff">RAI review &#8901; harden</text>
    <line x1="40" y1="304" x2="960" y2="304" stroke="#e6e8ec" stroke-width="1"/>
    <g font-family="Segoe UI, Arial, sans-serif" font-size="11" font-weight="700" fill="#5b6675">
      <rect x="40"  y="318" width="14" height="10" fill="#1a2332"/><text x="60"  y="328">phase 1</text>
      <rect x="120" y="318" width="14" height="10" fill="#0d6e58"/><text x="140" y="328">phase 2</text>
      <rect x="200" y="318" width="14" height="10" fill="#1f4f6f"/><text x="220" y="328">phase 3</text>
      <rect x="280" y="318" width="14" height="10" fill="#a86317"/><text x="300" y="328">phase 4</text>
      <rect x="360" y="318" width="14" height="10" fill="#8a3a2c"/><text x="380" y="328">phase 5</text>
      <polygon points="450,318 458,324 450,330 442,324" fill="#1a2332"/><text x="464" y="328">milestone</text>
    </g>
  </svg>
  </div>
  <p class="figure-note">Five sequential phases with diamonds at the only hard gates. Governance review, monitoring, and RAI evidence-gathering keep running once the pilot is live &mdash; Phase 5 is the catch-up window for the parallel work that always bleeds in. If Phase 2 slips by a week, Phases 3 to 5 slide by a week and the gates simply move with them.</p>
</figure>

1. Weeks 0-4: DPIA, lawful basis, Azure landing zone, residency decision, outcome definition, access model, data contracts.
2. Weeks 4-10: configure ADLS Gen2/Fabric workspaces, ingest SIS/LMS/ERP/campus, identity map, type-2 history.
3. Weeks 10-16: point-in-time Fabric feature products, labels, validation, first Azure ML model.
4. Weeks 16-22: Power BI/Fabric advisor pilot with two faculties, RLS, feedback capture, intervention outcomes.
5. Weeks 22-30: Responsible AI review, calibration, Azure Monitor alerts, Purview lineage, model registry, operational hardening.

The roadmap puts governance first deliberately. A common failure mode in projects like this is to rush ingestion, build a model, and only then discover that the lawful basis or the access model was never actually agreed, at which point months of work have to be unwound. Putting the DPIA and access model in weeks 0–4 means the rest of the build happens against a fixed legal frame, not a moving one. The roadmap also places RAI review last, not first, on purpose: there is nothing meaningful to audit until the pilot has produced predictions and intervention outcomes.

Run the prototype:

```powershell
python .\student_outcome_platform_demo.py --out outputs --students 1200 --seed 42
```

### Brief Coverage And References

| Brief requirement | Where the solution addresses it |
| --- | --- |
| Fragmented systems with different formats, update frequencies, APIs, files, and events. | Mixed Data Factory/Fabric ingestion, Event Hubs, source contracts, bronze metadata, schema drift controls. |
| Inconsistent identifiers. | Canonical `identity_map` with source IDs, validity windows, and match confidence. |
| Student status changes over time. | Type-2 `student_status_history`, effective-dated finance snapshots, and point-in-time joins. |
| Train only on available information. | `event_time <= t`, `available_at <= t`, feature snapshots, and closed label windows. |
| Diverse population. | Segmented baselines, protected-attribute audit, calibration checks, and reason-code monitoring. |
| Explain and audit. | Risk band, score range, top reasons, source freshness, feature hash, model version, and `access_audit`. |
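
The "train only on available information" row can be illustrated with a minimal point-in-time filter. This is a sketch under stated assumptions, not the prototype's implementation: `event_time` and `available_at` come from the table above, while `student_key`, `signal`, the sample rows, and the `visible_at` helper are hypothetical.

```python
from datetime import datetime


def visible_at(events, t):
    """Keep only events that had both occurred and landed in the platform
    by scoring time t (event_time <= t and available_at <= t)."""
    return [
        e for e in events
        if e["event_time"] <= t and e["available_at"] <= t
    ]


# Hypothetical sample rows for one student.
events = [
    # Occurred and loaded before t: usable as a feature input.
    {"student_key": "S1", "signal": "lms_submission",
     "event_time": datetime(2024, 10, 1), "available_at": datetime(2024, 10, 2)},
    # Occurred before t but loaded late: using it would leak the future.
    {"student_key": "S1", "signal": "erp_balance",
     "event_time": datetime(2024, 10, 5), "available_at": datetime(2024, 10, 20)},
    # Occurred after t: not yet known at scoring time.
    {"student_key": "S1", "signal": "sis_status_change",
     "event_time": datetime(2024, 10, 15), "available_at": datetime(2024, 10, 15)},
]

t = datetime(2024, 10, 10)
snapshot = visible_at(events, t)
print([e["signal"] for e in snapshot])  # ['lms_submission']
```

The late-arriving ERP balance is the case that a filter on `event_time` alone would miss. The second condition on `available_at` is what keeps training features consistent with what an advisor could have seen at time `t`.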

Reference basis: Microsoft Fabric overview, medallion lakehouse architecture, Azure Event Hubs Capture, Azure Data Factory to Purview lineage, Azure ML Responsible AI dashboard, and Power BI row-level security.

### Open Questions For The University

The design intentionally leaves a short list of items to discovery, because the answers shape thresholds and scope rather than the shape of the platform.

| Topic | Question | Why it matters |
| --- | --- | --- |
| Outcome definition | Are exchange semesters, approved leave, and programme transfers all excluded from the non-continuation label, and how is that recorded? | Mislabelled positives inflate recall and disguise model error. |
| Advisor capacity | What is the actual weekly capacity by faculty, including peak weeks (mid-term and exam season)? | Capacity sets &#964;<sub>red</sub> and &#964;<sub>amber</sub>; a guessed capacity produces a queue that advisors cannot actually work through. |
| Source freshness SLAs | What is the worst acceptable lag for SIS status, LMS submissions, and ERP balances? | Determines whether weekly scoring catches the signal in time. |
| Privacy boundaries | Is wellbeing or counselling data in scope, and under which lawful basis? | Scope creep here turns the platform into a different DPIA. |
| Evidence for explanation | Will advisors share specific reason codes with students, and in what register? | Drives the reason-code style guide and the support-route catalogue. |

These five questions are the only items the design refuses to guess at. Everything else (service choices, table shapes, validation metrics, fairness thresholds) is decided in the document above. These five are decided with the university because the right answer depends on policy and capacity that live outside the engineering team; a platform that pretends otherwise has to be reworked the first time a real DPIA reviewer or faculty operations lead asks the question.
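
To make the advisor-capacity question concrete, here is one possible way capacity could translate into the two thresholds. This is a hedged sketch, not the document's definition: it assumes the red band is sized to the number of students advisors can actively contact in a week and the amber band to an additional lighter-touch allowance. The `capacity_thresholds` function, its parameters, and the sample scores are hypothetical.

```python
def capacity_thresholds(scores, red_capacity, amber_capacity):
    """Derive (tau_red, tau_amber) so that roughly `red_capacity` students
    score at or above tau_red and `red_capacity + amber_capacity` score at
    or above tau_amber. Tied scores can push a band slightly over capacity."""
    if not scores:
        raise ValueError("need at least one score")
    ranked = sorted(scores, reverse=True)
    red_n = min(red_capacity, len(ranked))
    amber_n = min(red_capacity + amber_capacity, len(ranked))
    # A band with zero capacity gets an unreachable threshold.
    tau_red = ranked[red_n - 1] if red_n > 0 else float("inf")
    tau_amber = ranked[amber_n - 1] if amber_n > 0 else float("inf")
    return tau_red, tau_amber


# Hypothetical weekly risk scores for one faculty.
scores = [0.91, 0.85, 0.72, 0.64, 0.40, 0.33, 0.12]

# Assumed capacity: 2 active outreach slots, 3 lighter-touch slots.
tau_red, tau_amber = capacity_thresholds(scores, red_capacity=2, amber_capacity=3)
print(tau_red, tau_amber)  # 0.85 0.4
```

If the real peak-week capacity is half the assumed figure, both thresholds rise and the queue shrinks accordingly, which is why this number is left to the university.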
