Student Projects

Where theory meets market data

These are real projects built by students in the programme — not exercises, not demos. Each one started with a dataset, a question, and about twelve hours of confusion before something clicked.

Six projects from the current cohort

01 LSTM · Equities

TSX sector rotation signals

What it does: Flags rotation windows between defensive and cyclical sector holdings using an LSTM trained on eight years of TSX sector ETF data. The model works on a weekly rebalancing schedule rather than reacting to daily noise.

02 NLP · Sentiment

Earnings call transcript scoring

What it does: Processes S&P 500 earnings call transcripts using a fine-tuned BERT variant and correlates language tone shifts with next-session price movement. Built on publicly available SEC filings.

03 Random Forest · Risk

Volatility regime classifier

What it does: Classifies market conditions into four volatility regimes using a random forest trained on VIX derivatives, put/call ratios, and historical realized-volatility windows. Outputs a regime label, not a price target.

04 GRU · Time-Series

Intraday momentum decay model

What it does: Uses a GRU to estimate how long a morning gap or momentum push typically persists before mean reversion takes over. Validated on three years of 15-minute bar data.

05 XGBoost · Features

Alternative data feature pipeline

What it does: Combines satellite foot traffic estimates, job posting counts, and shipping data into a structured feature set fed to an XGBoost regression model for retail sector earnings estimation.

06 Transformer · Multi-asset

Cross-asset correlation shift detector

What it does: Uses a lightweight transformer architecture to detect when historical correlation relationships between asset classes break down — a leading signal for portfolio stress, not alpha generation.

Student feedback

What students say after finishing a project

Honest accounts. These are notes shared by participants after presenting their final work — they reflect what the process actually felt like, not what it looked like from the outside.

Projects take longer to finish than students expect. Most underestimate the data cleaning phase in particular, which alone tends to consume about a third of the total build time.

Soraya Mäkinen · Volatility Classifier Project

I spent the first four weeks convinced my feature selection was wrong. It wasn't — the model just needed a proper validation window. Getting that part right changed everything about how I think about backtesting.

Oliwia Drzazga · Earnings Call NLP Project

The transcripts were messier than any tutorial data I'd worked with before. Cleaning them taught me more about real NLP pipelines than the architecture decisions did — and that surprised me.

Taavi Kaljurand · Cross-Asset Correlation Project

My first transformer attempt overfit badly on 2008 data. Rebuilding it with rolling windows instead of fixed splits was a painful lesson, but the final model actually generalises to out-of-sample periods.

Numbers from the 2024 programme cycle

Aggregate figures across all student projects submitted for final review — reported as-is, without rounding or selective framing.

38 projects completed

14 distinct model architectures used

6 projects moved to live paper trading

Inside a working ML pipeline built by a student

This is what a real project environment looks like — messy, annotated, and built iteratively over the course of about nine weeks.

[Image: a student's machine learning pipeline setup, showing data visualisations and code]

Data Source

Yahoo Finance API combined with FRED macroeconomic series — fully reproducible, no paid data feeds required.

Stack Used

Python, pandas, scikit-learn, PyTorch — standard tools, no proprietary libraries. Everything runs locally.

Validation Approach

Walk-forward cross-validation with a minimum 90-day out-of-sample holdout period — no look-ahead leakage.
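For readers new to walk-forward validation, here is a minimal sketch of how such splits might be generated. It assumes daily observations keyed by calendar date, an expanding training window, and back-to-back 90-day test windows; the function name `walk_forward_splits` and the `min_train_days` parameter are hypothetical, chosen for illustration, and are not taken from the student's code. Every training index falls strictly before the first date of its test window, which is what keeps future data out of training.

```python
from datetime import date, timedelta


def walk_forward_splits(dates, test_days=90, min_train_days=365):
    """Yield (train_idx, test_idx) pairs for walk-forward validation.

    Hypothetical helper. Assumes `dates` is sorted ascending. Each test
    window spans `test_days` calendar days; training uses all history
    strictly before the test window (an expanding window), so no
    look-ahead data enters training.
    """
    if not dates:
        return
    test_start = dates[0] + timedelta(days=min_train_days)
    while True:
        test_end = test_start + timedelta(days=test_days)
        # Stop once a full test window no longer fits in the data.
        if test_end > dates[-1] + timedelta(days=1):
            break
        train_idx = [i for i, d in enumerate(dates) if d < test_start]
        test_idx = [i for i, d in enumerate(dates) if test_start <= d < test_end]
        if train_idx and test_idx:
            yield train_idx, test_idx
        # Roll forward by one full test window.
        test_start = test_end


# Illustrative usage on three years of consecutive daily dates.
days = [date(2021, 1, 1) + timedelta(days=i) for i in range(3 * 365)]
for fold, (train, test) in enumerate(walk_forward_splits(days)):
    print(fold, days[train[-1]], days[test[0]], days[test[-1]], len(test))
```

An expanding window is only one option. Capping the training history at a fixed length gives the rolling-window variant described in the cross-asset correlation testimonial. scikit-learn's `TimeSeriesSplit` (with its `test_size` and `gap` parameters) offers a similar split based on row counts rather than calendar days.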