ML Research 2024–2025 · Machine Learning & AI

Overview

I built a reinforcement learning research platform for prediction market analysis on Polymarket. The system handles the full pipeline from automated market discovery using GPT-4, through data collection and PPO agent training, to backtesting and live simulation — all without risking real capital.

Role

ML Engineering
Research
System Architecture

Technologies

Python · stable-baselines3 (PPO) · Gymnasium · pandas · OpenAI GPT-4 · Polymarket API

The Challenge

Prediction markets are noisy, non-stationary, and full of edge cases. Building a viable RL system required solving several hard problems:

  • Automated market discovery from thousands of active markets on Polymarket
  • Robust data collection with consistent candle-based time series
  • Proper RL environment design with realistic market mechanics
  • Safe evaluation without risking real capital during development
  • Non-stationary reward distributions that break standard RL assumptions

What I Built

1. Market Discovery Pipeline

An automated system for finding high-quality trading opportunities:

  • Market fetching — Pull all active markets from the Polymarket API
  • Rule-based filtering — Volume thresholds, liquidity checks, and timeframe constraints
  • GPT-4 quality verification — AI assessment of market suitability for RL training
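The rule-based filtering step can be sketched as a simple predicate applied to each fetched market before the GPT-4 quality check. This is a minimal illustration: the field names and threshold values below are hypothetical, not the platform's actual configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Market:
    question: str
    volume_usd: float      # lifetime traded volume (illustrative field name)
    liquidity_usd: float   # current order-book liquidity (illustrative)
    end_date: datetime     # market resolution time

def passes_rules(m: Market, *, min_volume: float = 50_000,
                 min_liquidity: float = 10_000,
                 max_days_to_resolution: int = 30) -> bool:
    """Rule-based pre-filter: volume, liquidity, and timeframe constraints.

    Markets surviving this filter would then be sent to GPT-4 for
    quality scoring (omitted here). Thresholds are placeholders.
    """
    now = datetime.now(timezone.utc)
    time_left = m.end_date - now
    return (
        m.volume_usd >= min_volume
        and m.liquidity_usd >= min_liquidity
        and timedelta(0) < time_left <= timedelta(days=max_days_to_resolution)
    )
```

Filtering cheaply before calling GPT-4 keeps API costs proportional to the handful of candidate markets rather than the thousands of active ones.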

2. Data Collection & Training

A complete data pipeline and RL training system:

  • 1-minute candles — High-resolution price data stored in OHLCV format
  • Gymnasium environment — Custom environment with 182-dimensional observation space
  • PPO training — Proximal Policy Optimization with VecNormalize and 15 tuned hyperparameters
  • Observation engineering — Technical indicators, order book features, and market microstructure signals
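The observation-engineering step can be illustrated by assembling a flat feature vector from a window of recent 1-minute OHLCV candles. This is a truncated sketch of the idea only: it produces a 107-dimensional slice, whereas the real 182-dimensional space also includes order-book and microstructure features, and the window size and feature choices here are assumptions.

```python
import numpy as np

def build_observation(candles: np.ndarray, window: int = 36) -> np.ndarray:
    """Build a flat observation from recent 1-minute candles.

    candles: shape (window, 5) -> open, high, low, close, volume.
    Feature layout (illustrative, not the real 182-dim design):
      - 35 one-step relative returns
      - 36 volumes normalized by the window mean
      - 36 closes relative to the latest close
    """
    assert candles.shape == (window, 5), "expected (window, 5) OHLCV array"
    close = candles[:, 3]
    returns = np.diff(close) / close[:-1]                 # 35 values
    vol = candles[:, 4] / (candles[:, 4].mean() + 1e-9)   # 36 values
    price_rel = close / close[-1]                         # 36 values
    obs = np.concatenate([returns, vol, price_rel])
    return obs.astype(np.float32)                         # float32 for the RL stack
```

Keeping the observation a fixed-length float32 vector is what lets it plug directly into a Gymnasium `Box` space and stable-baselines3's PPO.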

3. Evaluation System

Rigorous testing without financial risk:

  • Deterministic backtesting — Historical replay with fixed random seeds for reproducibility
  • Live simulation — Real-time market data with simulated order execution
  • Trade logging — Parquet-based storage of every decision for post-hoc analysis
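The deterministic-backtest idea above can be sketched as a historical replay loop: a fixed seed drives any stochastic policy, so two runs over the same data produce identical trade logs. The policy interface and trade-record fields below are illustrative, not the platform's actual API.

```python
import numpy as np

def run_backtest(prices: np.ndarray, policy, seed: int = 42) -> list:
    """Replay historical prices through a policy deterministically.

    policy(history, rng) -> -1 (sell), 0 (hold), or +1 (buy).
    The fixed seed makes any policy randomness reproducible; the
    resulting trade records could then be written to parquet.
    """
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible runs
    trades = []
    for t in range(1, len(prices)):
        action = policy(prices[:t], rng)
        if action != 0:
            trades.append({"step": t, "action": action, "price": float(prices[t])})
    return trades
```

Because the replay is a pure function of (data, policy, seed), any divergence between two runs signals a hidden source of nondeterminism worth hunting down.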

Technical Architecture

The system follows a 4-phase pipeline architecture:

  • Phase 1 — Discovery: Market fetching → rule-based filtering → GPT-4 quality scoring
  • Phase 2 — Collection: 1-min candle aggregation → feature engineering → dataset storage
  • Phase 3 — Training: Gymnasium environment → PPO agent → VecNormalize → checkpoint management
  • Phase 4 — Simulation: Deterministic backtest → live paper trading → parquet trade logs
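The 4-phase pipeline can be wired together as a simple orchestrator that threads shared state from one phase to the next. This is a structural sketch only; the real phases are substantial subsystems and the state-dict interface is an assumption.

```python
from typing import Callable, Dict, List, Tuple

Phase = Callable[[dict], dict]

def run_pipeline(phases: List[Tuple[str, Phase]], state: dict) -> dict:
    """Run named phases in order, passing a shared state dict through.

    Each phase takes the accumulated state (markets, datasets,
    checkpoints, ...) and returns it enriched; completed phase names
    are recorded so a run can be inspected or resumed.
    """
    for name, phase in phases:
        state = phase(state)
        state.setdefault("completed", []).append(name)
    return state
```

A linear, state-passing design keeps each phase independently testable: any phase can be run against a saved state from the previous one.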

Security & Quality

Safety and reproducibility are critical in financial ML research:

  • Simulation-only execution — No real money involved; all trading is paper-based
  • Deterministic evaluation — Fixed seeds and replay buffers for reproducible results
  • Fee modeling — 3% transaction fee built into the reward function for realistic P&L
  • Observation normalization — VecNormalize standardizes observations and rewards, curbing reward-scale artifacts and training instability
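The fee-aware reward above reduces to a one-line per-step calculation: the change in portfolio value minus a 3% fee on any notional traded that step. The function name and signature are illustrative; the 3% rate matches the fee model stated above.

```python
FEE_RATE = 0.03  # 3% transaction fee, as in the fee model above

def step_reward(prev_value: float, new_value: float,
                traded_notional: float) -> float:
    """Per-step RL reward: portfolio P&L minus fees on traded notional.

    Charging fees inside the reward makes the agent pay for churn,
    so it only trades when the expected edge exceeds 3%.
    """
    return (new_value - prev_value) - FEE_RATE * traded_notional
```

For example, a step that grows the portfolio from 100 to 105 while trading 10 units of notional earns 5.0 in P&L but only 4.7 in reward after the 0.3 fee.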

Outcome

  • All 4 pipeline phases fully implemented and operational
  • Complete historical replay system for backtesting RL agents
  • 182-dimensional observation space with engineered features
  • Production-ready for paper trading and research experimentation