Overview
I built a reinforcement learning research platform for prediction market analysis on Polymarket. The system handles the full pipeline from automated market discovery using GPT-4, through data collection and PPO agent training, to backtesting and live simulation — all without risking real capital.
The Challenge
Prediction markets are noisy, non-stationary, and full of edge cases. Building a viable RL system required solving several hard problems:
- Automated market discovery from thousands of active markets on Polymarket
- Robust data collection with consistent candle-based time series
- Proper RL environment design with realistic market mechanics
- Safe evaluation without risking real capital during development
- Non-stationary reward distributions that break standard RL assumptions
What I Built
1. Market Discovery Pipeline
An automated system for finding high-quality trading opportunities:
- Market fetching — Pull all active markets from the Polymarket API
- Rule-based filtering — Volume thresholds, liquidity checks, and timeframe constraints
- GPT-4 quality verification — AI assessment of market suitability for RL training
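The rule-based stage can be sketched as a cheap pre-filter that runs before the (paid) GPT-4 quality check. The thresholds and field names below are illustrative assumptions, not the project's actual values:

```python
from datetime import datetime, timedelta, timezone

# Illustrative thresholds -- the real values are project-specific.
MIN_VOLUME_USD = 10_000
MIN_LIQUIDITY_USD = 2_000
MAX_DAYS_TO_RESOLUTION = 30

def passes_rule_filters(market: dict, now: datetime) -> bool:
    """Cheap rule-based pre-filter applied before GPT-4 quality scoring."""
    if market["volume_usd"] < MIN_VOLUME_USD:
        return False
    if market["liquidity_usd"] < MIN_LIQUIDITY_USD:
        return False
    days_left = (market["end_date"] - now).days
    return 0 < days_left <= MAX_DAYS_TO_RESOLUTION

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
market = {
    "volume_usd": 50_000,
    "liquidity_usd": 5_000,
    "end_date": now + timedelta(days=7),
}
```

Filtering first keeps the number of GPT-4 calls (and their cost) bounded, since only surviving markets are sent on for AI assessment.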
2. Data Collection & Training
A complete data pipeline and RL training system:
- 1-minute candles — High-resolution price data with OHLCV formatting
- Gymnasium environment — Custom environment with 182-dimensional observation space
- PPO training — Proximal Policy Optimization with VecNormalize and 15 tuned hyperparameters
- Observation engineering — Technical indicators, order book features, and market microstructure signals
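A minimal sketch of the observation-engineering step, assuming a `(window, 5)` OHLCV array as input. The feature set and window size here are illustrative assumptions producing a smaller vector; the real system assembles 182 dimensions from more indicators and order book features:

```python
import numpy as np

def build_observation(candles: np.ndarray, book_imbalance: float) -> np.ndarray:
    """Flatten recent 1-minute candles into a feature vector.

    `candles` is a (window, 5) OHLCV array, oldest row first. This is a
    reduced, hypothetical feature set for illustration only.
    """
    closes = candles[:, 3]
    volumes = candles[:, 4]
    log_returns = np.diff(np.log(closes))        # momentum features
    sma = closes.mean()
    features = np.concatenate([
        log_returns,
        (closes - sma) / (sma + 1e-9),           # distance from moving average
        volumes / (volumes.max() + 1e-9),        # scaled volume
        [log_returns.std(), book_imbalance],     # realized vol + microstructure
    ])
    return features.astype(np.float32)

rng = np.random.default_rng(0)
candles = np.abs(rng.normal(1.0, 0.01, size=(30, 5))) + 0.5  # synthetic OHLCV
obs = build_observation(candles, book_imbalance=0.1)
```

Returning `float32` matches the dtype the policy network expects, and keeping the vector flat lets VecNormalize handle scaling downstream.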
3. Evaluation System
Rigorous testing without financial risk:
- Deterministic backtesting — Historical replay with fixed random seeds for reproducibility
- Live simulation — Real-time market data with simulated order execution
- Trade logging — Parquet-based storage of every decision for post-hoc analysis
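The deterministic-backtest idea above can be sketched as a seeded historical replay. The policy interface, fee handling, and function signature are illustrative assumptions rather than the project's actual API:

```python
import numpy as np

def run_backtest(prices, policy, fee_rate=0.03, seed=42):
    """Replay a historical price series deterministically.

    `policy(rng, price)` returns a position in {-1, 0, 1}; seeding the RNG
    makes even a stochastic policy reproducible run-to-run.
    """
    rng = np.random.default_rng(seed)            # fixed seed => identical replays
    pnl, position = 0.0, 0
    for t in range(1, len(prices)):
        target = policy(rng, prices[t - 1])
        if target != position:
            # charge the transaction fee on each unit of position change
            pnl -= fee_rate * abs(target - position) * prices[t - 1]
            position = target
        pnl += position * (prices[t] - prices[t - 1])  # mark-to-market P&L
    return pnl

def random_policy(rng, price):
    return int(rng.integers(-1, 2))              # stochastic policy, for demonstration

prices = np.linspace(0.40, 0.60, 100)            # synthetic price path
```

Because the seed is fixed inside the function, two runs over the same history produce identical P&L even for a random policy, which is what makes regression-testing an agent's behavior possible.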
Technical Architecture
The system follows a 4-phase pipeline architecture:
- Phase 1 — Discovery: Market fetching → rule-based filtering → GPT-4 quality scoring
- Phase 2 — Collection: 1-min candle aggregation → feature engineering → dataset storage
- Phase 3 — Training: Gymnasium environment → PPO agent → VecNormalize → checkpoint management
- Phase 4 — Simulation: Deterministic backtest → live paper trading → parquet trade logs
Security & Quality
Safety and reproducibility are critical in financial ML research:
- Simulation-only execution — No real money involved; all trading is paper-based
- Deterministic evaluation — Fixed seeds and deterministic historical replay for reproducible results
- Fee modeling — 3% transaction fee built into the reward function for realistic P&L
- Observation normalization — VecNormalize keeps observations and rewards on a consistent scale, preventing the training instability that shifting reward magnitudes would cause
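The fee-modeling point can be made concrete with a per-step reward sketch. The signature below is a hypothetical simplification, assuming the reward is mark-to-market portfolio change minus fees on traded notional:

```python
def step_reward(prev_value: float, new_value: float,
                traded_notional: float, fee_rate: float = 0.03) -> float:
    """Per-step reward: mark-to-market P&L minus the 3% transaction fee.

    Signature and decomposition are illustrative assumptions.
    """
    # Charging fees inside the reward (not just at evaluation time) stops
    # the agent from learning a fee-blind overtrading policy.
    return (new_value - prev_value) - fee_rate * traded_notional
```

A trade that gains $5 of portfolio value while turning over $10 of notional nets only $4.70 of reward, so the agent is pushed toward trades whose expected edge exceeds the 3% fee.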
Outcome
- All 4 pipeline phases fully implemented and operational
- Complete historical replay system for backtesting RL agents
- 182-dimensional observation space with engineered features
- Production-ready for paper trading and research experimentation