Overview
I built a reinforcement learning research platform for prediction market analysis on Polymarket. The system handles the full pipeline from automated market discovery using GPT-4, through data collection and PPO agent training, to backtesting and live simulation — all without risking real capital.
The Challenge
Prediction markets are noisy, non-stationary, and full of edge cases. Building a viable RL system required solving several hard problems:
- Automated market discovery from thousands of active markets on Polymarket
- Robust data collection with consistent candle-based time series
- Proper RL environment design with realistic market mechanics
- Safe evaluation without risking real capital during development
- Non-stationary reward distributions that break standard RL assumptions
What I Built
1. Market Discovery Pipeline
An automated system for finding high-quality trading opportunities:
- Market fetching — Pull all active markets from the Polymarket API
- Rule-based filtering — Volume thresholds, liquidity checks, and timeframe constraints
- GPT-4 quality verification — AI assessment of market suitability for RL training
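The rule-based stage can be sketched as a cheap pre-filter that runs before the (paid) GPT-4 quality check. The thresholds and field names below are illustrative assumptions, not the project's actual values:

```python
from datetime import datetime, timedelta, timezone

# Illustrative thresholds -- the real values are project-specific.
MIN_VOLUME_USD = 10_000
MIN_LIQUIDITY_USD = 2_000
MAX_DAYS_TO_RESOLUTION = 30

def passes_rule_filters(market: dict, now: datetime) -> bool:
    """Cheap rule-based pre-filter applied before GPT-4 quality scoring."""
    if market["volume_usd"] < MIN_VOLUME_USD:
        return False
    if market["liquidity_usd"] < MIN_LIQUIDITY_USD:
        return False
    days_left = (market["end_date"] - now).days
    return 0 < days_left <= MAX_DAYS_TO_RESOLUTION

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
market = {
    "volume_usd": 50_000,
    "liquidity_usd": 5_000,
    "end_date": now + timedelta(days=7),
}
```

Filtering first keeps the number of GPT-4 calls (and their cost) bounded, since only surviving markets are sent on for AI assessment.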
2. Data Collection & Training
A complete data pipeline and RL training system:
- 1-minute candles — High-resolution price data with OHLCV formatting
- Gymnasium environment — Custom environment with 182-dimensional observation space
- PPO training — Proximal Policy Optimization with VecNormalize and 15 tuned hyperparameters
- Observation engineering — Technical indicators, order book features, and market microstructure signals
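A minimal sketch of the observation-engineering step, assuming a `(window, 5)` OHLCV array as input. The feature set and window size here are illustrative assumptions producing a smaller vector; the real system assembles 182 dimensions from more indicators and order book features:

```python
import numpy as np

def build_observation(candles: np.ndarray, book_imbalance: float) -> np.ndarray:
    """Flatten recent 1-minute candles into a feature vector.

    `candles` is a (window, 5) OHLCV array, oldest row first. This is a
    reduced, hypothetical feature set for illustration only.
    """
    closes = candles[:, 3]
    volumes = candles[:, 4]
    log_returns = np.diff(np.log(closes))        # momentum features
    sma = closes.mean()
    features = np.concatenate([
        log_returns,
        (closes - sma) / (sma + 1e-9),           # distance from moving average
        volumes / (volumes.max() + 1e-9),        # scaled volume
        [log_returns.std(), book_imbalance],     # realized vol + microstructure
    ])
    return features.astype(np.float32)

rng = np.random.default_rng(0)
candles = np.abs(rng.normal(1.0, 0.01, size=(30, 5))) + 0.5  # synthetic OHLCV
obs = build_observation(candles, book_imbalance=0.1)
```

Returning `float32` matches the dtype the policy network expects, and keeping the vector flat lets VecNormalize handle scaling downstream.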
3. Evaluation System
Rigorous testing without financial risk:
- Deterministic backtesting — Historical replay with fixed random seeds for reproducibility
- Live simulation — Real-time market data with simulated order execution
- Trade logging — Parquet-based storage of every decision for post-hoc analysis
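The deterministic-backtest idea above can be sketched as a seeded historical replay. The policy interface, fee handling, and function signature are illustrative assumptions rather than the project's actual API:

```python
import numpy as np

def run_backtest(prices, policy, fee_rate=0.03, seed=42):
    """Replay a historical price series deterministically.

    `policy(rng, price)` returns a position in {-1, 0, 1}; seeding the RNG
    makes even a stochastic policy reproducible run-to-run.
    """
    rng = np.random.default_rng(seed)            # fixed seed => identical replays
    pnl, position = 0.0, 0
    for t in range(1, len(prices)):
        target = policy(rng, prices[t - 1])
        if target != position:
            # charge the transaction fee on each unit of position change
            pnl -= fee_rate * abs(target - position) * prices[t - 1]
            position = target
        pnl += position * (prices[t] - prices[t - 1])  # mark-to-market P&L
    return pnl

def random_policy(rng, price):
    return int(rng.integers(-1, 2))              # stochastic policy, for demonstration

prices = np.linspace(0.40, 0.60, 100)            # synthetic price path
```

Because the seed is fixed inside the function, two runs over the same history produce identical P&L even for a random policy, which is what makes regression-testing an agent's behavior possible.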
Technical Architecture
The system follows a 4-phase pipeline architecture:
- Phase 1 — Discovery: Market fetching → rule-based filtering → GPT-4 quality scoring
- Phase 2 — Collection: 1-min candle aggregation → feature engineering → dataset storage
- Phase 3 — Training: Gymnasium environment → PPO agent → VecNormalize → checkpoint management
- Phase 4 — Simulation: Deterministic backtest → live paper trading → parquet trade logs
Security & Quality
Safety and reproducibility are critical in financial ML research:
- Simulation-only execution — No real money involved; all trading is paper-based
- Deterministic evaluation — Fixed seeds and deterministic historical replay for reproducible results
- Fee modeling — 3% transaction fee built into the reward function for realistic P&L
- Observation normalization — VecNormalize keeps observations and rewards on a consistent scale, preventing the training instability that shifting reward magnitudes would cause
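The fee-modeling point can be made concrete with a per-step reward sketch. The signature below is a hypothetical simplification, assuming the reward is mark-to-market portfolio change minus fees on traded notional:

```python
def step_reward(prev_value: float, new_value: float,
                traded_notional: float, fee_rate: float = 0.03) -> float:
    """Per-step reward: mark-to-market P&L minus the 3% transaction fee.

    Signature and decomposition are illustrative assumptions.
    """
    # Charging fees inside the reward (not just at evaluation time) stops
    # the agent from learning a fee-blind overtrading policy.
    return (new_value - prev_value) - fee_rate * traded_notional
```

A trade that gains $5 of portfolio value while turning over $10 of notional nets only $4.70 of reward, so the agent is pushed toward trades whose expected edge exceeds the 3% fee.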
Outcome
- All 4 pipeline phases fully implemented and operational
- Complete historical replay system for backtesting RL agents
- 182-dimensional observation space with engineered features
- Production-ready for paper trading and research experimentation