Announcement_14
We propose Guided Hybrid Policy Optimization (GHPO), a difficulty-aware framework for reinforcement learning with verifiable rewards (RLVR) that improves training stability and reasoning performance in LLMs, with strong gains on challenging math benchmarks. Code is available here.