Group Sequence Policy Optimization vs Group Relative Policy Optimization

Two generational advancements in a duel of dynamic decision dynamics

Aug 11, 2025

Remember when DeepSeek’s Group Relative Policy Optimization (GRPO), first introduced with DeepSeekMath, made training a state-of-the-art model dramatically cheaper by dropping the separate value (critic) model that PPO requires? Early this year, the release of DeepSeek-R1, which was trained with GRPO, amplified the effect, and the result was a sudden sell-off of AI stocks. Then only a few months later, Alibaba’s Qwen team introduced Group Seq…
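To make the contrast in the title concrete, here is a minimal Python sketch of the per-group surrogate objective as we understand the two methods. Both score a group of sampled responses to the same prompt and standardize each reward against the group's mean and standard deviation, which is what replaces the critic. They differ in where the importance ratio and clipping live: GRPO clips a ratio per token, while GSPO clips one length-normalized ratio per whole sequence, (π_θ(y|x) / π_old(y|x))^(1/|y|). The function names, the `eps_clip` default, and the omission of GRPO's KL penalty are simplifying assumptions for illustration; this is not DeepSeek's or Qwen's implementation.

```python
import math
from typing import List

def group_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Group-relative advantage: standardize each reward against its own group."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]

def clip(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))

def grpo_objective(new_logps: List[List[float]], old_logps: List[List[float]],
                   rewards: List[float], eps_clip: float = 0.2) -> float:
    """Token-level clipped surrogate (KL penalty omitted in this sketch).
    new_logps[i][t] is log pi_theta of token t in response i."""
    adv = group_advantages(rewards)
    total = 0.0
    for new_seq, old_seq, a in zip(new_logps, old_logps, adv):
        seq_sum = 0.0
        for lp_new, lp_old in zip(new_seq, old_seq):
            ratio = math.exp(lp_new - lp_old)  # one importance ratio per token
            seq_sum += min(ratio * a, clip(ratio, 1 - eps_clip, 1 + eps_clip) * a)
        total += seq_sum / len(new_seq)  # average over tokens in the response
    return total / len(rewards)  # average over the group

def gspo_objective(new_logps: List[List[float]], old_logps: List[List[float]],
                   rewards: List[float], eps_clip: float = 0.2) -> float:
    """Sequence-level clipped surrogate: one ratio per response."""
    adv = group_advantages(rewards)
    total = 0.0
    for new_seq, old_seq, a in zip(new_logps, old_logps, adv):
        # (pi_theta(y|x) / pi_old(y|x)) ** (1/|y|) == exp(mean per-token log-ratio)
        mean_log_ratio = sum(n - o for n, o in zip(new_seq, old_seq)) / len(new_seq)
        s = math.exp(mean_log_ratio)
        total += min(s * a, clip(s, 1 - eps_clip, 1 + eps_clip) * a)
    return total / len(rewards)
```

With identical old and new log-probabilities every ratio is 1, so both objectives reduce to the mean group advantage, which is zero by construction. The design difference shows up once the policy moves: GRPO can clip some tokens of a response and not others, whereas GSPO accepts or clips the response as a unit, matching the fact that the reward is assigned to the whole sequence.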

Continue reading this post for free, courtesy of Jan Daniel Semrau (MFin, CAIO).

Encyclopedia Autonomica
