[GRPO Implementation from Scratch] GRPO.py

爱生活爱珂珂 2025-02-16 09:18:40

GRPO.py: an implementation of GRPO (Group Relative Policy Optimization) for language model fine-tuning.

GitHub: github.com/aburkov/theLMbook/blob/main/GRPO.py
