Mutual-Taught for Co-adapting Policy and Reward Models

Published in ACL, 2025

Recommended citation: Tianyuan Shi, Canbin Huang, Fanqi Wan, Longguang Zhong, Ziyi Yang, Weizhou Shen, Xiaojun Quan, Ming Yan. (2025). "Mutual-Taught for Co-adapting Policy and Reward Models." ACL 2025.