SciRepID - Scientific Reputation Index of Indonesian Journals

+62 813-8532-9115 info@scirepid.com

Atul Sharma


Title	Citations	Year
PPO-based Reinforcement Learning with Human Feedback with Hybrid Oversight and Predictive Reward Evaluation for AGI (Atul Sharma)	21	2025
DOI: 10.62411/faith.3048-3719-276 | Volume 2, Issue 3 | 24-Oct-2025 | Last: 29-Jan-2026
Abstract: The pursuit of Artificial General Intelligence (AGI) requires learning frameworks that not only optimize task performance but also align with complex human values. Reinforcement Learning with Human Feedback (RLHF) has emerged as a promising approach to this challenge; however, conventional RLHF pipelines face scalability issues, reward-model brittleness, and safety concerns. In this study, we propose a PPO-based RLHF framework enhanced with hybrid human–AI oversight and predictive reward evaluation metrics. The framework integrates human annotations with AI-generated critiques, improving data efficiency and robustness against reward hacking. Experimental evaluations on language alignment and control benchmarks show that the proposed approach achieves a preference win-rate of 78% (vs. 65% for standard RLHF and 54% for supervised fine-tuning), raises task accuracy to 83% (a 12% increase over Sparrow), and reduces safety violations by 31% compared with baseline RLHF. Furthermore, the hybrid oversight strategy improved sample efficiency 1.5×, reducing overall training episodes and annotation costs. These results confirm that the proposed method significantly improves alignment, efficiency, and safety, positioning RLHF with hybrid oversight and predictive evaluation as a practical substrate for advancing safe and scalable AGI systems.



About

SciRepID is designed to serve academics, journal managers, educational institutions, and indexing agencies in evaluating journals against scientifically accountable standards at both the national and international levels, using data sourced from the Crossref database.

Copyright © SciRepID - Scientific Reputation Index of Indonesian Journals.