The pursuit of Artificial General Intelligence (AGI) requires learning frameworks that not only optimize task performance but also align with complex human values. Reinforcement Learning from Human Feedback (RLHF) has emerged as a promising approach to this challenge; however, conventional RLHF pipelines face scalability issues, reward-model brittleness, and safety concerns. In this study, we propose a PPO-based RLHF framework enhanced with hybrid human–AI oversight and predictive reward evaluation metrics. The framework integrates human annotations with AI-generated critiques, improving data efficiency and robustness against reward hacking. Experimental evaluations on language-alignment and control benchmarks demonstrate that the proposed approach achieves a preference win rate of 78% (vs. 65% for standard RLHF and 54% for supervised fine-tuning), improves task accuracy to 83% (a 12% increase over Sparrow), and reduces safety violations by 31% relative to the RLHF baseline. Furthermore, the hybrid oversight strategy improves sample efficiency by 1.5×, reducing the number of training episodes and the annotation cost. These results indicate that the proposed method significantly improves alignment, efficiency, and safety, positioning RLHF with hybrid oversight and predictive evaluation as a practical substrate for advancing safe and scalable AGI systems.
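To make the hybrid oversight idea concrete, the sketch below blends a human preference score with an AI-generated critique score into a single reward and feeds the result into a standard PPO clipped-surrogate objective. This is a minimal illustration under stated assumptions, not the paper's implementation: the blend weight `alpha`, the score normalization, and the toy advantage estimate are all hypothetical placeholders.

```python
# Minimal sketch of a hybrid human-AI reward driving a PPO-style update.
# Assumptions (not from the paper): both scores lie in [0, 1], `alpha`
# weights human feedback, and advantages are crude baseline-subtracted rewards.
import numpy as np

def hybrid_reward(human_score: float, ai_critique_score: float, alpha: float = 0.7) -> float:
    """Blend a human annotation score with an AI critique score."""
    return alpha * human_score + (1.0 - alpha) * ai_critique_score

def ppo_clipped_objective(ratio: np.ndarray, advantage: np.ndarray, eps: float = 0.2) -> float:
    """Standard PPO clipped-surrogate objective (to be maximized)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return float(np.mean(np.minimum(unclipped, clipped)))

# Toy usage: hybrid rewards for three sampled responses.
scores = [(0.9, 0.7), (0.4, 0.6), (0.8, 0.8)]          # (human, AI critique) pairs
rewards = np.array([hybrid_reward(h, a) for h, a in scores])
advantages = rewards - rewards.mean()                    # simple baseline subtraction
ratios = np.array([1.05, 0.95, 1.10])                    # pi_new / pi_old per response
print(ppo_clipped_objective(ratios, advantages))
```

The design choice this illustrates is that reward hacking against either signal alone is harder when the policy must satisfy both the human annotations and the AI critique simultaneously; how the two signals are actually weighted and calibrated is described in the body of the paper.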