RLAIF (Reinforcement Learning from AI Feedback)

A variant of RLHF in which the feedback used to train the model comes from another AI system instead of from humans, which lets the alignment process scale.

Advanced rlhf feedback_ia alineacion

Full definition

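A variant of RLHF in which the preference feedback used to train the model is produced by another AI system rather than by human annotators. Because AI-generated feedback is cheaper and faster to collect than human labels, the alignment process can scale to far more training examples.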

Example: using GPT-4 to evaluate and score the responses of a smaller model during that model's training, so that the scores take the place of human ratings.
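
The labeling step in this example can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: `toy_judge` is a hypothetical stand-in for the AI judge (a real setup would call a larger model such as GPT-4 there), and `PreferencePair` and `label_with_ai_feedback` are names invented for this sketch. It shows only how judge scores become a chosen/rejected pair; the reward-model or policy training that would consume those pairs is left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PreferencePair:
    """One AI-labeled training example: the preferred and the dispreferred response."""
    prompt: str
    chosen: str
    rejected: str


def toy_judge(prompt: str, response: str) -> float:
    """Hypothetical stand-in for an AI judge.

    Scores a response by how many distinct prompt words it reuses.
    A real judge would return a model-generated quality score instead.
    """
    prompt_words = set(prompt.lower().split())
    response_words = set(response.lower().split())
    return float(len(prompt_words & response_words))


def label_with_ai_feedback(
    prompt: str,
    candidates: List[str],
    judge: Callable[[str, str], float],
) -> PreferencePair:
    """Rank candidate responses by judge score and keep the best and the worst."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidate responses to form a pair")
    ranked = sorted(candidates, key=lambda r: judge(prompt, r), reverse=True)
    return PreferencePair(prompt=prompt, chosen=ranked[0], rejected=ranked[-1])


# Two candidate responses from the (smaller) model being trained.
pair = label_with_ai_feedback(
    "explain what a reward model does",
    ["no idea", "a reward model scores what the policy does"],
    toy_judge,
)
print("chosen:  ", pair.chosen)
print("rejected:", pair.rejected)
```

Passing the judge in as a plain callable keeps the sketch independent of any particular model API; swapping `toy_judge` for a function that queries a stronger model would leave the rest unchanged.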