A base model by itself is “just” a text-completion model: it can generate coherent and creative text, but it doesn’t quite know how to follow instructions the way a good assistant (like ChatGPT) would.
With some assistance from human feedback, we can steer the model toward outputs that directly follow the instruction.
RLHF (reinforcement learning from human feedback) is a step in the fine-tuning stage where human feedback on the base model’s text-completion outputs is incorporated back into the model, so it learns to generate only high-quality outputs.
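In practice, that human feedback usually arrives as pairwise comparisons: an annotator picks which of two completions is better, and a reward model is trained to agree with those choices. Here is a minimal sketch of the Bradley-Terry-style preference loss commonly used for this step; the function name and the scalar reward values are illustrative, not from any particular implementation.

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry preference loss: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model already scores the
    human-preferred completion above the rejected one, and large
    when the model's ranking disagrees with the human label.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Agreement with the human preference -> small loss.
print(round(preference_loss(2.0, 0.0), 4))  # 0.1269

# Disagreement -> large loss, pushing the reward model to reorder.
print(round(preference_loss(0.0, 2.0), 4))  # 2.1269
```

Minimizing this loss over many comparison pairs gives a reward signal that the RL step (e.g. PPO) can then optimize the base model against.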