🔭 A 🧵 on @OpenAI LLM "Alignment" (e.g. #ChatGPT)

Q: How does this differ from publicly available "Instruction Tuning" (IT)?

A: Proprietary Alignment is actually 3 separate components:

1️⃣ Instruction tuning
2️⃣ ➕ Open-ended generation/creative prompts
3️⃣ ➕ Human feedback

1/
Component 1️⃣:

Instruction Tuning, in its simplest form, teaches the model to follow/answer instructions, instead of generating plausible continuations.

E.g. see @GoogleAI's Flan Collection: arxiv.org/abs/2301.13688
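
To make this concrete, here's a minimal sketch of one instruction-tuning step, assuming a HuggingFace-style causal LM (GPT-2 as a stand-in; the prompt template and single-example step are illustrative, not Flan's or OpenAI's exact recipe):

```python
# Minimal instruction-tuning step (illustrative): fine-tune a causal LM on an
# (instruction, response) pair so it learns to answer rather than just continue.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in base model
model = AutoModelForCausalLM.from_pretrained("gpt2")

example = {
    "instruction": "Summarize: The cat sat on the mat all afternoon.",
    "response": "A cat lounged on a mat for the afternoon.",
}
# Concatenate prompt + target; this template is made up for illustration.
text = f"Instruction: {example['instruction']}\nResponse: {example['response']}"
batch = tokenizer(text, return_tensors="pt")

# Standard next-token loss over the sequence (many setups mask the prompt
# tokens so the loss only covers the response).
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()  # an optimizer step would follow in a real training loop
```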

2/
Public Instruction Tuning collections are made up of 95%+:
➡️ academic,
➡️ short-answer,
➡️ traditional
NLP tasks. This is a limitation.

3/
Component 2️⃣:

The InstructGPT blog confirms @OpenAI uses publicly sourced Playground inputs for training.

🌟 These are inevitably MUCH more diverse, challenging, and creative 🎨 than traditional NLP 🌟

openai.com/blog/instruction-following/ by @ryan_t_lowe, @janleike

4/
Because...

1. Traditional NLP is skewed to tasks w/ *automatic* eval metrics (mostly short text answers)
2. Human users try to push GPT-3's limits

➡️ Creative/long generation inputs (e.g. essay/poem writing, explanation) teach models new skills

5/
Component 3️⃣:

@OpenAI guides models (indirectly) w/ human preferences over possible generations, using Reinforcement Learning from Human Feedback (RLHF).

GPT-3++'s results are often credited primarily to this component -- but is that fair/true?
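
Roughly: a reward model is first trained on human preference pairs, then the policy LM is optimized (e.g. with PPO) to score well under that reward, usually with a KL penalty toward the original model. A toy sketch of the reward-model objective (random tensors stand in for real reward-model scores):

```python
# Toy version of the pairwise preference loss behind RLHF reward models:
# push the score of the human-preferred response above the rejected one.
import torch
import torch.nn.functional as F

# Stand-ins: in practice these are scalar scores from a reward model
# (an LM with a value head) applied to (prompt, response) pairs.
reward_chosen = torch.randn(8, requires_grad=True)    # preferred responses
reward_rejected = torch.randn(8, requires_grad=True)  # dispreferred responses

# Bradley-Terry style loss: -log sigmoid(r_chosen - r_rejected)
loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()
loss.backward()

# The policy is then tuned (e.g. with PPO) to maximize the learned reward,
# typically with a KL penalty toward the pre-RLHF model.
```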

6/
Firstly, researchers are now asking: do we need RL or just the human feedback?

7/
In fact, @tianjun_zhang shows strong results with a supervised version of RLHF.

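One common supervised recipe (not necessarily the exact method in the work above): sample several responses, score them with a reward model or human ratings, and fine-tune on the best one like ordinary SFT data. A rough sketch, with GPT-2 and a dummy scoring function as stand-ins:

```python
# Illustrative supervised use of a feedback signal, without RL: sample a few
# responses, keep the highest-scoring one, fine-tune on it like normal SFT.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in model
model = AutoModelForCausalLM.from_pretrained("gpt2")

def score(response: str) -> float:
    """Placeholder for a learned reward model or a human rating."""
    return float(len(response.split()))  # dummy heuristic for the sketch

prompt = "Explain overfitting in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
samples = model.generate(
    **inputs, do_sample=True, num_return_sequences=4,
    max_new_tokens=40, pad_token_id=tokenizer.eos_token_id,
)
candidates = [tokenizer.decode(s, skip_special_tokens=True) for s in samples]

best = max(candidates, key=score)
# Fine-tune on the best sample with the usual next-token loss (one step shown).
batch = tokenizer(best, return_tensors="pt")
loss = model(**batch, labels=batch["input_ids"]).loss
loss.backward()
```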

8/
🌟 Take-aways 🌟

So how much does each component matter?

Would human feedback be needed at all if we had more diverse/open-ended/creative public input-output pairs?

E.g. bias/toxicity goals improve significantly even w/o human feedback (just IT).

9/
➡️ Answering these questions will help prioritize future work for the field.

10/
🌟 Take-aways 🌟

So much emphasis has been put on the results of Alignment and RLHF, but effectively training the next generation of models will require measuring the contribution of each component of Alignment.

11/
🌟 Disclaimer 🌟

I don't work at OpenAI, and ChatGPT remains unpublished/undocumented, so this is based mostly on InstructGPT's paper (arxiv.org/abs/2203.02155, @longouyang).

Speculatively, I would bet ChatGPT benefits from new techniques, e.g. interactive/dialog tuning.

12/
There are also new Human Feedback datasets now publicly available:

➡️ Anthropic's HH-RLHF: huggingface.co/datasets/Anthropic/hh-rlhf (loading sketch below)

➡️ @ethayarajh cleverly mined 385k Reddit comments (Stanford Human Preferences)
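
For example, the HH-RLHF pairs load directly with the `datasets` library (a quick sketch; the `chosen`/`rejected` fields follow the dataset card):

```python
# Peek at Anthropic's HH-RLHF human preference pairs.
from datasets import load_dataset

hh = load_dataset("Anthropic/hh-rlhf", split="train")
example = hh[0]
print(example["chosen"][:300])    # human-preferred dialogue continuation
print(example["rejected"][:300])  # dispreferred alternative for the same prompt
```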

13/
There are also exciting new works that circumvent Human Feedback and creative-prompt collection (toy sketch of the bootstrapping idea after this list):

Constitutional AI: www.anthropic.com/constitutional.pdf (@AnthropicAI)

Self-Instruct: arxiv.org/abs/2212.10560 (@yizhongwyz)

Unnatural Instructions: arxiv.org/abs/2212.09689 (@OHonovich)
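
For the instruction-generation approaches (Self-Instruct, Unnatural Instructions), the core idea is to prompt a model with a few seed tasks and have it invent new ones, then filter and answer them. A toy sketch of that loop (GPT-2 as a stand-in; real pipelines use far larger models plus heavy filtering/deduplication):

```python
# Toy sketch of Self-Instruct-style bootstrapping: seed tasks prompt the model
# to propose a new instruction, which would then be filtered and answered to
# grow the training set. GPT-2 here is only a stand-in.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

seed_tasks = [
    "Write a haiku about autumn.",
    "Explain photosynthesis to a 10-year-old.",
    "Translate 'good morning' into French.",
]
# Few-shot meta-prompt asking the model to continue the task list.
prompt = "Here is a list of diverse tasks:\n" + "\n".join(
    f"{i + 1}. {t}" for i, t in enumerate(seed_tasks)
) + f"\n{len(seed_tasks) + 1}."

inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(
    **inputs, do_sample=True, max_new_tokens=30,
    pad_token_id=tokenizer.eos_token_id,
)
new_task = tokenizer.decode(
    out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(new_task)  # candidate instruction; filtering/answering would follow
```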

14/
Why is this important?

Scaling these new works offers an opportunity for academia and public research to catch up to proprietary models.

15/
The challenge?

The relationship between commercial and public LLM research is asymmetric, even one-way: corporations can benefit from academic findings, but the reverse is less common.

16/
Thank you for reading this far. If you have feedback or want to chat, shoot me a DM!

17/17