It obviously matters, because it has implications for how well the models can generalize to never-before-seen inputs and tasks. We risk seriously exacerbating automation bias if we ascribe reasoning to what is just a minor perturbation of the training data.

An example is the "Sparks of AGI" whitepaper, which uses a minor syntactic perturbation of one of the most commonly represented proofs on the internet as part of its conclusion that the model is showing early signs of "general intelligence".
With RLHF there will be fewer and fewer never-before-seen inputs and tasks, but the covered tasks are still a small finite set in an infinite space of possibilities.
With transformer models there is likely some interesting generalization happening, as well as some learning of relations, but it's really not nearly enough for any kind of novel reasoning task or benchmark without some kind of oracle.
I suspect there are some flat-out architectural limitations that will need to be surmounted before models can generalize reliably in logical domains where perfect generalization is possible, or better perform rule-learning tasks of any kind.
Though I think it's also fine to just accept that these models are still limited, but useful for perturbations and modest generalizations of syntax, which can emulate reasoning with some strategic handholding. And to focus on how to do the handholding!
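
To make the handholding point concrete, here is a minimal sketch in Python. It assumes a hypothetical generate() function standing in for a language-model call and uses a small symbolic checker as the oracle: the model is free to propose whatever it likes, but nothing is accepted until the checker verifies it.

# Minimal sketch of "strategic handholding": the model proposes answers,
# and a symbolic oracle (here, exact evaluation of the expression) accepts
# or rejects them. generate() is a hypothetical stand-in for a real LLM call.

import ast
import operator


def generate(prompt: str, attempt: int) -> str:
    """Hypothetical model call; replace with a real LLM API."""
    # Placeholder behavior: pretend the model guesses "41" first, then "42".
    return ["41", "42"][min(attempt, 1)]


def arithmetic_oracle(expression: str, candidate: str) -> bool:
    """Symbolic check the model cannot fake: evaluate the expression exactly."""
    ops = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

    def ev(node):
        if isinstance(node, ast.BinOp):
            return ops[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")

    return ev(ast.parse(expression, mode="eval").body) == int(candidate)


def solve_with_handholding(expression: str, max_attempts: int = 3) -> str | None:
    """Ask the model for an answer; keep it only if the oracle verifies it."""
    prompt = f"Compute {expression} and reply with just the number."
    for attempt in range(max_attempts):
        candidate = generate(prompt, attempt).strip()
        if arithmetic_oracle(expression, candidate):
            return candidate  # verified by the oracle, safe to trust
    return None  # the model never produced a checkable, correct answer


if __name__ == "__main__":
    print(solve_with_handholding("6 * 7"))  # -> "42"

The task here is trivial on purpose; the shape of the loop is the point. The generator can be arbitrarily unreliable, because nothing gets through without passing the oracle.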