It obviously matters, because it has implications for how well the models can generalize to never-before-seen inputs and tasks. We risk seriously exacerbating automation bias if we ascribe reasoning to what is just a minor perturbation of the training data.

An example is the "Sparks of AGI" whitepaper, which uses a minor syntactic perturbation of one of the most commonly represented proofs on the internet as part of its conclusion that the model is showing early signs of "general intelligence".
With RLHF there will be fewer and fewer never-before-seen inputs and tasks, but the covered tasks are still a small finite set in an infinite space of possibilities.
With transformer models there is likely some interesting generalization happening, as well as some learning of relations, but it's really not nearly enough for any kind of novel reasoning task or benchmark without some kind of oracle.
I suspect there are some flat-out architectural limitations that will need to be surmounted before models can generalize reliably in logical domains where perfect generalization is possible, or better perform rule-learning tasks of any kind.
Though I think it's also fine to just accept that these models are still limited, but useful for perturbations and modest generalizations of syntax, which can emulate reasoning with some strategic handholding. And to focus on how to do the handholding!
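
To make the handholding point concrete, here is a minimal sketch in Python. It assumes a hypothetical generate() function standing in for a language-model call and uses a small symbolic checker as the oracle: the model is free to propose whatever it likes, but nothing is accepted until the checker verifies it.

# Minimal sketch of "strategic handholding": the model proposes answers,
# and a symbolic oracle (here, exact evaluation of the expression) accepts
# or rejects them. generate() is a hypothetical stand-in for a real LLM call.

import ast
import operator


def generate(prompt: str, attempt: int) -> str:
    """Hypothetical model call; replace with a real LLM API."""
    # Placeholder behavior: pretend the model guesses "41" first, then "42".
    return ["41", "42"][min(attempt, 1)]


def arithmetic_oracle(expression: str, candidate: str) -> bool:
    """Symbolic check the model cannot fake: evaluate the expression exactly."""
    ops = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

    def ev(node):
        if isinstance(node, ast.BinOp):
            return ops[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")

    return ev(ast.parse(expression, mode="eval").body) == int(candidate)


def solve_with_handholding(expression: str, max_attempts: int = 3) -> str | None:
    """Ask the model for an answer; keep it only if the oracle verifies it."""
    prompt = f"Compute {expression} and reply with just the number."
    for attempt in range(max_attempts):
        candidate = generate(prompt, attempt).strip()
        if arithmetic_oracle(expression, candidate):
            return candidate  # verified by the oracle, safe to trust
    return None  # the model never produced a checkable, correct answer


if __name__ == "__main__":
    print(solve_with_handholding("6 * 7"))  # -> "42"

The task here is trivial on purpose; the shape of the loop is the point. The generator can be arbitrarily unreliable, because nothing gets through without passing the oracle.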