Would you share your perspective anyway? I’d be quite interested, even though I’m not the one you were talking to.
Sure, thanks for your interest. It’s an incomplete picture, but we can think of an LLM as an abstraction of the meaningful connections within a dataset into a high-dimensional space, one that can be explored. That alone is an insane accomplishment, and it is changing some of the pillars of data analysis and knowledge work. But that’s just the contribution of the transformer architecture from the “Attention Is All You Need” paper. Many implementations of modern generative AI combine LLM inference in agentic networks with GANs and with rules-based processing. Extracting connections is just one part of one part of a modern AI implementation.
The emergent properties of GPT-4 are enough to suggest that this exponential curve will continue. Theory of mind (and therefore deception) and relational spatial awareness (usually illustrated with stacking problems) appeared to develop from scale alone, that is, from increasing the parameter count of the neural network. These capabilities were unexpected. As a result, there is an almost literal arms race on the hardware side to see what other emergent properties exist at larger model sizes. With some poetic license, we’re rending function from form so quickly and effectively that some see it as freeing and others as sacrilege.
Some of the most interesting work on why these capabilities emerge, and on how we might gain insight (and control) by exploring the mechanisms, is being done by Anthropic and by users at Hugging Face. Anthropic’s researchers found, for instance, that when a specific internal feature (a pattern of neuron activations) in Claude is artificially amplified, everything it responds with becomes in some way about the Golden Gate Bridge. This sort of probing is perhaps a better route to progress than blindly chasing more size (despite its recent success), but only time will tell. Certainly, Google and Microsoft have made a lot of unforced errors, fumbling over themselves to stay in what they think is the race.
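To make the idea of stimulating part of the network a bit more concrete, here is a toy sketch in Python. It is not Anthropic’s method or code, and nothing in it comes from a real model: the three topics, the `READOUT` weights, the `BRIDGE_FEATURE` direction and the `steer` parameter are all made up for illustration. It only shows the general shape of the trick, which is to push a hidden state along one direction and watch the output tilt toward one topic.

```python
import math

# Toy stand-in for a model: a 3-number hidden state is mapped to scores
# over 3 topics. Every name and number here is hypothetical.
TOPICS = ["weather", "cooking", "golden gate bridge"]

# Made-up readout weights, one row per topic.
READOUT = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# Made-up "feature direction" in hidden space that we choose to amplify.
BRIDGE_FEATURE = [0.0, 0.0, 1.0]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_probs(hidden, steer=0.0):
    """Add `steer` times the feature direction to the hidden state,
    then read out topic probabilities."""
    steered = [h + steer * f for h, f in zip(hidden, BRIDGE_FEATURE)]
    scores = [sum(w * x for w, x in zip(row, steered)) for row in READOUT]
    return softmax(scores)


def top_topic(hidden, steer=0.0):
    """Return the most likely topic for a hidden state."""
    probs = topic_probs(hidden, steer)
    return TOPICS[probs.index(max(probs))]


hidden = [2.0, 0.5, 0.0]              # pretend this encodes a weather prompt
print(top_topic(hidden))              # no steering
print(top_topic(hidden, steer=10.0))  # feature strongly amplified
```

Without steering, the toy picks "weather"; with the made-up feature amplified, it picks "golden gate bridge" for the same input. A real model has vastly more dimensions, and finding a meaningful direction in the first place is the hard part.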
Thanks so much for taking the time to explain this. I was just going to give them a link.
Thank you very much for those insights!!
Drivel