Hot take on GPT-4
Mar 15, 2023
GPT-4 was introduced last night. Features of interest:
- Structuring the input in two parts: the instruction and the actual text. Defining the task and giving the model a context / role to assume (“You are TaxGPT, an assistant that helps calculate taxes”, etc.) is now done separately from the text to be processed. (Note: this has been available since GPT-3.5; a sketch follows the list.)
- Improved reasoning. Major logic hiccups seem to be resolved now: it maintains context well between successive messages and fluently follows a logical thread. For example, while its predecessors could also generate code, in the demo we see GPT-4 also understand and resolve errors in that code: given the raw error message and documentation pasted in directly, it can work out and explain what needs to be done.
- Larger context (up to 32k tokens, roughly 24,000 words at once). This differs from its ability to “remember” between successive messages. That kind of memory cannot be used to split a long text into, say, three consecutive messages, because the last part will carry more weight than the first, being more recent. So the way to process and generate more text is a bigger context (a token-counting sketch follows the list).
- Understanding unstructured text. Although we have a lot of textual information at our disposal, it is rarely structured in a neat form. A simple document with paragraphs and dashes is a nightmare to parse with a standard library (you usually have to account for every inconsistency by hand). Here I see a big advantage of the GPT models (not just GPT-4, though it is now the best performing): providing this bridge (an extraction sketch follows the list).
- The ability to perform calculations correctly. This was one of the great shortcomings of its predecessors. In the demo we see GPT-4 calculating taxes: additions, subtractions, percentages, all without calling an external calculation program (they mention this deliberately). Here I expect hidden problems to surface later: on one hand an audience that does not have the patience to understand the principles behind the limitations, on the other a black-box model that learned arithmetic probably from millions of synthetically generated examples (my own assumption, but for this case training data is very easy to create). How will one distinguish a calculation it performs correctly from one it does not? (One pragmatic answer is sketched after this list.)
- Ability to understand images. I left this for last because it is only a preview. It is a functionality I had already seen attempted in various libraries that simply glued the interface of a language model to a text-to-image model (and vice versa): DALL-E, Stable Diffusion. But an all-in-one integration, not constrained by the size and shape of an interface, can produce much more coherent results. And most importantly, understanding “arbitrary combinations of text and images”.
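To make the first point concrete, here is a minimal sketch of the instruction/text split, using the openai Python client as it looked at the time of writing. The user question is invented for illustration, and the script assumes OPENAI_API_KEY is set in the environment.

```python
import openai

# The instruction (context / role) goes in a separate "system" message;
# the text to be processed goes in the "user" message.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You are TaxGPT, an assistant that helps calculate taxes."},
        {"role": "user",
         "content": "Gross income 60,000, flat 10% rate. How much tax do I owe?"},
    ],
)
print(response["choices"][0]["message"]["content"])
```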
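For the context point, the practical question is whether a document fits in one window at all. A rough check with the tiktoken tokenizer might look like this; cl100k_base is, to my knowledge, the encoding the GPT-4 chat models use, and 32,768 tokens is the limit of the 32k variant.

```python
import tiktoken

# cl100k_base is the encoding family used by the GPT-3.5/GPT-4 chat models.
enc = tiktoken.get_encoding("cl100k_base")

def fits_in_one_context(text: str, limit: int = 32_768) -> bool:
    """True if the whole text fits in a single 32k context window."""
    return len(enc.encode(text)) <= limit

print(fits_in_one_context("A simple document with paragraphs and dashes."))
# True: a short sentence is far below the limit
```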
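And a sketch of the unstructured-text bridge: instead of a hand-written parser that breaks on every inconsistency, ask the model for structure. The invoice and field names below are invented, and nothing guarantees the reply is valid JSON, hence the guarded parse.

```python
import json
import openai

messy_doc = """Invoice no. 1043, sent Mar 3
- widgets x 12 ..... 144.00
shipping: 9.50 (ground)
TOTAL DUE 153.50"""

response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Extract invoice_id, shipping and total from the document. "
                    "Reply with JSON only."},
        {"role": "user", "content": messy_doc},
    ],
)
reply = response["choices"][0]["message"]["content"]
try:
    data = json.loads(reply)   # the structured form we wanted
except json.JSONDecodeError:
    data = None                # the model ignored the format request
```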
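As for distinguishing a correct calculation from a wrong one, the only honest answer I see is to re-check the arithmetic outside the black box. A hypothetical helper, with made-up numbers:

```python
from decimal import Decimal

def check_flat_tax(gross: str, rate: str, model_answer: str) -> bool:
    """Recompute a flat-rate tax deterministically and compare it to
    the number the model produced (one cent of tolerance for rounding)."""
    expected = Decimal(gross) * Decimal(rate)
    return abs(expected - Decimal(model_answer)) <= Decimal("0.01")

print(check_flat_tax("60000", "0.10", "6000.00"))  # True: answer checks out
print(check_flat_tax("60000", "0.10", "5400.00"))  # False: flag for review
```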
Let’s face it: this is not a moment of singularity. We do not have human-level artificial intelligence. In the coming period, “evidence” of GPT-4’s sentience will flow. And there will be many for whom, once they have a hammer, everything becomes a nail. But there is real value that I think we can discover, as long as we don’t get lost in either mirages or phobia of the new.
The demo: