Separating AI Hype from Reality: Understanding Limitations and Breakthroughs

Originally posted on tenetq.com.

About the author: Over the last decade, I've spent a lot of time with machine learning algorithms and, in the past year, have worked close to the frontier of large language model innovation, which is what most people think of these days when they hear "AI". This spans my academic work, such as benchmarking algorithms on their ability to explain bond yields and benchmarking innovation across the UN System, my career in management consulting on advanced technologies, and my current position as CEO of a Californian AI startup.

Motivation for the article: To most people, "AI" is an abstract term for computers doing incredible things, like ChatGPT. I find it important to explain how "stupidly simple" these models are, so that we understand what we're dealing with, what these AI models can do, and why they are not the one-hit wonder everyone expected them to be... but also how they are an incredible step-change innovation, if used appropriately.

What "AI" is not

Although AI stands for Artificial Intelligence, the only 'intelligent' thing about it is its creator. In practice, AI is the application of machine learning algorithms that the user interprets as performing tasks intelligently. The examples of LLMs and image generators below will make evident just how "simple" the algorithms actually are.

Furthermore, AI is not new. The viral rise of ChatGPT created such a buzz that many people are under the impression that AI is a new element of everyday life, but even large language models like the ones behind ChatGPT build on techniques that have been around for decades.

How the "new" Generative AI models work

How Large Language Models (LLMs) work

The AI models behind ChatGPT and Bard are word generators. In fact, they can be compared to the autocomplete you may have encountered in your smartphone messaging apps. The difference is that models like GPT-4 are bigger and better: bigger, meaning they can store more complex relationships between words, and better, because they have been trained on large chunks of the internet and properly fine-tuned.

However, we must not overestimate their ability beyond the exact purpose they were trained for: generating text word by word, based on statistical likelihood. That's why AI models like ChatGPT struggle with hallucinating facts and sources and with performing logic or computation; they are "just" word generators. An AI model like GPT-4 will never be able to reliably provide facts, sources, logic, and computations – at least not by itself. That's why we treat LLMs solely as a mouthpiece and use a more "intelligent", factual backend to provide the facts, sources, logic, and computations.
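To make the "word generator" point concrete, here is a minimal sketch of the only operation an LLM performs at inference time: sampling the next word from a probability distribution over possible continuations. The bigram table and its probabilities are made up for illustration; a real model like GPT-4 computes this distribution with a neural network conditioned on thousands of preceding tokens, not a lookup table.

```python
import random

# Made-up bigram probabilities: P(next_word | current_word).
# A real LLM derives such a distribution from the entire preceding text.
BIGRAMS = {
    "the":  {"cat": 0.5, "dog": 0.3, "moon": 0.2},
    "cat":  {"sat": 0.6, "ran": 0.4},
    "dog":  {"sat": 0.2, "ran": 0.8},
    "moon": {"rose": 1.0},
    "sat":  {"quietly": 1.0},
    "ran":  {"away": 1.0},
}

def generate(start: str, max_words: int = 5) -> str:
    """Generate text word by word, each word sampled by statistical likelihood."""
    words = [start]
    for _ in range(max_words):
        options = BIGRAMS.get(words[-1])
        if not options:  # no known continuation -> stop
            break
        next_word = random.choices(list(options), weights=list(options.values()))[0]
        words.append(next_word)
    return " ".join(words)

print(generate("the"))  # e.g. "the dog ran away"
```

Notice there is no notion of truth anywhere in this loop: the model picks whatever continuation is statistically likely, which is exactly why hallucination is part of the mechanism rather than a bug that can simply be patched.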

The more information you provide, the better the output becomes, because the model will generate more distinct words based on a more specific context instead of falling back on generic or vague terms. However, having to supply a lot of information and instructions every time is burdensome for the user, which is a significant churn factor that companies like ours are trying to solve.
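As an illustration, compare two prompts for the same task (all names and details below are invented):

```python
# The second prompt pins the model's word choices to concrete context,
# so its statistically likely continuations are specific rather than
# generic filler.
vague_prompt = "Write an email to a client."

specific_prompt = (
    "Write a two-paragraph email to ACME Corp's CFO, Jane Doe, summarizing "
    "the 4% Q3 cost overrun on the logistics project and proposing a review "
    "call next Tuesday at 10:00."
)
```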

How Image Generation works

Modern image generators like DALL-E, Midjourney, and Stable Diffusion have impressed us over the past two years with immense strides in quality. Nevertheless, the underlying logic is surprisingly simple: start with a bunch of random pixels and perform dozens of steps of small changes until the image contains each element that the user described in text.

Step 1: Get training data. Other AI models are used to classify each element of the training images: the image style (photograph, 3D animation), the subject, context-based descriptions such as who the artist was or where the image was taken, and other details like effects, angles, and colors. This step is important because the user will later have to use the same descriptors to get a proper output.

Step 2: Train the model. The developers introduce noise into the training image, then train the model to de-noise it again until it is close to the original.

Step 3: Use the model. When trained properly, the model can skip the noising of an original image entirely: it starts from pure noise and generates an image based only on the text input. A toy sketch of this denoising loop follows below.

[Image: Step 3 example generated with "Q by TENET"]
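The following is a heavily simplified sketch of Steps 2 and 3, assuming NumPy arrays as stand-in "images". The denoise() function is a placeholder for the trained neural network; a real system like Stable Diffusion would condition it on an embedding of the text prompt.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def add_noise(image: np.ndarray, amount: float) -> np.ndarray:
    """Step 2, forward direction: corrupt a training image with random noise."""
    return image + amount * rng.standard_normal(image.shape)

def denoise(noisy: np.ndarray, prompt: str) -> np.ndarray:
    """Placeholder for the trained model: predict a slightly cleaner image that
    better matches the prompt. This stub just damps the pixels; a real model
    would actually use the prompt."""
    return noisy * 0.9

def generate(prompt: str, steps: int = 50, size=(64, 64)) -> np.ndarray:
    """Step 3: start from pure random pixels and denoise step by step,
    guided (in a real model) by the text description."""
    image = rng.standard_normal(size)  # "a bunch of random pixels"
    for _ in range(steps):
        image = denoise(image, prompt)
    return image

img = generate("a boat in space")
```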

Here too, the more information you provide, the better the output becomes, because the model has more distinct elements to solve for. For example, "boat in space" = boat + space, and the model will easily be satisfied with solving for just those 2 elements, so the final result can often look "choppy". That's why we spent considerable effort developing a creative image generation model that envisions a more specific image before generating it (see the image above).
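One way to implement such an "envision first" step is sketched below with hypothetical stand-in functions; this illustrates the general pattern, not Q by TENET's actual pipeline.

```python
def expand_prompt(prompt: str) -> str:
    """Stand-in for an LLM call that fleshes out subject, style, lighting,
    and composition before any pixels are generated."""
    # A real implementation would call a language model here.
    return (prompt + ", a weathered wooden sailboat drifting past Saturn's "
            "rings, deep star field, cinematic lighting, 3D render")

def generate_image(detailed_prompt: str):
    """Stand-in for the diffusion model call."""
    ...

generate_image(expand_prompt("boat in space"))
```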

AI: Facts vs. Fiction

Since OpenAI started as an R&D lab, it's no surprise that the GPT models were intended as a research effort above everything else. Although GPT officially stands for "Generative Pre-trained Transformer", the hope was for a "General Purpose Technology" (the pinnacle of technological disruptions, such as the steam engine, electricity, and the computer). Roughly a year after the release of ChatGPT, we can confidently say that this is not yet the case, but we're getting very close.

It doesn't take a lot of critical thinking to understand why: would you trust your stock portfolio or medical decisions to a word generator? For most, the answer is no. But it is a good "engine" for building applications, which can leverage other algorithms to feed it the facts and logic it does not possess.

"It's as if aliens have landed, but we didn't really take it in because they speak good English."  -Geoffrey Hinton (aka. Godfather of AI)

More worrying, however, is the trend of humanizing AI. In a previous article, we outlined the historical and psychological reasons why people are scared of AI; describing AI as "aliens" is neither accurate nor helpful for understanding the technology. Most likely, such fearmongering serves to bring about regulations that benefit the tech giants, as it raises barriers to entry for startups and disruptors.

If we had to humanize LLMs, I'd propose the following two options:

  • Optimistic View: Talking to someone who answers off the top of their head after having read the entire internet once: remarkably eloquent, but too lazy to perform any critical or logical thinking.
  • Pessimistic View: A student who fell asleep during class but is now trying to answer the professor's question by coming up with something based on the last 5 words.
"The official version of AI, which was, really, Large Language Models convincing people that technology was relevant, that it could help them with homework and poetry and they could rap like a rapper [...] but the real beginning of the AI revolution started a number of years ago, using [...] large-scale algorithmic software of AI that have the advantage of being precise, [whereas] large language models have the advantage of leveraging information from all over the world but are not precise"  -Alex Karp (CEO of Palantir)

Hence, having settled on the idea that "AI" is old (in fact, neural networks were invented in the 1950s) and that capable large language models are the only new thing, the clear intuition of the leading AI startups is that it will take more than "just" an LLM to provide artificial "intelligence".

The true seismic innovation that LLMs provide

Despite LLMs not being the 'one-hit wonder' general purpose technology (GPT) that everyone had hoped for, they will leave a lasting mark on software development. In fact, I would go as far as to say that this type of AI gives computer science a second leg:

  • Previously, developers had to learn a programming language and follow its syntax strictly and correctly in order to write an algorithm that executes reliably, effectively, efficiently, and consistently.
  • Now, with LLMs, developers can use natural language and input any instruction or request, after which the LLM will try to execute the request in a reasonable, "common sense" way that is, however, unreliable, only mostly effective, computationally inefficient, and inconsistent.

The true innovation, hence, lies in being able to solve previously unsolvable problems in computer science, such as using LLMs to standardize inputs or extract information. Previously, entire businesses were built around codifying and formatting academic sources, generating synonyms, or finding timetable overlaps; now it's just a simple natural language request away.
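Here is a sketch of what that looks like in practice, using a hypothetical llm() helper in place of whichever chat-completion API one actually calls, and validating the output, since the generator is unreliable:

```python
import json

def llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to any hosted LLM endpoint."""
    raise NotImplementedError

def extract_citation(free_text: str) -> dict:
    """Standardize a messy academic reference into structured fields."""
    reply = llm(
        "Extract the author, year, and title from the reference below. "
        "Answer with JSON only, using the keys author, year, title.\n\n"
        + free_text
    )
    data = json.loads(reply)  # may raise: LLM output is not guaranteed to be JSON
    if not {"author", "year", "title"} <= data.keys():
        raise ValueError("LLM reply missing required fields")
    return data

# extract_citation("Smith, J. (2019) 'On bond yields', Journal of Finance 12(3).")
```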

As a result, LLMs can empower small businesses to ship a good-enough stopgap for many problems until they have the capacity to increase accuracy or build a more resource-efficient algorithm - a true step-change innovation for the software space. One common pattern is sketched below.
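A minimal sketch of that stopgap pattern, with the same hypothetical llm() stub as above: ship the LLM-backed version on day one, and swap in a precise, cheaper algorithm behind the same interface once it exists.

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for a hosted LLM call."""
    raise NotImplementedError

def rule_based_normalizer(raw: str) -> str:
    """Precise, resource-efficient algorithm; built later, when capacity allows."""
    raise NotImplementedError

def llm_normalizer(raw: str) -> str:
    """Day-one version: a single prompt does 'good enough' normalization."""
    return llm("Rewrite this postal address in standardized form:\n" + raw)

def normalize_address(raw: str) -> str:
    try:
        return rule_based_normalizer(raw)  # precise path, once implemented
    except NotImplementedError:
        return llm_normalizer(raw)         # LLM stopgap until then
```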

TLDR: AI algorithms are remarkably simple at their core, so comparing them to "alien intelligence" is false and misleading. In order to use LLMs for intelligent software, many gaps have to be bridged, such as providing facts, sources, logic, and computation through other (AI) algorithms. In the meantime, the true step-change innovation is being able to use LLMs in software development as placeholder solutions, or to replace "common sense" manual efforts through LLM automation.

Q by TENET is an AI startup developing integrated solutions for professionals and a TENET Ventures portfolio company.
