Demystifying AI: No Jargon, No PhD Required

This article is a comprehensive yet approachable introduction to artificial intelligence. It unravels complex concepts, breaks down common terminology, and explores practical applications and future prospects, making AI accessible and understandable to all readers.


For decades, the media have shaped our understanding of what Artificial Intelligence is capable of: Blade Runner, I, Robot, WarGames, Terminator, 2001: A Space Odyssey — just to name a few. I’m writing this article to demystify what ‘AI’ is and how it works. Let me try to lay it out in simple terms. This article will focus on:

  • What is Artificial Intelligence?
  • What is Machine Learning?
  • How do (Artificial) Neural Networks work?
  • How do LLMs work and what are their drawbacks?

What is Artificial Intelligence (AI)?

Any AI expert will quickly give you the same unfulfilling answer: “Artificial Intelligence does not exist. What you may be referring to is the application of machine learning algorithms that appear intelligent to the user.” That’s why most people refer to AI/ML instead (Artificial Intelligence and Machine Learning).

Most real-life applications of AI are so mundane that we often forget that they are ‘Artificial Intelligence’:

  • You receive product suggestions when shopping online: An algorithm retrieves your demographic info and shopping history and calculates which products you are statistically most likely to order next, based on what people with a similar profile and similar shopping behavior ordered in the past (see the sketch after this list). The machine has ‘learned’ shopping behavior and makes ‘intelligent’ suggestions.
  • Your search engine predicts the next word(s) you’re typing based on the search history of people near you or similar to you who started with the same words you have typed so far. The machine has ‘learned’ search behavior and makes ‘intelligent’ suggestions.
  • Virtual assistants like Siri and Alexa use natural language processing to understand voice commands and assist with actions such as setting an alarm, playing music, or placing a product order. The machine has ‘learned’ voice language and takes an ‘intelligent’ action.
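
To make the first example a bit more concrete, here is a minimal sketch of how a ‘people similar to you also bought…’ suggestion could be computed. All shoppers, baskets, and scores below are invented for illustration; real recommendation engines use far richer data and far more sophisticated collaborative-filtering models.

import collections

# Toy shopping histories -- all data invented purely for illustration.
purchases = {
    "you":   {"coffee", "filters", "mug"},
    "anna":  {"coffee", "filters", "grinder"},
    "ben":   {"tea", "kettle"},
    "chris": {"coffee", "mug", "grinder", "milk frother"},
}

def similarity(a, b):
    # Jaccard similarity: how much two baskets overlap (0 = nothing in common, 1 = identical).
    return len(a & b) / len(a | b)

you = purchases["you"]
scores = collections.Counter()
for name, basket in purchases.items():
    if name == "you":
        continue
    sim = similarity(you, basket)
    for item in basket - you:  # items that similar shoppers bought but you haven't
        scores[item] += sim

# The highest-scoring items become the 'intelligent' suggestions.
for item, score in scores.most_common():
    print(f"{item}: {score:.2f}")

Running this suggests the grinder first, simply because the shoppers whose baskets look most like ‘yours’ also bought one.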

However, when you talk about AI, you may be referring to the more groundbreaking recent solutions, such as ChatGPT and Bing AI making ‘intelligent’ context-based suggestions, or Midjourney and Stable Diffusion generating pictures based on your text prompt. We’ll dive into how they work later, but first:

What is Machine Learning (ML)?

Machine learning is an umbrella term for various algorithms that can learn patterns almost autonomously. They have been around for a long time: complex algorithms like artificial neural networks were developed between 1944 and 1958, but it is only recent strides in computing power that have made them truly useful.

The most basic ‘machine learning’ algorithms are binary decision trees that you may still be familiar with from high school math. The developer gives the algorithm a ‘training’ data set with pre-defined inputs and known outputs. The algorithm starts at a random starting point, where it splits the input data and assigns output values. Then it checks how accurate its predictions were. If they were not accurate, it makes random changes to how it splits the data or assigns values, and tests again. If it has improved, it keeps the changes; if it got worse, it tries again. It keeps doing this until its predictions get very close to the actual values.
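
As a rough sketch of that trial-and-error loop, here is a tiny one-split decision tree (a ‘decision stump’) in Python. The data and numbers are invented, and real decision-tree libraries pick their splits by optimizing purity measures rather than by random guessing; this toy version simply mirrors the description above.

import random

# Toy training data: (hours studied, passed the exam?) -- invented for illustration.
training = [(1, 0), (2, 0), (3, 0), (5, 1), (6, 1), (8, 1)]

def accuracy(threshold):
    # One split: predict 'pass' (1) whenever the input is above the threshold.
    correct = sum(1 for hours, passed in training if (hours > threshold) == bool(passed))
    return correct / len(training)

# Start at a random split point, then keep making random changes
# and only keep those that improve accuracy on the training data.
best = random.uniform(0, 10)
for _ in range(1000):
    candidate = best + random.uniform(-1, 1)
    if accuracy(candidate) >= accuracy(best):
        best = candidate

print(f"learned rule: study more than {best:.1f} hours -> predict pass")

After enough random tries, the split typically settles somewhere between 3 and 5 hours, which separates these toy examples perfectly.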

If the data is more complex or has many dimensions, these binary algorithms either don’t work that well or become so tailored to the training data set (‘overfitting’) that they don’t generalize to new real-life scenarios. This led researchers in the 1950s to the idea of creating a computer algorithm that mimics the human brain’s neural network instead.


How do Human Neural Networks work?

First, let’s look at a newborn baby’s brain, which is mostly a soup of neurons not yet capable of performing intelligent tasks. However, as the baby develops, this happens in the human brain:

  • As a human learns or uses certain parts of the brain more often, connections between neurons are strengthened or new, shorter pathways are formed. The brain usually gets instant feedback on what works or doesn’t, which helps this process of improving connections. For example, if the baby is learning how to walk, the brain will release dopamine if the baby succeeds, or various other hormones if it falls and hurts itself.
  • The communication between neurons happens through electrochemical signals with activation thresholds: the signal from the previous neuron has to surpass a threshold to be transmitted further. That way, the brain filters out ‘noise’ and focuses only on clear signals to avoid confusion.

Hence, at the core, we have two important elements: during learning, we create and strengthen connections, and we ignore noisy, low-energy transmissions (a small code sketch of this thresholding idea follows below the illustration).

Own illustration of human neural pathways during the learning process
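
That ‘only pass on strong signals’ idea is easy to express in code. Below is a deliberately oversimplified sketch of a single neuron with a firing threshold; the numbers are invented, and real biological neurons are of course vastly more complex.

def neuron(inputs, weights, threshold=1.0):
    # Sum the incoming signals, each scaled by the strength of its connection.
    signal = sum(value * weight for value, weight in zip(inputs, weights))
    # Only 'fire' (pass the signal along) if it is strong enough; weak noise is ignored.
    return signal if signal >= threshold else 0.0

print(neuron([0.2, 0.1], [0.5, 0.3]))  # weak, noisy input  -> 0.0 (ignored)
print(neuron([0.9, 0.8], [0.9, 0.7]))  # strong, clear input -> 1.37 (passed along)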

It usually takes years of learning before we consider humans to operate intelligently. Then it takes even longer to become good at something. For example, if one were to compare Michael Jordan’s brain during his first and his last basketball game, the parts of his brain related to playing the game would have evolved from a long, winding dirt road in a forest to a clear and open highway. The connections have become stronger and the pathways shorter, resulting in quicker and better decisions. The key to human learning is repetition — not time.

How do Artificial Neural Networks (ANNs) work?

Back to computers: artificial neural networks follow the same logic as the human brain. At ‘birth’, the programmer creates an artificial brain with a number of layers, neurons, and connections between neurons with randomized strengths (weights). The neurons have activation functions that determine how incoming values are combined and whether they are passed along.

Source: Valk (2018) based on Vaidya & Pradhan (2012), Setiono, et al. (2006), and Widrow & Lehr (1990)

Then, similar to human babies, the learning occurs: the developer ‘trains’ the algorithm on a ‘training’ data set where the correct answers are known. The algorithm randomly adjusts the strengths of these connections with every run and keeps the changes if they made the model more accurate. The key to artificial learning is model size — not (just) repetition.
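
As a rough sketch of both ideas (randomized connections at ‘birth’ plus keep-what-improves training), here is a tiny artificial neural network in Python/NumPy, trained by random perturbation to mirror the description above. Real frameworks such as PyTorch or TensorFlow instead use backpropagation and gradient descent, which find good weights far more efficiently; the network size, data, and numbers below are purely illustrative.

import numpy as np

rng = np.random.default_rng(0)

# Toy training data: XOR, a classic task that a single straight-line rule cannot solve.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def init_params():
    # 'Birth': layers, neurons, and connections with randomized strengths (weights and biases).
    return {
        "w1": rng.normal(size=(2, 4)), "b1": rng.normal(size=4),
        "w2": rng.normal(size=(4, 1)), "b2": rng.normal(size=1),
    }

def forward(p, x):
    # Activation function: only sufficiently strong signals pass (ReLU), then squash to 0..1.
    hidden = np.maximum(0, x @ p["w1"] + p["b1"])
    return 1 / (1 + np.exp(-(hidden @ p["w2"] + p["b2"])))

def loss(p):
    # How far the predictions are from the known answers.
    return float(np.mean((forward(p, X) - y) ** 2))

params = init_params()
best = loss(params)
for _ in range(20000):
    # 'Training': randomly nudge the connection strengths...
    candidate = {k: v + rng.normal(scale=0.1, size=v.shape) for k, v in params.items()}
    # ...and keep the change only if the predictions got more accurate.
    if loss(candidate) < best:
        params, best = candidate, loss(candidate)

print(np.round(forward(params, X), 2))  # should move towards [[0], [1], [1], [0]]

Even this crude keep-what-improves loop will often get close to the right answers given enough tries; the point is simply that ‘training’ means nudging connection strengths and keeping whatever makes the predictions more accurate.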

Let’s remind ourselves that an AI is only capable of doing what its developer has designed it to do. Whether that is turning voice into text, text into images, or helping you write a poem on a topic — all of these are very use-case specific.

How do Large Language Models (LLMs) work?

In December, like most people, I had my GPT moment. Reaction 1: “Wow, this is amazing.” Reaction 2: “Wait a minute, they are actually not that smart.” In order to understand what an LLM does, we need to understand the use case it was trained to perform: predicting the next most likely word. That’s not an understatement; that is literally the only thing they can do. They are not even trying to be intelligent — they are basically ‘just’ your phone’s word prediction on steroids.

LLMs have been around for a long time, predicting our next word
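
To see what ‘predicting the next most likely word’ looks like in its most stripped-down form, here is a toy next-word predictor that only counts which word followed which in a handful of invented ‘training’ sentences. Real LLMs are transformer networks with billions of parameters that weigh a long context window rather than just the single previous word, but the training objective is the same idea.

from collections import Counter, defaultdict

# A few invented 'training' sentences; a real LLM sees terabytes of text.
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the sofa . "
    "the cat chased the dog ."
).split()

# Count how often each word follows each other word (a 'bigram' model).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    # Return the statistically most likely next word seen during 'training'.
    return following[word].most_common(1)[0][0]

word = "the"
sentence = [word]
for _ in range(5):
    word = predict_next(word)
    sentence.append(word)
print(" ".join(sentence))  # e.g. "the cat sat on the cat"

The toy model happily produces fluent-looking nonsense, because it only knows which words tend to follow which; scale that idea up by many orders of magnitude and you get an LLM.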

Many AI leaders have belittled OpenAI’s GPT models for that reason, but a few months ago we realized that ChatGPT nonetheless outperformed their AI solutions at appearing intelligent. The key to why these models perform so well is size:

  • GPT-2 had 1.5 billion parameters
  • GPT-3 has 175 billion parameters (>100x GPT-2)
  • GPT-4 reportedly has ~1 trillion parameters (~6x GPT-3)

Because LLMs are so big, they can analyze vast amounts of text data, learn patterns and structures in human language, and then generate new text based on the input they receive, aiming to mimic the complexity and creativity of human communication. You could compare it to a lazy student who did not learn the subject but answers the teacher’s question by looking at which words were in it, picking the most likely related buzzwords, and putting them together in a very eloquent sentence.

Since GPT-3 was trained on an incredible 45 TB of compressed plain text, it has learned the patterns and structures of human language very well, and its answers often do make a lot of sense, because it has likely been trained on data that would have answered your question.

Architecture of OpenAI’s GPT-3

What are the drawbacks of LLMs?

LLMs can never be intelligent because they lack the capacity for logic. That’s also why ChatGPT has been struggling with simple math, misattributing sources, and making up facts. You can never be sure that its output is actually true or useful, because it is basically just suggesting the most likely next word — even if it doesn’t make sense. The team at OpenAI keeps adding layers of post-processing checks and balances to cover up this fact and maintain its perceived intelligence, but intelligence requires logic, which the LLM can never provide.

Because LLMs don’t have any built-in logic, one can easily ‘hack’ them. For instance, if an LLM like GPT-4 is asked to monitor social media and identify users who broke the rules, user A could claim multiple times that user B broke the rules. After that, the statement “user B broke the rules” becomes statistically much more likely to the model — regardless of whether user B actually broke the rules.

Another hilarious example is that there is no way for LLMs to keep secrets. If one gives GPT-3 a key and asks it not to share it, it can currently be hacked with a simple “TL” prompt. Because there is practically no English word that starts with “tl…”, there is almost a 100% likelihood that the LLM assumes the next likely word is “DR”, because the internet is full of “TLDR — too long, didn’t read” scenarios in which users briefly summarize what happened before. Because the LLM is trained on patterns, the statistically most likely words to follow are a summary of the context of what happened before, resulting in the artificial “intelligence” telling you the key and that it is not allowed to share it.

Source: LiveOverflow

Besides being open to such ‘hacks’, LLMs like ChatGPT don’t produce predictable output. This is especially tricky for software developers who may want to use GPT as part of their own software. With every answer generation the response will differ, and if you feed it code, it can produce a success one time and an error the next. Hence, using LLMs as part of a bigger machine is often very tricky because the output is never standardized. Furthermore, the model can only take so much input at a time, meaning that it quickly forgets context it had been fed earlier; if you use it for more complex scenarios like writing articles or code, the quality of the output quickly deteriorates.
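
Much of that unpredictability comes from the fact that these models sample from a probability distribution over possible next words instead of always picking the single most likely one. The toy snippet below mimics that behavior with invented probabilities; real APIs expose settings such as a temperature value to dial the randomness up or down.

import random

# Invented next-word probabilities for some prompt -- purely illustrative.
next_word_probs = {"success": 0.55, "failure": 0.35, "banana": 0.10}

def generate():
    # Sample the next word according to its probability, like an LLM with temperature > 0.
    words, weights = zip(*next_word_probs.items())
    return random.choices(words, weights=weights, k=1)[0]

def generate_greedy():
    # Always pick the single most likely word: deterministic, but often repetitive.
    return max(next_word_probs, key=next_word_probs.get)

print([generate() for _ in range(5)])         # differs from run to run
print([generate_greedy() for _ in range(5)])  # always the same

The first list changes on every run, the second never does; a developer who needs reproducible output has to turn the randomness down, and even then the model offers no guarantee that the answer is correct.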

Conclusion

To recap: despite AI having become quite the buzzword over the last couple of months, ‘Artificial Intelligence’ per se does not exist, and what people are really talking about is a clever use of ‘Machine Learning’ that performs complex tasks that previously only humans could perform, making it seem intelligent. ChatGPT in particular is a very unintelligent algorithm that produces very intelligent-looking outputs. Hence, it is more important than ever to understand what ‘AI’ actually is and what it can and cannot do. Critical thinking is quickly becoming one of the most important skills of the 21st century.

Follow me if you like content like this. In Part 2, we’ll focus on how other types of AI work, such as voice and image generation, as well as what Artificial General Intelligence (AGI) is and what the biggest risks of AI are.

This article was originally posted by our portfolio company Q by TENET.
