
Simple introduction to AI

What Is AI



It’s funny how what are probably the two most uttered words in the world today, Artificial Intelligence, hide so much meaning behind them yet reveal so little of it.

You see, it’s an extremely misleading term. As a result, to this day it is still hard to find an article that explains the true nature of this technology in a way that non-experts can understand.

Today, I am giving you a simple introduction to AI by diving into the basics of Artificial Intelligence (AI from now on). By the end of the article, you will have a clear understanding of what AI is and, more importantly, of what AI is not, a matter that has to be dealt with as promptly as possible.

We will cover the following:

  • An understandable definition and the principles that underpin every single AI model, no matter the use case
  • The types of AI in existence
  • The different subcategories within each
  • And a general rule of thumb for when to apply what

So without further ado, let’s dive in.

So, what is AI?

You will find many definitions, but to me, the most straightforward and clear one is this: AI is the capacity of machines to perform actions that, were they performed by humans, would be considered ‘intelligent’, whatever that means (which is precisely why I don’t like the term ‘AI’ in the first place).

AI can be categorized into three main areas:

  • Machine Learning (ML)
  • Rule-based AI, sometimes referred to as ‘symbolic AI’
  • Neurosymbolic AI

This takes us to the first key intuition I want you to take from this article: AI is not equal to Machine Learning. Period. The reason so many fall into this mistake is that Machine Learning algorithms have completely taken the world by storm, with examples ranging from ChatGPT to TikTok’s recommendation algorithm.

But what is Machine Learning then?

Well, before answering, it’s best if we first focus on what rule-based AI is, as that will give us a clear intuition about Machine Learning itself.

The least intelligent AI

Rule-based systems are a set of human-defined rules, a glorified dishwasher instruction manual, if I may. They represent what AI was for the first five decades of its existence.

But what do I mean by ‘set of rules’?

Simple: it’s a chain of ‘if this happens, do this; if that happens, do that; otherwise, do something else’. In other words, it’s a set of guidelines that helps the machine with its decision-making.

But hold on, that doesn’t sound very ‘intelligent’, right?

Of course not, but it helps the AI make ‘intelligent’ decisions, like recommending a set of relevant links to your query, or chatting with you through one of those dreadful chatbots you still see around.
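
To make that ‘set of rules’ idea concrete, here is a minimal, made-up sketch of the logic behind one of those chatbots (the rules and replies are invented for illustration; real systems have far more of them):

```python
# A toy rule-based "chatbot": every behaviour is a rule written by a human.
def chatbot_reply(message: str) -> str:
    message = message.lower()
    if "price" in message or "cost" in message:
        return "Our plans start at 10 euros per month."
    elif "refund" in message:
        return "You can request a refund within 30 days of purchase."
    elif "hello" in message or "hi" in message:
        return "Hello! How can I help you today?"
    else:
        return "Sorry, I didn't understand that. Could you rephrase?"

print(chatbot_reply("Hi, what is the price of the premium plan?"))
```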

It isn’t very powerful, but it’s there. In fact, before you ditch these systems, let me tell you that they play a very important role, as you’ll see in a minute.

Moving on, Machine Learning involves the same principle, making machines do ‘intelligent’ stuff, but this time, the machine learns from data.

The real breakthrough

There’s a reason why people talk about ML and AI interchangeably. Simply put, it’s the AI that ‘feels like AI’.

It’s extremely powerful, barely explainable (in the sense that these are generally extremely opaque systems that perform in ways we still can’t fully comprehend), and it’s also the AI field that underpins most recent advancements.

The main premise is pretty simple. As part of a Machine Learning model, you have the following:

  • data,
  • a set of parameters (or variables), aka the model,
  • a learning algorithm,
  • a task,
  • and an objective to optimize against

The goal?

Finding the set of parameters (the model) that performs a mapping, or transformation, from the input data to the prediction, the task we want the model to perform, by optimizing an objective.

The most straightforward way of thinking about ML models is as functions. Let’s take a simple example that will clarify the key intuitions easily.

If we think about the function y = m*x + b, the equation of a straight line, the parameters ‘m’ and ‘b’ are what transform the input x into its value y, thereby creating the straight line. In this case, the variable ‘m’ gives us the gradient and the variable ‘b’ the y-intercept.

Now here’s the thing: by finding ‘m’ and ‘b’, an exercise known as ‘parametrization’, we find the mapping between x and y that draws the straight line. Thus, to find those variables, you need the following:

  • Data: Being a straight line, you only need two examples; in case you don’t know, a straight line can be represented either in its parametric form (the one we are discussing) or simply by two points belonging to it
  • Parameters: As it’s a very simple case, you know that two parameters are enough to define a straight line
  • Algorithm: Again, being a very simple case, we can simply use the slope formula followed by the slope-intercept form
  • Task: Obviously, the task is to find the straight line

If we wanted to express this in Python code (I chose Python due to its high readability, in case you can’t code), a minimal sketch with two made-up points would look like this:
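
```python
# Data: two made-up points that we assume lie on the line
x1, y1 = 1.0, 3.0
x2, y2 = 4.0, 9.0

# "Learning algorithm": the slope formula, then the slope-intercept form
m = (y2 - y1) / (x2 - x1)   # gradient
b = y1 - m * x1             # y-intercept

# The "model": the parametrized function mapping any x to its y
def predict(x):
    return m * x + b

print(m, b)          # 2.0 1.0
print(predict(10))   # 21.0
```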

Seems fairly straightforward, right?

Now, take this idea and extrapolate it to whatever function you want to find. Finding the function between two variables is easy, but what if we have billions of variables, as in real examples like ChatGPT, which has the nothing-short-of-extraordinary task of modeling the entire English language?

In other words, how do I find the function (ChatGPT) that models language?

Well, I don’t want to get ahead of things, but the principles still stand:

  • Data: A huge dataset of English texts
  • Parameters: Billions of parameters arranged in layers, forming what is known as a neural network (we’ll get there in a minute); for Large Language Models specifically, the Transformer architecture
  • Algorithm: Gradient descent
  • Task: Predict the next word in a sentence
  • Objective: Minimizing the negative log-likelihood of that next word (put simply, maximizing the probability the model assigns to the correct word)

Yes, I know, that sounds much more complicated, but the explanation goes well beyond the scope of a simple introduction to AI. It doesn’t matter, though, because the point is for you to grasp the key intuition:

ML models are systems that, by finding the intricate patterns in their data, learn a function that allows them to make useful predictions, like predicting the next word in a sequence for ChatGPT, classifying a certain patient as having cancer, or drawing a cartoon of the French Revolution using DALL-E.

All, in essence, do the same thing but with different data, task, architecture, and objective, meaning that the underlying model must also adapt.
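
To make ‘optimizing an objective’ slightly more tangible, here is a minimal, illustrative sketch of gradient descent, the algorithm mentioned above, fitting our straight line from noisy, made-up data; the data, learning rate, and number of steps are arbitrary choices for the example:

```python
import random

random.seed(42)

# Made-up data: points roughly on y = 2x + 1, with a little noise added
data = [(x, 2 * x + 1 + random.uniform(-0.1, 0.1)) for x in range(10)]

# The model: two parameters, initialized at random
m, b = random.random(), random.random()
learning_rate = 0.01

for step in range(2000):
    # Objective: mean squared error between predictions and true values
    grad_m = grad_b = 0.0
    for x, y in data:
        error = (m * x + b) - y              # how wrong this prediction is
        grad_m += 2 * error * x / len(data)
        grad_b += 2 * error / len(data)
    # Learning algorithm: nudge the parameters against the gradient
    m -= learning_rate * grad_m
    b -= learning_rate * grad_b

print(round(m, 2), round(b, 2))  # should land very close to 2 and 1
```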

But there’s one more AI type, the combination of the previous two.

Neurosymbolic AI systems

Neurosymbolic AI is much less fancy than it sounds. In simple terms, you get the best of both worlds of the previous two by merging them.

In practice, you use ML systems to find the key patterns in the data, which can be considered as performing the low-level perception of the task, and human-crafted code to perform the high-level actions, such as planning or performing fast calculations.

For example, a widely known neurosymbolic AI system is AlphaGeometry, from Google DeepMind, where the ML part ‘explores’ the different ways to prove a theorem, and two symbolic engines (two machines running rule-based code) work through each chosen path by performing the necessary calculations.

For instance, the ML system, which is ‘more intelligent’, if I may, will suggest applying Pythagoras’ theorem to help prove the result, and the symbolic engines will take two sides of the triangle and calculate the third.
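
As an intentionally crude, invented illustration of that division of labor (nothing like the real AlphaGeometry, of course), imagine the ‘neural’ part merely suggests which rule to apply, and a symbolic engine then performs the exact calculation:

```python
import math

def neural_suggestion(problem: str) -> str:
    # Stand-in for the ML part: a real model would *learn* to propose the
    # next useful step; here we fake that proposal with a trivial check.
    return "pythagoras" if "right triangle" in problem else "unknown"

def symbolic_engine(rule: str, a: float, b: float) -> float:
    # Stand-in for the rule-based part: exact, human-written calculations.
    if rule == "pythagoras":
        return math.sqrt(a ** 2 + b ** 2)
    raise ValueError(f"No rule available for: {rule}")

problem = "right triangle with legs 3 and 4, find the hypotenuse"
rule = neural_suggestion(problem)     # the 'intelligent' proposal
print(symbolic_engine(rule, 3, 4))    # the exact answer: 5.0
```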

This way, you avoid the pitfalls of both: the tendency to make silly mistakes that ML has, and the inability of the symbolic engines, on their own, to decide which path is worth exploring.

I know that was a lot and, as this is a simple introduction to AI, I don’t want to go too deep into these systems because they are far more advanced than where we are right now.

Moving forward, I want to be honest with you. Out of everything discussed until now, it’s no secret that the next logical step is to dive deeper into Machine Learning.

Types of ML methods

When teaching an ML system to learn, the key points are the learning process (duh) and the data at your disposal. Except in one case, all learning processes have one thing in common: the supervisory signal.

In simple terms, just be aware that to train an ML system you need some way of telling it if its predictions are accurate or not. That way, we can measure how wrong the system is, and adapt.

Imagine you are learning to shoot a basketball through the hoop. At first, you are probably terrible. However, over time you get better. But why?

Well, you have a clear objective (scoring) and a clear way of seeing whether you have scored (whether the ball goes through the hoop). That is your supervisory signal. Therefore, by practicing, you slowly adapt the parameters (mechanics) of your body so that your shots become more consistent over time.

Hence, the supervisory signal is key, as without it you have no way of tuning your body parameters to get better. Now, consider that ML applies the same principle but with data.

There are four main ways to train an ML system:

  1. Supervised training: Here, the data is fully labeled. In other words, for every data point, you have a clear indication of whether a prediction is correct or not. For example, if you are trying to predict the price of a house, the ‘house dataset’ used for training will include a column with the price of the already-known houses that you will use to teach the ML model to predict future prices.
  2. Unsupervised training: Out of the four, this is the only one without a supervisory signal. Here, the process is fairly naive, in the sense that you don’t know what you are looking for. Thus, the goal of these ML systems is to discover unknown patterns. For example, if you have a dataset of flowers where some species aren’t known, an unsupervised model might cluster the flowers in a way that shows some of the unknown specimens share common patterns and, thus, you may discover new species.
  3. Reinforcement Learning: Here, the learning process is like playing a game. The supervisory signal is a reward: for every action the model takes, it receives a reward, coming either from the environment it acts in or, in some setups, from another model. If the reward is high, the model learns that the action was good; if the reward is low, it learns to avoid taking that action again. It may sound similar to supervised learning, but the key difference is that here the agent takes actions in an environment and measures the rewards it gets back, unlike supervised learning, where the feedback is direct (it doesn’t require choosing an action and seeing what happens). A tiny sketch of this loop follows this list.
  4. Self-supervised learning: Probably the most important these days, it involves a mixture between points 1 and 2. Unlike supervised learning, where a group of humans must deliberately ‘label’ each data point, and unlike unsupervised learning, where no label is defined, this is an ‘in-between’ solution, as in self-supervised learning the training data acts as the supervisory signal without human intervention.
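
To illustrate point 3, here is a minimal, toy sketch of an agent learning from rewards alone, a so-called multi-armed bandit, where the win probabilities, exploration rate, and number of tries are all made up for the example (real reinforcement learning setups are far richer than this):

```python
import random

random.seed(0)

# A toy environment: three slot machines with hidden win probabilities
true_win_prob = [0.2, 0.5, 0.8]

estimates = [0.0, 0.0, 0.0]   # the agent's learned value of each action
counts = [0, 0, 0]

for step in range(1000):
    # Explore occasionally; otherwise pick the action that looks best so far
    if random.random() < 0.1:
        action = random.randrange(3)
    else:
        action = estimates.index(max(estimates))

    # The supervisory signal: a reward coming back from the environment
    reward = 1.0 if random.random() < true_win_prob[action] else 0.0

    # Update the estimate for the chosen action (running average)
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]

print([round(e, 2) for e in estimates])  # the third machine should look best
```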

Focusing on point 4, which is extremely relevant these days, a clear example is a Large Language Model like ChatGPT.

To train such a model you require millions upon millions of text passages. For that, a supervised learning method would require humans to label every single word (they are in the trillions at this point), which is unfeasible.

But here’s the thing, you don’t have to, because to predict the next word (ChatGPT’s task) the word is, well, already there. Thus, what you do is hide – we call it masking – the next word and make ChatGPT predict it.

As the word is simply underneath the mask, we can uncover it and compare the prediction with the ground-truth word. Do you see what’s going on? The data itself acts as the supervisory signal! This is crucial, because with the right algorithm, you can feed these models all the data you can find with no human effort.
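
Here is a tiny sketch of how those training examples label themselves; the sentence is made up, and a real model would see billions of such pairs:

```python
text = "the cat sat on the mat".split()

# Build training examples automatically: the data labels itself
examples = []
for i in range(1, len(text)):
    context = text[:i]   # what the model gets to see
    target = text[i]     # the "masked" next word, i.e. the label
    examples.append((context, target))

# A real model's prediction would then be compared against each `target`
for context, target in examples:
    print(" ".join(context), "[MASK]  ->  label:", target)
```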

This seemingly unimportant fact is what has taken us to today with ChatGPT and other models.

As you may have noticed from point 4, two hugely important factors here are scale and feasibility. The more data you have, the less feasible it becomes to train on it with supervised methods, which would otherwise be the ideal way of training. Therefore, considering the sizes and types of models we are training, self-supervised learning has become the standard for training large models.

But there’s one more thing we need to discuss as part of this introduction, the types of ML models.

From standard statistics to neural networks

Remember when I said that people now have basically reduced the definition of AI to Machine Learning? Well, it’s actually worse.

To the world, AI = ChatGPT, which means that AI = Neural networks, a subset of ML.

But again, that’s blatantly false.

For starters, ML goes beyond neural networks. In particular, we have two types of ML algorithms:

  1. Statistical algorithms
  2. Neural networks (which are also based on statistics anyway, as ML is a purely statistical practice as a whole)

The former were, until the arrival of ChatGPT, by far the most popular algorithms, with examples like XGBoost. They require much less data and work wonderfully well with tabular data.

However, with the arrival of foundation models, neural networks that excel at multiple different tasks without requiring any additional retraining, the AI industry has massively shifted toward the latter.

The last thing I said is only partially true. Although prompt engineering might do the job in most cases, some downstream tasks will still require fine-tuning, aka retraining.

And although I won’t go into the technical details in this blog post, you may ask: why this shift?

Well, between the aforementioned scaling capacity of self-supervised methods and the extreme capacity of neural networks to approximate almost any function between two distributions (recall the straight-line section), we have reached a critical point with AI:

Generalization.

That is, instead of having to train one single model for every task as was common beforehand, with foundation models the AI space is finally ripe to completely transform everything around us.

But more on that another day.
