What is a generative pretrained transformer (GPT)?

Artificial intelligence, today with a special focus on the world around ChatGPT. And so we find ourselves in the middle of the cosmos of generative pre-trained transformers, or GPT for short.

Immerse yourself with us in the fascinating world of artificial intelligence. In our latest Rock the Prototype podcast episode, we discover the secrets behind ChatGPT, a revolutionary AI model that is radically changing the way we communicate with machines.

Semantic probabilities

We unravel the threads of semantic probabilities and reveal how AI technology can come so incredibly close to our human understanding. Is it true intelligence or simply a reflection of our own language? Join us on an exciting journey through the neural landscape of GPT and explore the boundaries between human creativity and machine learning.

Listen now:

🎧 Listen on Spotify: 👉 https://bit.ly/3VSgxK4

🍎 Enjoy on Apple Podcasts: 👉 https://apple.co/43SaEyk

While we have all heard of AI in various contexts or even interacted with it ourselves, behind the scenes is a fascinating technology that is completely revolutionizing the way we work, learn and even communicate.

ChatGPT is considered one of the most advanced AI models

That’s why today we’re focusing on the very technology that has helped AI make a decisive breakthrough in the public eye. We are of course talking about GPT, an advanced AI model developed by OpenAI.

In this article – or you can just listen to my podcast – I unravel the mystery of GPT and the AI models underlying this technology, the generative pre-trained transformers that currently significantly influence how we perceive and interact with artificial intelligence.

We dive into the history of AI and explore how AI works with its software-based models and algorithms. We reflect on the diverse applications of AI and look at the associated ethical challenges as well as the exciting future that awaits us with these AI technologies.

What exactly is GPT?

In short, it’s an impressively powerful AI that is trained to understand, generate and even respond to text in a way that is often almost indistinguishable from human interaction.

GPT models have already fundamentally changed the rules of the game. From simple to complex answers to questions in every conceivable subject area, from generating ideas to writing entire articles and even programming: these AI models are now proficient at all of it.

How does it all work? How reliable is the information generated with AI?

But how does it all work? And how reliable is the information generated with AI?

Although this may sound like pure science fiction to some, it is reality – a reality that becomes more and more sophisticated with every GPT update. But why should we bother with it at all? Well, the answer is simple: GPT and similar AI technologies are shaping our future.

Exactly. The way we search for information, how we learn and how we interact with machines is changing radically.

GPT is not only a window into a new era of AI-driven communication and creativity, it is also a tool that has the potential to redefine education, the corporate world, programming and even artistic expression.

And that is exactly why understanding GPT is so crucial. Not just for technology enthusiasts or developers, but for everyone. The way we think about problem solving, innovation and even the limits of human creativity is at stake.

Let’s now take a look at the facts about this fascinating AI technology:

Developed by the US AI specialist OpenAI, GPT has become a successful model in the world of artificial intelligence. But what is GPT? What does the acronym stand for?

GPT stands for “Generative Pre-trained Transformer”. Let’s break the name down piece by piece to better understand what’s behind this abbreviation.

“Generative” means producing or generating. In the AI world, we talk about generative models when it comes to creating content. These can be texts, images or even music. The special thing about GPT is its ability to generate texts that are so convincing that they are almost indistinguishable from those written by humans.

“Pre-trained” means that the model has already been through an extensive learning phase before it is entrusted with specific tasks. In this phase, it is fed with huge amounts of text data to gain a basic understanding of the language. This pre-training is crucial so that an AI model can later respond efficiently to more specific requests.

And finally “Transformer” – the part of the name that refers to the revolutionary AI architecture on which GPT is based. The Transformer architecture allows the model to understand the meaning of words in the context of other words, rather than looking at them in isolation. This is a key aspect that characterizes GPT’s impressive ability to generate coherent and nuanced texts.

GPT is a generative, pre-trained AI model based on the Transformer architecture. ChatGPT is a prime example of how far artificial intelligence has come in recent years.

Now that we know what’s behind the acronym GPT, let’s dive deeper into the mechanisms of this AI technology that make it so powerful.

How does GPT work and why is this AI model so revolutionary?

GPT is based on the Transformer architecture, an innovation that entered the artificial intelligence stage in 2017 and has shaped the field ever since. This AI technology uses huge amounts of data, learns from it and allows us to communicate with machines in a way that was previously science fiction.

Now let’s explore how a GPT model is trained and how it acquires its impressively generative, almost creative and amazingly natural-seeming abilities.

How is a GPT model actually trained?

To understand this, we must first familiarize ourselves with the basic principles of the Transformer architecture, which forms the core of GPT.

A key element of this architecture is the so-called self-attention mechanism.

This is a technical mechanism used by the AI, i.e. the way the AI model performs internal calculations to recognize relationships between words in a sentence.

This clever mechanism in the AI model makes it possible to understand the meaning of each word in a sentence in the context of all the other words.


Instead of moving sequentially from word to word, as previous AI model architectures did, the self-attention mechanism evaluates each word in the context of the entire sentence and does so simultaneously, i.e. in parallel.

The AI therefore focuses its own attention on the meaning in the overall context. Looking at something in context is something I personally love!

This procedure, which computers master perfectly with their parallel approach, is often difficult for us humans.

Computers and especially models like GPT use parallel processing to analyze large amounts of data simultaneously. This ability leads to more efficient and faster processing of information.

Not that we should underestimate our own cognitive abilities: after all, we humans invented the parallel data processing of computers in order to use it specifically for effective and meaningful AI assistance systems.

At this point, it is worth noting that the development of AI models also relies on a targeted, prototypical approach to build and improve effective models.

Cleverly process large amounts of data with AI

This AI technology therefore enables large volumes of data to be processed more efficiently.

Imagine you are reading a sentence and can simultaneously grasp the meaning of each word in the context of all the other words. This is exactly what this mechanism enables GPT to do.

Let’s take the sentence: “The doctor gave the patient a medicine because he was ill.” A person intuitively understands that “he” refers to “the patient”. GPT can track this through the self-attention mechanism by recognizing the relationship between widely spaced words and weighting the meaning of each word in relation to every other word in the text.
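
To make this mechanism tangible, here is a minimal sketch of scaled dot-product self-attention in Python with NumPy. The dimensions and random matrices are illustrative stand-ins for what a real GPT model learns during training; this is the idea, not OpenAI’s actual code.

```python
# A minimal sketch of scaled dot-product self-attention using NumPy.
# Shapes and values are illustrative, not taken from a real GPT model.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token embeddings; W*: learned projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Every word attends to every other word simultaneously:
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)        # how much each word "looks at" each other word
    return weights @ V                        # context-aware representations

rng = np.random.default_rng(0)
seq_len, d = 10, 16        # e.g. the ten words of the doctor/patient sentence
X = rng.normal(size=(seq_len, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (10, 16): one context-enriched vector per word
```

The attention weights form a full seq_len × seq_len matrix, which is exactly why every word can be related to every other word in parallel, including “he” and “the patient” from our example.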

Self-attention mechanism & positional encoding

But how does GPT take into account the order of the words if the self-attention mechanism does not process the input words in any particular order? This is where positional encoding comes into play. It adds additional information to each word about its position in the sentence, which allows the model to recognize the position of each word and understand the grammatical structure of the text.
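
As a short illustration, here is the sinusoidal positional encoding from the original 2017 Transformer paper (“Attention Is All You Need”). Note that GPT models typically learn their position embeddings instead, so treat this as a sketch of the idea rather than GPT’s exact method.

```python
# Sinusoidal positional encoding, as in the original Transformer paper.
# GPT variants often use learned position embeddings instead; this is
# only meant to illustrate the principle.
import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]   # token positions 0..seq_len-1
    i = np.arange(d_model)[None, :]     # embedding dimensions
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])  # even dimensions: sine
    pe[:, 1::2] = np.cos(angle[:, 1::2])  # odd dimensions: cosine
    return pe

# The encoding is simply added to the token embeddings, so every vector
# also carries information about where its word sits in the sentence:
embeddings = np.zeros((10, 16))
inputs = embeddings + positional_encoding(10, 16)
print(inputs.shape)  # (10, 16)
```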

Let us briefly illustrate the revolutionary difference between this AI technology and traditional AI models:

In the past, many approaches in artificial intelligence, especially in the field of natural language processing, were based on tree structures and concepts from graph theory. These methods attempted to depict the complex structure of language using hierarchical models that represent the relationships between words and phrases in a kind of tree diagram.

However, GPT and the entire Transformer architecture take a different path.

Instead of relying on predefined structures, they use the self-attention mechanism and positional encoding to directly capture the meaning and relationships of words within a sentence – without the need to organize these relationships into rigid tree structures. This approach allows for a more flexible and often more accurate interpretation of text, as it can take into account the direct and indirect relationships between all the words in a text.

How do generative pretrained transformers (GPT) work? – Source: Midjourney

This innovation has fundamentally changed the way machines understand and generate text. While traditional models attempt to decode language using fixed rules and structures, GPT allows AI to develop a more intuitive and comprehensive understanding of texts. It is as if we have made the leap from rigid, predefined paths to an open field full of possibilities where the AI can navigate and learn freely.

This paradigm shift is a very important point that you can take away from my article here as a key moment in the development of artificial intelligence.

It shows how far we’ve come and opens the door to new, exciting ways in which AI can understand and interact with our world.

By combining self-attention and positional encoding, GPT can thus generate and interpret impressive texts that demonstrate a deep understanding of both context and grammatical structure. The result? A performance that often appears astonishingly human-like.

But this is just the beginning. Stay tuned, because next we’ll dive deeper into GPT’s training process and discover how unsupervised learning and the ability to predict words based on their context make GPT one of the most advanced AI models today.

Having already highlighted the revolutionary technology behind GPT and how it differs from traditional AI models, we turn to another fascinating aspect: the pre-training process. How exactly does GPT learn to generate human-like texts? The answer lies in unsupervised learning, a process that allows GPT to be trained without explicitly labeled data.

GPT is like a curious child

Let’s imagine that GPT is a curious child living in a huge library full of books, articles and websites. Instead of telling it what each word means, you simply let it read. By reading, GPT tries to understand the structure of the language, its patterns and the meaning of words in context. For example, it learns to anticipate the next word in a sentence or to complete missing parts of a text. In this way, GPT develops an intuitive feeling for language, similar to a person who learns by listening and reading.
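
To make the “anticipate the next word” idea concrete, here is a deliberately tiny toy model in Python: a simple bigram frequency table instead of a neural network, but built on the same principle of learning from raw text which word tends to come next.

```python
# A toy sketch of the next-word prediction idea behind pre-training.
# The "model" here is just a frequency table, not a neural network.
from collections import Counter, defaultdict

corpus = "the doctor gave the patient a medicine because the patient was ill".split()

# Count which word follows which (a bigram "language model"):
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word):
    """Return the most likely next word seen in training."""
    candidates = following[word]
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("the"))      # 'patient' (seen twice after "the")
print(predict_next("because"))  # 'the'
```

GPT does the same kind of thing at vastly larger scale, with billions of learned parameters instead of a lookup table.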

How does GPT process all this information?

The first step is tokenization.

Generative AI models such as GPT divide text data into units called tokens for processing.

The conversion of text data into tokens depends on the tokenizer used. A token can be characters, words or groups of words. Each AI model has a maximum number of tokens that it can process in one prompt and one response.
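
As a small hands-on illustration, OpenAI’s open-source tiktoken library lets you inspect this tokenization directly. The encoding name “cl100k_base” corresponds to newer GPT models; the exact token IDs in the comment are illustrative, as they depend on the encoding.

```python
# Tokenizing a sentence with OpenAI's open-source tiktoken library.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Generative pre-trained transformers split text into tokens.")
print(tokens)              # e.g. [5648, 1413, ...] - integer IDs, not words
print(len(tokens))         # how many tokens this sentence costs
print(enc.decode(tokens))  # round-trips back to the original text
```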

GPT therefore breaks down each text into smaller units, the so-called tokens. Then the embeddings come into play.

Each token is converted into a vector that maps the diverse semantic and syntactic information of the token in a high-dimensional space.

Embeddings as a multidimensional representation of creative AI

These vectors, or embeddings, serve as a kind of multidimensional representation of the tokens, enabling the AI model to recognize and process subtle differences in meaning and relationships between different words.

In essence, embeddings capture the essence of words in a way that goes beyond the traditional binary representation of text. They enable GPT to understand patterns, contexts and the nuances of human language at a deeper level, which forms the basis of its ability to generate coherent and contextually relevant texts.

These vectors or embeddings are therefore at the heart of GPT’s learning process. They enable the AI model to understand the meaning of the words and their relationships to each other.

During the pre-training process, this network of vectors is continuously refined and optimized.

The goal?

That GPT learns to represent similar words by similar vectors, developing a deep understanding of language and its nuances.
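
A toy illustration of this “similar words, similar vectors” goal: three hand-made embeddings and their cosine similarity. Real GPT embeddings are learned during training and have hundreds or thousands of dimensions; the three-dimensional vectors below are invented purely to show the principle.

```python
# Toy embeddings: semantically related words point in similar directions.
import numpy as np

embedding = {
    "doctor":  np.array([0.9, 0.8, 0.1]),
    "patient": np.array([0.8, 0.9, 0.2]),
    "banana":  np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine_similarity(embedding["doctor"], embedding["patient"]))  # close to 1
print(cosine_similarity(embedding["doctor"], embedding["banana"]))   # much lower
```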

This process of reading, learning and understanding forms the basis for GPT’s ability to generate texts that are almost indistinguishable from human ones.

It is a clever AI process that seems simple in its basic idea – to break down text into tokens and convert them into multidimensional vectors.

However, the actual implementation is extremely complex. Each token is placed in a space of countless dimensions, with subtle semantic nuances and syntactic relationships encoded in vector form.

This highly detailed and nuanced processing of language requires immense computing power, because training and applying models such as GPT means updating and optimizing billions of parameters, which would not be feasible without advanced hardware and specialized AI algorithms.

This need for enormous computing capacity underlines the technological masterpiece behind GPT and similar AI systems.

All of this illustrates what modern artificial intelligence is capable of and underlines its impressive capacity to not only recognize complex language patterns, but to process and generate them in a way that was unimaginable until recently.

Indispensable: the fine-tuning of AI models

Now that we know how GPT acquires a broad base of language knowledge through the process of unsupervised learning, let’s dive deeper into the next crucial step: fine-tuning. This phase is crucial to transform GPT from a general language model into a highly specialized tool tailored to specific needs.

Imagine GPT is like a versatile artist who is able to paint in many different styles. However, in order to create a masterpiece in a particular style, this artist must further refine and specialize his knowledge and skills. This is exactly what happens when GPT is fine-tuned. The model, which already has a comprehensive understanding of the language, is now trained with a more specific data set that is precisely tailored to the desired task.

Let’s take the creation of legal documents as an example. Having already been trained on a wide range of texts, GPT undergoes an additional training phase with a dataset of legal texts. In this phase, the AI model learns the subtleties of legal language, its style and structure.

The result? GPT is now able to generate legal documents that are not only correct in terms of content, but also comply with the specific conventions and requirements of legal writing. This ability to fine-tune gives GPT impressive flexibility and makes it an indispensable tool for professionals in a wide range of fields.
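
To make this more concrete, here is a heavily simplified sketch of what such a fine-tuning step can look like in code, using the small open GPT-2 model and the Hugging Face transformers library as stand-ins. The two “legal” sentences are invented toy data; real fine-tuning needs far larger datasets, careful evaluation and much more compute.

```python
# A heavily simplified fine-tuning loop for a causal language model.
# Model choice (GPT-2) and the toy dataset are illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

legal_texts = [  # toy stand-ins for a real corpus of legal documents
    "The party of the first part shall indemnify the party of the second part.",
    "This agreement is governed by the laws of the Free and Hanseatic City of Hamburg.",
]

model.train()
for epoch in range(3):
    for text in legal_texts:
        batch = tokenizer(text, return_tensors="pt")
        # For causal LM fine-tuning, the inputs double as the labels:
        outputs = model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    print(f"epoch {epoch}: loss {outputs.loss.item():.3f}")
```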

Fine-tuning thus transforms GPT from a powerful general AI language model into a customized tool that is precisely tailored to the requirements of a specific task or field. This underlines the extraordinary adaptability and versatility of modern artificial intelligence.

What appears so intelligent to us humans in GPT-based AI is actually based on an extremely clever principle of semantic probabilities. After all, semantics is the art of giving words meaning in context.

But it is important to understand that this “intelligence” is not the same as human cognition: the model does not understand language the way humans do.

Rather, it is a highly developed form of pattern recognition and prediction. GPT cannot really grasp the sense or meaning behind the words. Instead, the AI simulates understanding by predicting which words or phrases appear to make the most sense in a given context based on statistical patterns and probabilities from its training text.
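
Mechanically, this prediction boils down to a probability distribution over the vocabulary. The following toy sketch shows the principle: the model assigns each candidate word a score (a logit), and softmax turns those scores into probabilities. The four words and their scores are invented for illustration.

```python
# "Semantic probabilities" in miniature: scores over candidate next words
# are turned into a probability distribution via softmax.
import numpy as np

vocabulary = ["ill", "happy", "banana", "tired"]
logits = np.array([4.0, 1.5, -2.0, 2.5])  # invented model scores for the
                                          # next word after "... because he was"

probs = np.exp(logits - logits.max())
probs /= probs.sum()                      # softmax: scores -> probabilities

for word, p in zip(vocabulary, probs):
    print(f"{word:>8}: {p:.1%}")
# "ill" gets by far the highest probability - not because the model
# understands illness, but because that pattern dominated its training data.
```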

After the fascination comes the reflection

As we explore the impressive capabilities of AI models such as GPT and similar AI technologies, we must not forget that their use continues to confront us with challenges.

One of the most significant is the context limitation – i.e. how much of the text GPT can retain and use for its current calculations.

Earlier versions of GPT were significantly limited in this respect, which made it difficult to create long and coherent texts.

Fortunately, progress has been made. Technical improvements and more efficient training methods have increased the maximum context length in newer versions of GPT.

GPT-4 has made enormous progress in this respect: the context length was extended to 131,072 tokens.

GPT uses an extended character set based on Unicode, encoded in UTF-8, so that it can support a wide range of languages and special symbols. This enables the generation and interpretation of text in a variety of human languages, including those with non-Latin writing systems.

Based on the assumption that a token comprises an average of around 4 characters, that would be over half a million characters that GPT can process in a context of 131,072 tokens.

Our assumption is based on the English language, in which many words are relatively short. The estimate therefore varies from language to language; for German, it will tend to be higher.

This assumption helps us to get a rough idea of how much text – in characters – the AI model can process or “understand” in a given context.

Since tokens vary in length – some are only one character long, like a punctuation mark or a single letter, while others span multiple characters, like a whole word or a compound phrase – assuming an average of 4 characters per token serves as a middle ground for estimating the overall text capacity of the model. You can find the scientific report from OpenAI here.
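
Spelled out, the back-of-the-envelope arithmetic from above looks like this:

```python
# The rough capacity estimate: tokens times average characters per token.
context_tokens = 131_072     # GPT-4 class context window
avg_chars_per_token = 4      # rough average for English text

print(context_tokens * avg_chars_per_token)  # 524288 - over half a million characters
```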

But despite these improvements, we have to realize that AI models still struggle with so-called hallucinations: the AI invents facts and thus sometimes misleads users.

So-called adversarial prompts also tempt the AI to give unwanted answers.

These are basically specifically formulated queries or inputs that aim to trick the AI into giving answers that are inaccurate, inappropriate or even against its programmed guidelines.

It is therefore crucial to responsibly test the limits of AI and its capabilities, to understand the intentions behind such inputs and to respond appropriately.

Such targeted testing highlights the need for continuous improvement in a way that allows us to understand AI models accurately and ensure their responses remain within ethical and regulatory guardrails.

This dynamic interplay between AI and human communication shows how important it is not only to understand our models in terms of their ability to process information, but also to develop ethical and legal guidelines tailored to them.

The journey of artificial intelligence is far from over. With every step we take forward, new doors and fascinating opportunities open up, but they also come with responsibilities.

How we shape and use this technology will help shape the future of our society.

Your click, our common path: How your support shapes our digital future

Stay tuned, my Rock the Prototype format provides you with free valuable knowledge on crucial tech topics in software development and their social impact.

Please support me by subscribing to my newsletter, podcast & YouTube channel. Of course, I also appreciate your feedback, comments and likes!

Subscribe to my LinkedIn Newsletter now: 👉 https://lnkd.in/exv82i4M 👈 Compact information – easy to understand!

Until then, stay safe, creative and above all curious!

Your Sascha Block

About the Author:

Sascha Block

I am Sascha Block – IT architect in Hamburg and the initiator of Rock the Prototype. I want to make prototyping learnable and experiential. With the motivation to prototype ideas and share knowledge around software prototyping, software architecture and programming, I created the format and the open source initiative Rock the Prototype.