GPT-3: A Transformer Neural Network

Reading Time: 3 minutes

Today the most advanced neural network for natural language processing (NLP) is GPT-3, a transformer model capable of generating coherent responses in a dialogue with a person. It uses roughly 100 times more data and parameters than the previous generation, GPT-2.

The GPT-3 neural network, short for Generative Pre-trained Transformer, was developed by OpenAI, a non-profit organization co-founded by SpaceX head Elon Musk and former Y Combinator president Sam Altman. The third generation of the natural language processing model was presented to the public in May 2020. Today it is the largest and most complex language model in existence.

However, even the most advanced transformers trained on huge amounts of data do not understand the meaning of the words and phrases they generate. Their training requires enormous volumes of data and computing resources, which in turn leave a large carbon footprint. Another problem is the imperfection of the training datasets: texts on the Internet often contain distortions, manipulation, and outright fabrications.

One of the most promising directions in the development of AI and neural networks is expanding their range of perception. Algorithms can already recognize images, faces, fingerprints, sounds, and voices. They can also speak and generate images and video, imitating our different senses. MIT scientists note that AI lacks the emotional intelligence and feelings needed to come closer to humans. Unlike AI, a person can not only process information and produce ready-made solutions, but also take into account context and a variety of external and internal factors, and, most importantly, act in an uncertain and changing environment. For example, DeepMind's AlphaGo can beat the world champion at go (and its successor AlphaZero can do the same at chess), but it still cannot extend its strategy beyond the board.

So far, even the most advanced algorithms, including GPT-3, are only partway there. Developers now face the task of creating multimodal systems that combine text recognition with sensory perception to process information and find solutions.

What are the abilities of GPT-3?

T9 on a new level

“I know that my brain is not a ‘feeling brain.’ But it can make rational, logical decisions. I learned everything I know just by reading the Internet, and now I can write this column,” the GPT-3 neural network confided in its essay for The Guardian. The piece, published in September 2020, made a lot of noise: even people far removed from technology started talking about the new algorithm.

Just like its predecessors, GPT-1 and GPT-2, it is built on the transformer architecture. The main function of these neural networks is to predict the next word (or part of one) based on the words that precede it. In effect, the model computes the relationships between words and suggests the most likely continuation. It works on the principle of auto-completion, much like the T9 function in smartphones: given just a phrase or two, it can instantly generate several pages of text.
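The idea of predicting the most likely next word can be sketched with a toy bigram model. This is a deliberate simplification: GPT-3 uses a transformer over billions of parameters, not frequency counts, but the prediction task itself is the same. The corpus and function names here are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy next-word predictor: count which word follows which,
# then suggest the most frequent continuation.
corpus = "the cat sat on the mat the cat ate the fish".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return the most likely next word after `word` in the toy corpus."""
    return bigrams[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "cat" (it follows "the" twice; "mat" and "fish" once)
```

A real language model does the same thing at scale, producing a probability for every possible continuation and sampling from it.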

The way it was trained

GPT-3 differs from the two previous generations in the volume of its training data and the number of parameters, the variables the algorithm optimizes during training. The first version of GPT, released in 2018, was trained on 5 GB of web pages and books and had 117 million parameters. A year later came the more advanced GPT-2, with 1.5 billion parameters, trained on 40 GB of data.

But the third version of the algorithm beat its predecessors by a large margin. The number of parameters reached 175 billion, and the dataset grew to 600 GB. It includes the entire English-language Wikipedia, books and poems, material from media sites and GitHub, guidebooks, and even recipes. Approximately 7% of the dataset is in languages other than English, so the model can not only generate texts of any format but also translate them.
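The figures above make the generation-to-generation jump easy to quantify; a few lines of arithmetic show where the "100 times" comparison with GPT-2 comes from:

```python
# Scale comparison across the three GPT generations,
# using the parameter counts and dataset sizes cited in the text.
models = {
    "GPT-1": {"params": 117e6, "data_gb": 5},
    "GPT-2": {"params": 1.5e9, "data_gb": 40},
    "GPT-3": {"params": 175e9, "data_gb": 600},
}

param_growth = models["GPT-3"]["params"] / models["GPT-2"]["params"]
data_growth = models["GPT-3"]["data_gb"] / models["GPT-2"]["data_gb"]

print(f"GPT-3 has ~{param_growth:.0f}x the parameters of GPT-2")  # ~117x
print(f"and was trained on {data_growth:.0f}x as much data")      # 15x
```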

The algorithm was “fed” not only verified and confirmed data, but also texts of questionable reliability, such as articles about conspiracy theories and pseudoscientific claims. On the one hand, some of the generated texts therefore contain incorrect information. On the other hand, this approach made the dataset more diverse, so it reflects the body of information humanity had produced by 2020 far more fully than any scientific library.

The algorithm is fundamentally different from other artificial intelligence models, which are usually created for a single purpose, with all parameters and datasets tuned to it from the start. GPT-3 is more flexible: it can be used to solve almost any task formulated in English. Instead of retraining it on additional data, it is enough to express the task as a text query, a description, or a few examples.
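Expressing a task "as a text query with examples" can be sketched as simple prompt construction. The example pairs and the `=>` format below are illustrative, not part of any real API; the sketch only builds the prompt text that would be handed to a model for completion.

```python
# Minimal sketch of few-shot prompting: the task is described in plain text,
# a couple of examples are appended, and the model is asked to complete
# the final, unfinished line. No model call is made here.
examples = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
]

def build_prompt(task_description, examples, query):
    lines = [task_description]
    for source, target in examples:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model would complete this line
    return "\n".join(lines)

prompt = build_prompt("Translate English to French:", examples, "bread")
print(prompt)
```

The point is that the "program" is the prompt itself: changing the task means changing the text, not the model's weights.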
