Training and neural networks
The training process is typically performed using a technique called backpropagation. In backpropagation, the LLM is given a sequence of words as input and produces a sequence of words as output. The LLM’s output is then compared to the desired output, and the LLM’s parameters are adjusted to reduce the error between the two.
This process is repeated over and over until the LLM can predict the next word in the sequence with a high degree of accuracy.
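To make this concrete, here is a minimal sketch of a single backpropagation step for next-word prediction, written in PyTorch. The tiny model, vocabulary size, and toy word IDs are all assumptions made up for illustration; a real LLM is vastly larger, but the cycle of forward pass, loss, backward pass, and parameter update is the same idea.

```python
# A minimal sketch of one backpropagation step for next-word prediction.
# The model and data here are illustrative toys, not a real LLM setup.
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32

model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),  # map word IDs to vectors
    nn.Linear(embed_dim, vocab_size),     # score every word in the vocabulary
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.tensor([3, 17, 42, 8])    # current words (toy IDs)
targets = torch.tensor([17, 42, 8, 99])  # the "desired output": the next words

logits = model(inputs)                   # forward pass: predicted scores
loss = loss_fn(logits, targets)          # compare prediction to desired output
loss.backward()                          # backpropagation: compute gradients
optimizer.step()                         # adjust parameters to reduce the error
optimizer.zero_grad()
```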
The neural network architecture most commonly used for LLMs is the transformer. Transformers can model long-range dependencies in sequences, which is essential for many NLP tasks.
Transformer neural networks work by using a self-attention mechanism. Self-attention allows the LLM to learn relationships between different parts of the input sequence, without having to process the sequence sequentially.
This makes transformer neural networks very efficient and effective for training LLMs.
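To give a flavour of what self-attention actually computes, here is a simplified scaled dot-product self-attention in Python. This is a sketch only: real transformers add learned query/key/value projections, multiple attention heads, and masking, none of which appear here.

```python
# Simplified scaled dot-product self-attention, the core idea of a transformer.
import numpy as np

def self_attention(x):
    """x: array of shape (sequence_length, d) - one vector per word."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)        # how much each word attends to every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ x                   # mix information from the whole sequence

sequence = np.random.randn(5, 8)         # 5 words, 8-dimensional embeddings
print(self_attention(sequence).shape)    # (5, 8)
```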
Once the LLM is trained, it can be used to perform a variety of tasks, such as:
- Generating text
- Translating languages
- Answering questions
- Writing different kinds of creative content
Probability distribution
An LLM uses a probability distribution over the next word in the sequence. This probability distribution is calculated using the LLM’s parameters and the previous words in the sequence.
The LLM then generates the next word by sampling from this probability distribution.
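A toy illustration of that sampling step, with a made-up five-word vocabulary and hand-picked probabilities (assumptions for the example, not output from a real model):

```python
# Sampling the next word from a probability distribution (toy example).
import numpy as np

vocab = ["cat", "dog", "sat", "mat", "purred"]
probabilities = np.array([0.05, 0.05, 0.4, 0.2, 0.3])  # must sum to 1

next_word = np.random.choice(vocab, p=probabilities)
print(next_word)  # usually "sat" (most likely), but sometimes another word
```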
Here is a simplified example of how an LLM might generate text:
- The user provides the LLM with a prompt, such as “Write a poem about a cat.”
- The LLM generates the first word of the poem by sampling from a probability distribution over the next word in the sequence.
- The LLM then generates the second word by sampling from a probability distribution over the next word, given the first word of the poem.
- The LLM repeats this process until it reaches the end of the poem (a rough code sketch of this loop follows below).
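Here is roughly what that loop looks like in code. The function predict_next_word_probs is a hypothetical stand-in for a trained model; here it just returns a uniform distribution over a made-up vocabulary so the example runs.

```python
# The generation loop in sketch form. `predict_next_word_probs` is a
# placeholder for a trained model that returns a distribution over the vocabulary.
import numpy as np

vocab = ["the", "cat", "sat", "softly", "purred", "<end>"]

def predict_next_word_probs(words_so_far):
    # Placeholder: a real LLM would compute these probabilities from its
    # parameters and the words generated so far.
    return np.ones(len(vocab)) / len(vocab)

words = ["Write", "a", "poem", "about", "a", "cat."]  # the prompt
while True:
    probs = predict_next_word_probs(words)        # distribution over the next word
    next_word = np.random.choice(vocab, p=probs)  # sample one word from it
    if next_word == "<end>" or len(words) > 50:
        break
    words.append(next_word)

print(" ".join(words))
```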
Softmax function (probability calculation)
There are a few reasons why the softmax function is commonly used in large language models (LLMs) to calculate the probability distribution over the next word in the sequence:
- The softmax function ensures that the probabilities sum to 1. This is important for tasks such as classification and prediction, where we want to know the probability that a given input belongs to a particular category.
- The softmax function is easy to compute. This is important for LLMs, which need to be able to generate text in real time.
- The softmax function is well-behaved mathematically. This makes it easy to train and deploy LLMs.
Some other probability distribution tools, such as the sigmoid function and the hyperbolic tangent function, do not have all of these advantages. For example, the sigmoid function does not ensure that the probabilities sum to 1, and the hyperbolic tangent function produces values between -1 and 1, which cannot be interpreted as probabilities at all. (See the short softmax example below.)
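Here is a short, self-contained version of softmax in Python, with the usual max-subtraction trick for numerical stability. The example logits are made up, but you can see that the outputs are positive and sum to 1.

```python
# Softmax turns a vector of raw scores (logits) into probabilities that sum to 1.
# Subtracting the max first is a standard trick for numerical stability.
import numpy as np

def softmax(logits):
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)        # approximately [0.659 0.242 0.099]
print(probs.sum())  # 1.0
```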
Softmax:
https://deepai.org/machine-learning-glossary-and-terms/softmax-layer
https://en.wikipedia.org/wiki/Softmax_function
Transformer NN:
https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)
Backpropagation:
https://en.wikipedia.org/wiki/Backpropagation
I used Bard. Prompts: “Make this blog post more organized and coherent: ”, “why is Softmax commonly used in large language models”, “give me an example of how probability distribution work”
If you want to do your own research, there is a great series on Wikipedia about AI that includes all the knowledge you need to understand the process behind this technology.
Pictures:
https://webz.io/wp-content/uploads/2023/03/Large-Language-Models-01-830×363.jpg.webp
https://d2l.ai/_images/transformer.svg
https://miro.medium.com/max/781/1*KvygqiInUpBzpknb-KVKJw.jpeg
https://towardsdatascience.com/softmax-activation-function-explained-a7e1bc3ad60