
How LLMs work (Large Language Models)

Large Language Models: What Your Data Must Include | Webz.io

Training and neural networks

LLMs are typically trained with gradient descent, using a technique called backpropagation to compute the gradients. During training, the LLM is given a sequence of words as input and predicts the next word at each position. The LLM’s predictions are then compared to the words that actually follow in the training text, backpropagation works out how much each parameter contributed to the error, and the parameters are adjusted to reduce that error.

This process is repeated over a very large number of training examples, until the LLM is able to predict the next word in the sequence with a high degree of accuracy.
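The training loop described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not how a real LLM is built: the "model" is only a table of scores that predicts the next word from the previous one, the six-word training text and the names `train_bigram`, `epochs`, and `lr` are made up for illustration, and the gradient is written out by hand instead of being computed by automatic backpropagation through many layers.

```python
import math

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train_bigram(tokens, vocab, epochs=200, lr=0.5):
    """Fit a table of scores so that row `prev` predicts the following word."""
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    logits = [[0.0] * n for _ in range(n)]  # the model's parameters
    pairs = [(index[a], index[b]) for a, b in zip(tokens, tokens[1:])]
    losses = []
    for _ in range(epochs):
        total_loss = 0.0
        for prev, target in pairs:
            probs = softmax(logits[prev])           # forward pass: predict
            total_loss += -math.log(probs[target])  # error vs. the real next word
            for j in range(n):
                # Gradient of the error with respect to each score:
                # predicted probability, minus 1 for the correct word.
                grad = probs[j] - (1.0 if j == target else 0.0)
                logits[prev][j] -= lr * grad        # adjust the parameter
        losses.append(total_loss / len(pairs))
    return logits, index, losses

# Hypothetical training text.
vocab = ["the", "cat", "sat", "on", "mat"]
tokens = ["the", "cat", "sat", "on", "the", "mat"]
logits, index, losses = train_bigram(tokens, vocab)

print(losses[0], losses[-1])  # the average error falls as training repeats
```

The error never reaches zero here because "the" is followed by both "cat" and "mat" in the training text, so the best the model can do is split its probability between them.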

The neural network architecture most commonly used for LLMs is the transformer. Transformers are able to model long-range dependencies in sequences, which is essential for many natural language processing (NLP) tasks.

Transformer neural networks work by using a self-attention mechanism. Self-attention lets the LLM relate every part of the input sequence to every other part directly, instead of passing information along one step at a time as recurrent networks do.

Because all positions in the sequence can be processed in parallel, transformer neural networks are efficient to train on very large amounts of text.
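As a rough illustration of self-attention, the sketch below implements scaled dot-product attention in plain Python. It makes simplifying assumptions: a real transformer first multiplies the input by learned weight matrices to get separate queries, keys, and values, and uses several attention heads, while here the same hypothetical 2-dimensional vectors are used for all three. The function name `self_attention` is also just for this example.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(queries, keys, values):
    """Scaled dot-product attention; each argument is a list of vectors."""
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Score this position against every position, near or far.
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # The output is a weighted average of the value vectors.
        out = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical vectors for a three-word sequence.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(x, x, x)

for row in weights:
    print([round(w, 3) for w in row])  # how much each word attends to the others
```

The loop over `queries` runs one position at a time here, but no iteration depends on another, which is why real implementations can compute all positions at once as matrix operations.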

Once the LLM is trained, it can be used to perform a variety of tasks, such as:

-Generating text

-Translating languages

-Answering questions

-Writing different kinds of creative content

11.7. The Transformer Architecture — Dive into Deep Learning 1.0.3  documentation

Probability Distribution

An LLM produces a probability distribution over the next word in the sequence. This probability distribution is calculated using the LLM’s parameters and the previous words in the sequence.

The LLM then generates the next word by sampling from this probability distribution.

Here is a simplified example of how an LLM might generate text:

-The user provides the LLM with a prompt, such as “Write a poem about a cat.”

-The LLM generates the first word of the poem by sampling from a probability distribution over the next word, given the prompt.

-The LLM then generates the second word of the poem by sampling from a new probability distribution over the next word, given the prompt and the first word of the poem.

-The LLM repeats this process until it produces a special end-of-text marker or reaches a length limit.
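The steps above can be sketched as a small sampling loop. Everything in the table below is hypothetical: the words, the probabilities, and the `<start>` and `<end>` markers are invented for illustration, and this toy looks only at the previous word, while a real LLM computes each distribution from its parameters, the prompt, and all the words generated so far.

```python
import random

# Hypothetical next-word probabilities, keyed by the previous word.
NEXT_WORD_PROBS = {
    "<start>": {"the": 0.7, "a": 0.3},
    "the": {"cat": 0.6, "moon": 0.4},
    "a": {"cat": 0.5, "shadow": 0.5},
    "cat": {"sleeps": 0.5, "purrs": 0.3, "<end>": 0.2},
    "moon": {"glows": 0.8, "<end>": 0.2},
    "shadow": {"moves": 0.7, "<end>": 0.3},
    "sleeps": {"<end>": 1.0},
    "purrs": {"<end>": 1.0},
    "glows": {"<end>": 1.0},
    "moves": {"<end>": 1.0},
}

def generate(rng, max_words=10):
    """Generate words one at a time by sampling from each distribution."""
    words = []
    previous = "<start>"
    for _ in range(max_words):
        dist = NEXT_WORD_PROBS[previous]
        candidates = list(dist.keys())
        weights = list(dist.values())
        # Sample one word; likelier words are picked more often.
        next_word = rng.choices(candidates, weights=weights, k=1)[0]
        if next_word == "<end>":
            break
        words.append(next_word)
        previous = next_word
    return words

rng = random.Random(0)  # fixed seed so repeated runs give the same text
poem = generate(rng)
print(" ".join(poem))
```

Because each word is sampled and not simply chosen as the most likely option, changing the seed can produce a different line from the same table.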

Softmax - Ai Cheat Sheet

Softmax function (probability calculation)

There are a few reasons why the softmax function is commonly used in large language models (LLMs) to calculate the probability distribution over the next word in the sequence:

– The softmax function turns the model’s raw scores into non-negative values that sum to 1. This means the output can be read directly as a probability for each candidate next word.

– The softmax function is cheap to compute: it needs one exponential per score and one normalizing sum. This matters for LLMs, which evaluate it for every word they generate.

– The softmax function is smooth and differentiable, so it works well with gradient-based training such as backpropagation.

Some other common functions, such as the sigmoid function and the hyperbolic tangent function, do not have all of these properties. For example, the sigmoid function maps each score to a value between 0 and 1 independently, so the results do not sum to 1. The hyperbolic tangent function produces values between -1 and 1, which can be negative and so cannot be read as probabilities.
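A minimal sketch of this comparison, using only Python's standard library. The three scores are hypothetical stand-ins for the raw scores an LLM might assign to three candidate words; subtracting the maximum inside `softmax` is a common trick to avoid overflow and does not change the result.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Squash a single score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

logits = [2.0, 1.0, -1.0]  # hypothetical scores for three candidate words

softmax_probs = softmax(logits)
sigmoid_values = [sigmoid(x) for x in logits]
tanh_values = [math.tanh(x) for x in logits]

print(sum(softmax_probs))   # 1, up to floating-point rounding
print(sum(sigmoid_values))  # not 1: each score is squashed on its own
print(min(tanh_values))     # negative, so not a valid probability
```

Softmax also keeps the ranking of the scores: the word with the highest raw score gets the highest probability.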

Softmax Activation Function Explained | by Dario Radečić | Towards Data  Science

softmax:

https://deepai.org/machine-learning-glossary-and-terms/softmax-layer

https://en.wikipedia.org/wiki/Softmax_function

Transformer NN

https://en.wikipedia.org/wiki/Transformer_(machine_learning_model)

Backpropagation

https://en.wikipedia.org/wiki/Backpropagation

I used Bard. Prompts: “Make this blog post more organized and coherent: ”, “why is Softmax commonly used in large language models”, “give me an example of how probability distribution work”

If you want to do your own research, there is a great series on Wikipedia about AI that includes all the knowledge you need to understand the process behind this technology.

pictures

https://webz.io/wp-content/uploads/2023/03/Large-Language-Models-01-830×363.jpg.webp

https://d2l.ai/_images/transformer.svg

https://miro.medium.com/max/781/1*KvygqiInUpBzpknb-KVKJw.jpeg

https://towardsdatascience.com/softmax-activation-function-explained-a7e1bc3ad60
