Introduction to Gen AI

GPT (Generative Pre-trained Transformer) is a specific Transformer architecture trained with next-token prediction as its objective: given the tokens seen so far, the model learns to predict the next one.
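To make the next-token prediction objective concrete, here is a minimal Python sketch. It assumes a tiny hypothetical corpus and swaps the Transformer for a simple bigram counter (`next_counts`, `predict_next`, and `generate` are illustrative names, not part of any library). The point is only the loop: predict one token, append it, and feed the result back in.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; a real GPT is trained on a very large text corpus.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# "Training": count which token follows which.
# A bigram counter stands in for the Transformer in this sketch.
next_counts = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    next_counts[current][nxt] += 1

def predict_next(token):
    """Return the most frequent token seen after `token` during training."""
    return next_counts[token].most_common(1)[0][0]

def generate(start, steps):
    """Autoregressive generation: each predicted token is fed back as input."""
    tokens = [start]
    for _ in range(steps):
        tokens.append(predict_next(tokens[-1]))
    return tokens

print(" ".join(generate("the", 3)))
```

A real model outputs a probability for every token in its vocabulary instead of a single count-based guess, but generation follows the same one-token-at-a-time loop.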

Working of a Transformer

Step 1: Tokenization (Encoding)

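LLMs work with numbers, not raw text, so the tokenizer breaks the text into tokens and maps each token to an integer ID, like this (the exact IDs depend on the tokenizer):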

hello there ! ---> [2, 17534, 1104, 1243]
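A minimal tokenizer sketch in Python, assuming a hypothetical word-level vocabulary (`VOCAB`). Real LLMs use learned sub-word tokenizers such as BPE, so their splits and IDs will not match this toy.

```python
import re

# Hypothetical toy vocabulary: token -> integer ID.
# Unknown words map to the special <unk> token.
VOCAB = {"<unk>": 0, "hello": 1, "there": 2, "!": 3}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}

def encode(text):
    """Split text into words and punctuation, then map each piece to its ID."""
    pieces = re.findall(r"\w+|[^\w\s]", text.lower())
    return [VOCAB.get(p, VOCAB["<unk>"]) for p in pieces]

def decode(ids):
    """Map token IDs back to text (the reverse of encode)."""
    return " ".join(ID_TO_TOKEN[i] for i in ids)

ids = encode("hello there !")
print(ids)
print(decode(ids))
```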

Step 2: Vector Embedding

Vector embeddings are numerical representations of data, like words, images, or audio, that capture semantic relationships and meaning, allowing machine learning models to process and compare them efficiently.
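An embedding layer is essentially a lookup table with one vector per token ID. The sketch below assumes a hypothetical 4-token vocabulary and a 3-dimensional embedding filled with random numbers. In a trained model those numbers are learned, which is what makes related words end up with similar vectors.

```python
import random

random.seed(0)  # fixed seed so the random table is reproducible

VOCAB_SIZE = 4  # hypothetical: a 4-token toy vocabulary
D_MODEL = 3     # embedding dimension (real models use hundreds to thousands)

# Embedding table: one vector (row) per token ID.
# Random here; learned during training in a real model.
embedding_table = [
    [random.uniform(-1, 1) for _ in range(D_MODEL)] for _ in range(VOCAB_SIZE)
]

def embed(token_ids):
    """Look up the embedding vector for each token ID."""
    return [embedding_table[i] for i in token_ids]

def cosine_similarity(a, b):
    """Compare two vectors: 1 means same direction, -1 means opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

vectors = embed([1, 2, 3])
print(len(vectors), len(vectors[0]))
print(cosine_similarity(vectors[0], vectors[1]))
```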

Step 3: Positional Encoding

Positional encoding gives Transformer models a sense of order by injecting information about the position of words in a sentence. Since Transformers process words in parallel rather than sequentially, this technique adds a unique numerical vector to each word embedding, allowing the model to distinguish between "dog bites man" and "man bites dog".
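One common scheme is the sinusoidal positional encoding from the original Transformer paper, sketched below. Many GPT models instead learn their position vectors, so treat this as one illustrative variant. The 4-dimensional word vectors for "dog", "bites", and "man" are made-up values.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal encoding: sin on even dimensions, cos on odd dimensions."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def add_positions(embeddings):
    """Add each position's vector to the token embedding at that position."""
    pe = positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(emb, pos_vec)]
            for emb, pos_vec in zip(embeddings, pe)]

# Hypothetical word vectors.
dog = [0.1, 0.2, 0.3, 0.4]
bites = [0.5, 0.6, 0.7, 0.8]
man = [0.9, 1.0, 1.1, 1.2]

a = add_positions([dog, bites, man])  # "dog bites man"
b = add_positions([man, bites, dog])  # "man bites dog"
print(a[0] == b[2])  # False: "dog" at position 0 differs from "dog" at position 2
```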

Step 4: Single-Head Self-Attention

Self-attention is a mechanism that helps a model understand the context of a word by looking at the other words in the same sentence. It computes how relevant every word is to every other word, so the model can focus on the most closely related words when building the meaning of the current word.

In other words, the token vectors "talk" to each other and update their embeddings, so each word's vector ends up carrying the context of the sentence it appears in.
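The core computation is scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. The sketch below is a plain-Python version for a single head. The causal mask (each token attends only to itself and earlier tokens) reflects GPT-style decoders, and the identity projection matrices are hypothetical stand-ins for learned weights.

```python
import math

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def softmax(row):
    """Turn scores into positive weights that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, w_q, w_k, w_v, causal=True):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # scores[i][j]: relevance of token j to token i
    weights = []
    for i, row in enumerate(scores):
        scaled = [s / math.sqrt(d_k) for s in row]
        if causal:  # block attention to future tokens (exp(-inf) becomes 0)
            scaled = [s if j <= i else float("-inf") for j, s in enumerate(scaled)]
        weights.append(softmax(scaled))
    # Each output vector is a weighted mix of the value vectors.
    return matmul(weights, v), weights

# Usage: 3 tokens with 2-dimensional embeddings (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned W_q, W_k, W_v
out, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```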

Multi-head attention is a mechanism in Transformer models that runs several attention heads in parallel over the same input. Each head projects the input into its own Queries (Q), Keys (K), and Values (V) in a different representation subspace, so together the heads can capture diverse relationships and semantic dependencies, such as syntactic structure or positional proximity, instead of relying on a single set of attention scores.
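Multi-head attention can be sketched by running several small attention heads and concatenating their outputs. In this plain-Python sketch the projection matrices are random placeholders created on each call (a real model learns them once during training), `d_model` must be divisible by `num_heads`, and the causal mask is omitted to keep it short.

```python
import math
import random

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(x, w_q, w_k, w_v):
    """One head: scaled dot-product attention in its own smaller subspace."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def multi_head_attention(x, num_heads, rng):
    """Run num_heads heads, concatenate their outputs, then mix them with W_o."""
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    head_outputs = []
    for _ in range(num_heads):
        # Each head gets its own Q, K, V projections (random placeholders here).
        w_q, w_k, w_v = (random_matrix(d_model, d_head, rng) for _ in range(3))
        head_outputs.append(attention_head(x, w_q, w_k, w_v))
    # Concatenate along the feature dimension: seq_len x d_model.
    concat = [sum((h[i] for h in head_outputs), []) for i in range(len(x))]
    w_o = random_matrix(d_model, d_model, rng)  # output projection
    return matmul(concat, w_o)

rng = random.Random(42)
x = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2]]
out = multi_head_attention(x, num_heads=2, rng=rng)
print(len(out), len(out[0]))  # 3 4 (same shape as the input)
```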

LLM IN TRAINING PHASE AND INFERENCE PHASE:

TERMINOLOGY OF GEN AI:

1. Vocab Size: The number of unique tokens in a language model's vocabulary is referred to as the vocab size.

2. Linear Output Layer (Regression)

  • Used for regression tasks where the goal is to predict continuous, unbounded numerical values (e.g., house prices, temperatures).

3. Softmax Output Layer (Multi-class Classification)

  • Purpose: Used for multi-class classification, specifically when classes are mutually exclusive (an input belongs to exactly one class).

4. Knowledge Cutoff

  • A model only knows what was in its training data, which ends at a certain date (its knowledge cutoff). On its own it cannot answer questions about events after that date or about live information. For example, if you ask about the current weather in Delhi, it won't be able to provide an answer.
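The linear and softmax output layers described above differ only in what happens after the final projection. In this sketch the hidden vector, weights, and biases are hypothetical values. In an LLM the softmax layer has one output per token, so its width equals the vocab size.

```python
import math

def linear(x, weights, bias):
    """Linear layer y = W x + b with no activation, so outputs are unbounded."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Convert logits to probabilities that are positive and sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

hidden = [0.5, -1.0, 2.0]  # hypothetical final hidden vector

# Regression head: a single output predicting a continuous value (e.g. a price).
price = linear(hidden, weights=[[1.0, 2.0, 0.5]], bias=[10.0])[0]

# Classification head: one logit per class, then softmax gives probabilities.
# Exactly one class is picked (the most probable one).
logits = linear(hidden,
                weights=[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
                bias=[0.0, 0.0, 0.0])
probs = softmax(logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)

print(price)
print([round(p, 3) for p in probs], predicted_class)
```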