Introduction
Large Language Models (LLMs) are AI models designed to understand, process, and generate human language. They are trained on huge amounts of text, which helps them learn the patterns, relationships, and structures of natural language. At their core, LLMs are trained to predict the next token.
Example:
user: The capital of India is
response: The capital of India is Delhi.
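The model keeps predicting the next token, one at a time, until it reaches the end of the sentence (in practice, until it emits a special end-of-sequence token or hits a length limit).
Some examples of LLMs are: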
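The loop described above can be sketched with a toy example. The `NEXT_TOKEN` table and the `generate` function are hypothetical stand-ins for illustration only: a real LLM computes a probability distribution over its whole vocabulary at every step instead of reading from a fixed table.

```python
# Toy next-token table: maps the current token to the most likely next token.
# This hand-written table stands in for a real model (assumption for illustration).
NEXT_TOKEN = {
    "is": "Delhi",
    "Delhi": ".",
    ".": "<eos>",
}


def generate(prompt_tokens, max_new_tokens=10):
    """Append one predicted token at a time until <eos> or the length limit."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = NEXT_TOKEN.get(tokens[-1], "<eos>")
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens


output = generate(["The", "capital", "of", "India", "is"])
print(" ".join(output))
```

Each iteration looks only at the last token here, which keeps the sketch short; a real model conditions on the whole sequence so far.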
- GPT models
- LLaMA
- Gemini
GPT:- Generative Pre-trained Transformer
Evolution of LLMs (Short Overview)
Initially, NLP models were based on RNNs and LSTM networks, which were used for sequence processing tasks like:
- Machine translation
- Text summarization
- Chatbots
- Speech-to-text
The field then moved through the following stages:
Encoder-Decoder Architecture → Attention Mechanism → Transformers → Large Language Models
These improvements were needed to overcome limitations of RNNs and LSTMs such as:
- Short context memory
- Sequential processing
- Poor long sentence understanding
What are Embeddings?
Embeddings are how a Large Language Model (LLM) represents the meaning of text. A Transformer cannot work with raw token IDs directly: to the model, a token ID is just an arbitrary integer that carries no meaning.
An embedding is a dense numerical vector that represents the meaning of a token. The embedding layer is a large lookup table, known as the embedding matrix, that stores the vector for each token. Tokens with similar meanings end up at nearby positions in the vector space.
Embedding matrix:- The embedding matrix is a large table that stores one vector per token in the vocabulary.
Example:-
Vocabulary size - 50000 tokens
Embedding size - 4096
Matrix shape = 50000 × 4096
where each row represents one token
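To get a feel for the size of such a matrix, the arithmetic can be worked out directly. This is a rough sketch: the 4 bytes per value assumes 32-bit floats, which the example above does not specify.

```python
vocab_size = 50_000
embedding_size = 4_096

# One learned number per (token, dimension) pair.
num_parameters = vocab_size * embedding_size

# Assumption: each value is stored as a 32-bit float (4 bytes).
bytes_fp32 = num_parameters * 4

print(num_parameters)            # 204800000
print(bytes_fp32 / 1e6, "MB")    # 819.2 MB
```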
Key concepts for working with embeddings:-
- Vector representation
- Vector Length
- Vector Distance
- Vector Similarity
- Vector Relationship
- Semantic Search
Mathematical Foundations of Embeddings
We need a few mathematical concepts to understand embeddings properly.
1. Scalars
A scalar is a single number that has only magnitude and no direction.
Example:
1, 2, 3
2. Vectors
A vector is an ordered collection of numbers. A real-world object usually cannot be described by a single number, so a vector is used to represent multiple features of one object.
Example:
[2, 5, 7]
3. Vector Dimensions
The dimension of a vector is the number of elements it contains. Embeddings typically have 768, 1024, or more dimensions.
Example:
[0.2, -0.5, 2.4] → 3D vector
4. Tensors
Tensors are the generalization of scalars, vectors, and matrices to any number of dimensions. Deep learning has to handle multi-dimensional data, and tensors make it possible to represent such complex data.
- Scalar → 0D tensor
- Vector → 1D tensor
- Matrix → 2D tensor
- Higher dimensions → nD tensor
LLMs work entirely on tensors.
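The hierarchy above can be illustrated with plain nested Python lists. This is only a sketch: `rank` is a hypothetical helper that assumes regular, non-empty nesting, and real deep learning libraries provide their own tensor types.

```python
def rank(value):
    """Count the nesting depth of a regular nested list (0 for a plain number)."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0]  # assumes non-empty, regularly shaped lists
    return depth


scalar = 5
vector = [2, 5, 7]
matrix = [[1, 2], [3, 4]]
batch = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("batch", batch)]:
    print(name, "->", f"{rank(value)}D tensor")
```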
5. Vector Space
A vector space is a collection of vectors together with the operations defined on them. In Machine Learning (ML), vectors are not isolated: what matters is how they relate to each other within the space. A vector space makes operations such as addition, scaling, distance, and similarity possible.
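The two basic operations, addition and scaling, both work element by element. A minimal sketch, with `add` and `scale` as illustrative helper names:

```python
def add(u, v):
    """Element-wise sum of two vectors of the same length."""
    return [a + b for a, b in zip(u, v)]


def scale(c, v):
    """Multiply every element of a vector by the scalar c."""
    return [c * a for a in v]


u = [2, 5, 7]
v = [1, 0, -2]

print(add(u, v))    # [3, 5, 5]
print(scale(2, u))  # [4, 10, 14]
```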
6. Magnitude and Norms
Magnitude is the length of a vector. With embeddings, norms are used for normalization, regularization, and similarity computation. The two most common norms are L1 and L2.
Common norm:
L2 Norm = √(x² + y² + z²)
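Both norms can be sketched in a few lines. The example vector is chosen so the results are whole numbers, and the last step shows normalization, which rescales a vector to length 1.

```python
import math


def l1_norm(v):
    """Sum of absolute values."""
    return sum(abs(a) for a in v)


def l2_norm(v):
    """Square root of the sum of squares (the usual 'length')."""
    return math.sqrt(sum(a * a for a in v))


v = [3, -4, 12]
print(l1_norm(v))  # 19
print(l2_norm(v))  # 13.0

# Normalization: divide by the L2 norm to get a unit-length vector.
unit = [a / l2_norm(v) for a in v]
print(l2_norm(unit))
```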
7. Distance
Distance tells how far apart two vectors are.
Common types:
- Euclidean Distance
- Manhattan Distance
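The two distances differ only in how the per-element differences are combined. A minimal sketch, with arbitrary example vectors:

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance: L2 norm of the difference."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u, v):
    """Grid distance: L1 norm of the difference."""
    return sum(abs(a - b) for a, b in zip(u, v))


a = [1, 2, 3]
b = [4, 6, 3]

print(euclidean_distance(a, b))  # 5.0
print(manhattan_distance(a, b))  # 7
```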
8. Similarity
Similarity measures how close two vectors are in meaning.
Commonly used similarity:
Cosine Similarity
A high similarity (a cosine value close to 1) means the meanings are similar.
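For reference, cosine similarity is the dot product of two vectors divided by the product of their L2 norms. The symbols a, b, n, and θ (the angle between the vectors) are notation introduced here, not taken from the text above.

```latex
\[
\cos\theta
  = \frac{a \cdot b}{\lVert a \rVert_2 \, \lVert b \rVert_2}
  = \frac{\sum_{i=1}^{n} a_i b_i}
         {\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}
\]
```

The value ranges from -1 (opposite directions) through 0 (perpendicular) to 1 (same direction), and it depends only on direction, not on vector length.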
How Embedding Works
Embeddings are generated using an Embedding Matrix.
Step 1: Tokenization
First, when the model receives a sentence, the text is split into tokens and each token is mapped to a token ID:
sentence: I love AI
token IDs: [12, 532, 981]
Step 2: Embedding Matrix
Each token ID maps to one row of the embedding matrix:
50000 × 768 matrix (vocabulary size × embedding size)
Step 3: Lookup
Each token is replaced with its vector representation.
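Steps 1 to 3 can be sketched together. Everything here is scaled down and hypothetical: the vocabulary holds only the three tokens from the example, whitespace splitting stands in for a real subword tokenizer, and the matrix is 1000 × 4 instead of 50000 × 768.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Toy vocabulary reusing the token IDs from the example (assumption).
VOCAB = {"I": 12, "love": 532, "AI": 981}
VOCAB_SIZE = 1000     # assumption: shrunk from 50000 to keep the sketch small
EMBEDDING_SIZE = 4    # assumption: shrunk from 768

# One row per token ID, filled with small random values.
embedding_matrix = [
    [random.gauss(0.0, 0.02) for _ in range(EMBEDDING_SIZE)]
    for _ in range(VOCAB_SIZE)
]


def tokenize(sentence):
    """Step 1: map each whitespace-separated word to its token ID."""
    return [VOCAB[word] for word in sentence.split()]


def lookup(token_ids):
    """Step 3: replace each token ID with its row of the matrix."""
    return [embedding_matrix[token_id] for token_id in token_ids]


token_ids = tokenize("I love AI")
vectors = lookup(token_ids)

print(token_ids)                      # [12, 532, 981]
print(len(vectors), len(vectors[0]))  # 3 4
```

The lookup is plain row indexing, which is why the embedding layer is often described as a table and not as a computation.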
Step 4: Training
Initially, the embeddings are random.
During training:
- Backpropagation updates vectors
- Vectors of similar words move closer together
- Meaningful structure is formed
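The idea of vectors moving closer through gradient updates can be shown with a deliberately simplified toy objective. This is not how an LLM is actually trained: a real model updates its embeddings through a next-token prediction loss, and closeness between similar words emerges as a side effect. Here the loss is simply half the squared distance between two vectors, and the starting values and learning rate are arbitrary.

```python
import math


def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Arbitrary starting values standing in for random initialization.
dog = [0.9, -0.3]
cat = [-0.5, 0.7]
learning_rate = 0.1

before = distance(dog, cat)

for step in range(20):
    # Toy loss: 0.5 * ||dog - cat||^2.
    # Its gradient is (dog - cat) w.r.t. dog and (cat - dog) w.r.t. cat.
    grad_dog = [d - c for d, c in zip(dog, cat)]
    grad_cat = [c - d for d, c in zip(dog, cat)]
    # Gradient descent step: move each vector against its gradient.
    dog = [d - learning_rate * g for d, g in zip(dog, grad_dog)]
    cat = [c - learning_rate * g for c, g in zip(cat, grad_cat)]

after = distance(dog, cat)
print(before, after)
```

With these settings the gap shrinks by a factor of 0.8 per step, so after 20 steps the two vectors are almost on top of each other.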
Token Embeddings
Token embeddings are the core embeddings used in LLMs.
Each token is mapped to a dense vector that represents its meaning.
Example:
dog → [0.2, 0.8, ...]
cat → [0.21, 0.79, ...]
car → [-0.9, 0.1, ...]
Here:
- dog and cat are close (similar meaning)
- car is far (different meaning)
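The claim above can be checked with cosine similarity. Since the example vectors are truncated, this sketch uses only the two components that are shown, so the numbers illustrate the idea and say nothing about real embeddings.

```python
import math


def cosine_similarity(u, v):
    """Dot product divided by the product of the L2 norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Only the first two components from the example (the rest is elided there).
dog = [0.2, 0.8]
cat = [0.21, 0.79]
car = [-0.9, 0.1]

sim_dog_cat = cosine_similarity(dog, cat)
sim_dog_car = cosine_similarity(dog, car)

print(round(sim_dog_cat, 3))  # very close to 1
print(round(sim_dog_car, 3))  # negative
```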
Positional Embeddings
On their own, Transformers have no sense of word order, because self-attention processes all tokens in parallel.
Example:
Dog bites man ≠ Man bites dog
So positional embeddings are added to give order information.
Final input:
Final Embedding = Token Embedding + Positional Embedding
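A minimal sketch of this addition follows. It uses the fixed sinusoidal encoding from the original Transformer design as one possible choice; many models learn their positional embeddings instead. The token vectors in `TOKEN_EMBEDDINGS` are made-up values for illustration.

```python
import math

DIM = 4

# Made-up token embeddings (hypothetical values, not from a real model).
TOKEN_EMBEDDINGS = {
    "dog": [0.2, 0.8, 0.1, 0.4],
    "bites": [0.5, -0.3, 0.7, 0.0],
    "man": [-0.6, 0.2, 0.3, 0.9],
}


def positional_encoding(position, dim):
    """Sinusoidal encoding: sin on even indices, cos on odd indices."""
    encoding = []
    for i in range(dim):
        # Each sin/cos pair shares one frequency, set by the even index.
        angle = position / (10000 ** ((i - i % 2) / dim))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding


def embed(sentence):
    """Final embedding = token embedding + positional embedding."""
    result = []
    for position, word in enumerate(sentence.split()):
        token_vec = TOKEN_EMBEDDINGS[word]
        pos_vec = positional_encoding(position, DIM)
        result.append([t + p for t, p in zip(token_vec, pos_vec)])
    return result


a = embed("dog bites man")
b = embed("man bites dog")

print(a != b)        # True: same words, different order, different input
print(a[1] == b[1])  # True: "bites" sits at position 1 in both sentences
```

Without the positional term, both sentences would hand the model the same three vectors, only in a different order that the model could not tell apart.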
Why Embeddings are Important in LLMs
Embeddings are the foundation of LLM intelligence because:
- They convert text into mathematical space
- They preserve semantic meaning
- They allow similarity computation
- They enable retrieval systems (RAG)
- They form the input of Transformers
Real-World Applications
- ChatGPT
- Google Search
- Recommendation systems
- RAG-based chatbots
- Vector databases
- Semantic search engines