Embeddings in Large Language Models (LLMs)


Introduction

Large Language Models (LLMs) are AI models designed to understand, process, and generate human language. They are trained on huge amounts of text, which helps them learn the patterns, relationships, and structures of natural language. At their core, LLMs are trained to predict the next token.

Example:

user: The capital of India is
response: The capital of India is Delhi.

The model keeps predicting the next token, one at a time, until it produces an end-of-sequence token or reaches a length limit.
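
The prediction loop can be sketched in a few lines of Python. This is a toy illustration only: the `NEXT_TOKEN` table, the `generate` function, and the `<eos>` marker are hypothetical stand-ins for a real model, which would compute a probability distribution over the whole vocabulary at every step.

```python
# Toy stand-in for a trained model: maps the last token to the next one.
# The table and the "<eos>" end marker are made up for this illustration.
NEXT_TOKEN = {
    "is": "Delhi",
    "Delhi": ".",
    ".": "<eos>",
}

def generate(prompt_tokens, max_new_tokens=10):
    """Append one predicted token at a time until <eos> or the length limit."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = NEXT_TOKEN.get(tokens[-1], "<eos>")
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens

prompt = ["The", "capital", "of", "India", "is"]
print(generate(prompt))
```

Each pass through the loop looks only at what has been produced so far and adds exactly one token, which is the essential shape of next-token generation.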

Some examples of LLMs are:

  • GPT models
  • LLaMA
  • Gemini

GPT stands for Generative Pre-trained Transformer.

Evolution of LLMs (Short Overview)

Initially, NLP models were based on RNNs and LSTM networks, which were used for sequence processing tasks like:

  • Machine translation
  • Text summarization
  • Chatbots
  • Speech-to-text

Evolution of LLMs:

Encoder-Decoder Architecture → Attention Mechanism → Transformers → Large Language Models

These improvements were needed to solve limitations like:

  • Short context memory
  • Sequential processing
  • Poor long sentence understanding

What are Embeddings?

Embeddings are how a Large Language Model (LLM) represents meaning. A Transformer cannot work with token IDs directly, because a token ID is just an arbitrary index that carries no meaning.
An embedding is a dense numerical vector that represents the meaning of a token. The embedding layer is a giant lookup table, known as the embedding matrix, that stores one vector per token. Tokens with similar meanings end up at nearby positions in the vector space.
Embedding matrix: a table with one row per token in the vocabulary and one column per embedding dimension.
Example:

Vocabulary size = 50000 tokens
Embedding size = 4096
Matrix shape = 50000 × 4096
where each row represents one token
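
The size of the embedding matrix in this example can be checked with a little arithmetic. The sketch below uses the two numbers from the example; the 2 bytes per value is an assumption (16-bit floats), and real models may store their weights differently.

```python
vocab_size = 50_000      # number of tokens in the vocabulary (from the example)
embedding_size = 4_096   # length of each token's vector (from the example)

# One row per token, one column per embedding dimension.
num_parameters = vocab_size * embedding_size

# Assumption: each value is stored as a 16-bit float (2 bytes).
bytes_per_value = 2
size_in_mb = num_parameters * bytes_per_value / 1_000_000

print(num_parameters)  # 204800000
print(size_in_mb)      # 409.6
```

So even before any Transformer layers are counted, this one table holds about 205 million learned values.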

Working with embeddings involves the following concepts:

  • Vector representation
  • Vector Length
  • Vector Distance
  • Vector Similarity
  • Vector Relationship
  • Semantic Search

Mathematical Foundations of Embeddings

We need a few mathematical concepts to understand embeddings properly.

1. Scalars

A scalar is a single number that has magnitude but no direction.

Example:

1, 2, 3

2. Vectors

A vector is an ordered collection of numbers. Real-world objects usually cannot be described by a single number, so a vector is used to represent multiple features of one object.

Example:

[2, 5, 7]

3. Vector Dimensions

Understanding vectors is essential for understanding embeddings. The dimension of a vector is the number of elements it contains. Embeddings typically have 768, 1024, or more dimensions.

Example:

[0.2, -0.5, 2.4] → 3D vector

4. Tensors

Tensors are a generalization of scalars, vectors, and matrices. Deep learning has to handle multi-dimensional data, and tensors make it possible to represent such complex data.

  • Scalar → 0D tensor
  • Vector → 1D tensor
  • Matrix → 2D tensor
  • Higher → nD tensor

LLMs work entirely on tensors.
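
The idea of tensor dimensions can be illustrated with plain nested Python lists. This is a minimal sketch: `tensor_rank` is a hypothetical helper written for this example, and it assumes every nesting level is non-empty and regular. Deep learning libraries provide their own tensor types.

```python
def tensor_rank(value):
    """Count how many levels of nesting a list-based tensor has."""
    rank = 0
    while isinstance(value, list):
        rank += 1
        value = value[0]  # assumes every level is non-empty and regular
    return rank

scalar = 5                                      # 0D tensor
vector = [2, 5, 7]                              # 1D tensor
matrix = [[1, 2], [3, 4]]                       # 2D tensor
batch = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]    # 3D tensor

print(tensor_rank(scalar))  # 0
print(tensor_rank(vector))  # 1
print(tensor_rank(matrix))  # 2
print(tensor_rank(batch))   # 3
```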

5. Vector Space

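A vector space is a collection of vectors together with the operations defined on them. In Machine Learning (ML), vectors are not isolated: the relationships between vectors in the space are what matter. A vector space supports operations such as addition and scaling, and lets us measure distance and similarity.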
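
Addition and scaling, the two basic vector space operations, can be sketched with plain lists. The helper names `add` and `scale` and the sample vectors are chosen for this illustration only.

```python
def add(u, v):
    """Element-wise sum of two vectors of the same length."""
    return [a + b for a, b in zip(u, v)]

def scale(c, v):
    """Multiply every element of a vector by the scalar c."""
    return [c * a for a in v]

u = [2, 5, 7]
v = [1, 0, -3]

print(add(u, v))    # [3, 5, 4]
print(scale(2, u))  # [4, 10, 14]
```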

6. Magnitude and Norms

The magnitude is the length of a vector. With embeddings, norms are used in normalization, regularization, and similarity computations. The two most common norms are L1 and L2.

Common norms for a 3D vector [x, y, z]:

L1 Norm = |x| + |y| + |z|
L2 Norm = √(x² + y² + z²)
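
As a small worked example, the norms can be computed directly in Python. The function names and the sample vector are assumptions for this sketch, and `normalize` assumes the vector is not all zeros.

```python
import math

def l1_norm(v):
    """Sum of absolute values."""
    return sum(abs(x) for x in v)

def l2_norm(v):
    """Square root of the sum of squares (the usual vector length)."""
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    """Rescale a non-zero vector so that its L2 norm becomes 1."""
    length = l2_norm(v)
    return [x / length for x in v]

v = [3, -4, 0]

print(l1_norm(v))    # 7
print(l2_norm(v))    # 5.0
print(normalize(v))  # [0.6, -0.8, 0.0]
```

Normalizing embeddings to length 1 is a common preprocessing step before comparing them, since it removes the effect of vector length and leaves only direction.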

7. Distance

Distance tells how far apart two vectors are.

Common types:

  • Euclidean Distance
  • Manhattan Distance
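
Both distances are easy to compute by hand or in code. The sketch below uses illustrative function names and sample vectors that are not from any particular library.

```python
import math

def euclidean_distance(u, v):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan_distance(u, v):
    """Grid distance: sum of the absolute differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

u = [1, 2, 3]
v = [4, 6, 3]

# The element-wise differences are -3, -4 and 0.
print(euclidean_distance(u, v))  # 5.0
print(manhattan_distance(u, v))  # 7
```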

8. Similarity

Similarity measures how close two vectors are in meaning.

Commonly used similarity measure:

Cosine Similarity = (A · B) / (‖A‖ × ‖B‖)

Cosine similarity ranges from -1 to 1. High similarity means the meanings are similar.
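
A minimal sketch of cosine similarity follows. The function name and test vectors are chosen for illustration, and the function assumes neither vector is all zeros.

```python
import math

def cosine_similarity(u, v):
    """Dot product divided by the product of the two L2 norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

same_direction = cosine_similarity([1, 2], [2, 4])
perpendicular = cosine_similarity([1, 0], [0, 1])
opposite = cosine_similarity([1, 2], [-1, -2])

# Rounded to hide tiny floating-point errors.
print(round(same_direction, 6))  # 1.0
print(round(perpendicular, 6))   # 0.0
print(round(opposite, 6))        # -1.0
```

Note that [1, 2] and [2, 4] have different lengths but a similarity of 1: cosine similarity compares direction only, which is why it is popular for comparing embeddings.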

How Embedding Works

Embeddings are generated using an Embedding Matrix.

Step 1: Tokenization

First, when the model receives a sentence, the text is split into tokens and each token is mapped to its ID:

sentence: I love AI 
token IDs: [12, 532, 981]

Step 2: Embedding Matrix

Each token ID indexes one row of the embedding matrix:

50000 × 768 matrix (vocabulary size × embedding size)

Step 3: Lookup

Each token is replaced with its vector representation.
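
The lookup step can be sketched as plain list indexing. The token IDs are the ones from the example sentence; the small vocabulary size, the 4-dimensional vectors, and the random values are assumptions made so the demo stays readable. A real model would use its trained matrix.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

vocab_size = 1000    # assumption: a small vocabulary for the demo
embedding_size = 4   # assumption: tiny vectors so they are easy to print

# Embedding matrix: one row of random values per token ID.
embedding_matrix = [
    [random.uniform(-1, 1) for _ in range(embedding_size)]
    for _ in range(vocab_size)
]

token_ids = [12, 532, 981]  # "I love AI" from the tokenization step

# Lookup: each token ID selects the row with the same index.
embedded = [embedding_matrix[token_id] for token_id in token_ids]

print(len(embedded), len(embedded[0]))  # 3 4
```

The result has one vector per input token, so a sentence of 3 tokens becomes a 3 × 4 matrix here (3 × 768 with the sizes from Step 2).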

Step 4: Training

Initially embeddings are random.

During training:

  • Backpropagation updates vectors
  • Words used in similar contexts move closer together
  • Meaningful structure is formed
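
To give a feel for how gradient updates move vectors, here is a deliberately simplified sketch. It is not how an LLM is trained: a real model minimizes a next-token prediction loss through backpropagation, whereas this toy uses the squared distance between two vectors as its loss. The starting vectors and the learning rate are hypothetical.

```python
import math

def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical starting vectors for two words that should end up close.
dog = [0.9, -0.4]
cat = [-0.3, 0.7]

learning_rate = 0.1
start = distance(dog, cat)

for step in range(10):
    # Toy loss: squared distance between the two vectors.
    # Its gradient is 2 * (dog - cat) for dog and the negative of that for cat.
    grad_dog = [2 * (d - c) for d, c in zip(dog, cat)]
    grad_cat = [-g for g in grad_dog]
    # Gradient descent: move each vector a small step against its gradient.
    dog = [d - learning_rate * g for d, g in zip(dog, grad_dog)]
    cat = [c - learning_rate * g for c, g in zip(cat, grad_cat)]

end = distance(dog, cat)
print(start, end)
```

With these settings each step shrinks the gap between the two vectors to 60% of its previous value, so after 10 steps they are almost on top of each other.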

Token Embeddings

Token embeddings are the core embeddings used in LLMs.

Each token is mapped to a dense vector that represents its meaning.

Example:

dog → [0.2, 0.8, ...]
cat → [0.21, 0.79, ...]
car → [-0.9, 0.1, ...]

Here:

  • dog and cat are close (similar meaning)
  • car is far (different meaning)
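
The "close" and "far" claims can be checked by ranking the other words by cosine similarity to "dog". This sketch uses only the two components shown for each vector; the elided components are unknown and are left out, so the exact scores are illustrative.

```python
import math

def cosine_similarity(u, v):
    """Dot product divided by the product of the two L2 norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Only the first two components from the example are used here.
embeddings = {
    "dog": [0.2, 0.8],
    "cat": [0.21, 0.79],
    "car": [-0.9, 0.1],
}

query = "dog"

# Sort the other words from most similar to least similar.
ranked = sorted(
    (word for word in embeddings if word != query),
    key=lambda word: cosine_similarity(embeddings[query], embeddings[word]),
    reverse=True,
)

print(ranked)  # ['cat', 'car']
```

This ranking step is the core of semantic search: embed the query, then return the stored items whose vectors are most similar to it.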

Positional Embeddings

On their own, Transformers do not understand the order of words, because self-attention treats the input as an unordered set of tokens.

Example:

Dog bites man ≠ Man bites dog

So positional embeddings are added to give order information.

Final input:

Final Embedding = Token Embedding + Positional Embedding
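
The addition can be sketched as follows. The text does not say which kind of positional embedding is meant, so this example assumes the fixed sinusoidal encoding from the original Transformer design; many models learn their positional vectors instead. The token vectors are hypothetical.

```python
import math

def sinusoidal_position(position, size):
    """Fixed sinusoidal encoding for one position (size must be even)."""
    vector = []
    for i in range(0, size, 2):
        angle = position / (10000 ** (i / size))
        vector.append(math.sin(angle))  # even index
        vector.append(math.cos(angle))  # odd index
    return vector

# Hypothetical 4-dimensional token embeddings.
token_embeddings = {
    "dog": [0.2, 0.8, 0.1, 0.4],
    "bites": [0.5, -0.3, 0.7, 0.0],
    "man": [-0.1, 0.6, 0.3, 0.9],
}

def embed(sentence):
    """Final embedding = token embedding + positional embedding."""
    final = []
    for position, word in enumerate(sentence):
        token_vec = token_embeddings[word]
        pos_vec = sinusoidal_position(position, len(token_vec))
        final.append([t + p for t, p in zip(token_vec, pos_vec)])
    return final

a = embed(["dog", "bites", "man"])
b = embed(["man", "bites", "dog"])

print(a[0] == b[2])  # False: "dog" at position 0 differs from "dog" at position 2
print(a[1] == b[1])  # True: "bites" is at position 1 in both sentences
```

Because the same word gets a different final vector at each position, the model can tell "Dog bites man" apart from "Man bites dog" even though both contain the same tokens.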

Why Embeddings are Important in LLMs

Embeddings are the foundation of LLM intelligence because:

  • They convert text into mathematical space
  • They preserve semantic meaning
  • They allow similarity computation
  • They enable retrieval systems (RAG)
  • They form the input of Transformers

Real-World Applications

  • ChatGPT
  • Google Search
  • Recommendation systems
  • RAG-based chatbots
  • Vector databases
  • Semantic search engines