Introduction
When we ask an AI model a question, the model does not understand the words directly; to a computer, words are just text. To work with the meaning of the text, the model first converts it into numbers. This work is done by embeddings. In simple words, embeddings are numerical representations of text that help a model capture its meaning and context.
Why do we need Embeddings?
Suppose you have an AI chatbot and a user asks, “How can I reset my password?”, but the company document describes this as “Steps to change your account credentials”. From a human's perspective both statements mean the same thing, but to a computer comparing plain text, “password” is not equal to “credentials” and “reset” is not equal to “change”. The system will treat the two sentences as different, which is the main problem.
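The mismatch described above can be checked with a few lines of Python. This is only a sketch of naive keyword matching (the helper name keywords is made up for illustration), not how any particular chatbot searches its documents:

```python
import re

def keywords(sentence):
    """Lowercase the sentence and return its words as a set."""
    return set(re.findall(r"[a-z]+", sentence.lower()))

question = "How can I reset my password?"
document = "Steps to change your account credentials"

# Words that appear in both sentences.
shared = keywords(question) & keywords(document)
print(shared)  # set()
```

The two sentences share no words at all, so a system that only compares text would never connect the question to the document.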
What is an Embedding?
Embedding is a technique used to represent text as numbers so that a computer can work with it easily. In simple words, an embedding is a numerical representation of text that helps AI understand the meaning of words, sentences and documents and the relationships between them. Put simply, an embedding is a vector.
Example:
Laptop
↓
[0.21, 0.84, -0.56, 0.34, ...]
or
Artificial Intelligence
↓
[0.72, -0.13, 0.91, 0.22, ...]
These numbers are not random; meaning is encoded in them.
How Do Embeddings Work?
Step 1: Input Text
What is AI?
Suppose this text is given to the embedding model. At this stage it is just normal human language, which the computer cannot understand directly.
Step 2: Embedding Model
Now the embedding model processes the text. First, the model breaks the text into small parts called tokens.
Example:
What
is
AI
?
Next, the model analyzes these tokens. During training it learned patterns in language, and it uses them to understand the meaning of the text. The model does not only look at individual words; it also analyzes the context and the relationships between them.
Example:
“Artificial Intelligence” and “Machine Learning” are two different terms, but the model knows that these two are related concepts.
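The splitting in Step 2 can be sketched in Python. This is a simplified illustration using a regular expression; real embedding models use their own trained tokenizers, which often split text into sub-word pieces, and the function name simple_tokenize is made up for this example:

```python
import re

def simple_tokenize(text):
    """Split text into words and individual punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenize("What is AI?")
print(tokens)  # ['What', 'is', 'AI', '?']
```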
Step 3: Generate Vector
After processing the text, the embedding model generates a vector, which can then be used for tasks such as comparing texts by meaning.
Example:
Input:
What is AI?
Output:
[0.24, 0.67, -0.11, 0.93, ...]
This collection of numbers is called an embedding vector.
Flow:
Text → Embedding Model → Vector Representation
Understanding Similarity
Take words like “Laptop”, “Computer” and “Notebook”: the embeddings of these words will be close to each other. But if we take the two words “Laptop” and “Banana”, their embeddings will be far apart.
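One common way to measure how close two embeddings are is cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration; real embeddings come from a model and have hundreds of dimensions:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors, for illustration only (not real model output).
laptop = [0.9, 0.8, 0.1]
computer = [0.85, 0.75, 0.2]
banana = [0.1, 0.2, 0.95]

print(cosine_similarity(laptop, computer))  # close to 1: similar
print(cosine_similarity(laptop, banana))    # much lower: unrelated
```

A score near 1 means the two vectors point in almost the same direction, while a score near 0 means they have little in common.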
Embeddings in Ollama
Ollama provides models that we can use to generate embeddings.
Example:
ollama pull nomic-embed-text
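Generate an embedding by calling the Ollama REST API (the Ollama server must be running; 11434 is its default port):
curl http://localhost:11434/api/embed -d '{"model": "nomic-embed-text", "input": "What is Artificial Intelligence?"}'
Output (abbreviated):
{"model": "nomic-embed-text", "embeddings": [[0.21, 0.34, -0.91, 0.72, ...]], ...}
The list inside the embeddings field is the vector representation of the text.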
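An embedding can also be requested from Python through Ollama's REST API. The following is a minimal sketch, assuming an Ollama server is running locally on its default port 11434 and that nomic-embed-text has already been pulled. The helper names build_request, extract_vector and embed are made up for illustration:

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/embed"

def build_request(text, model="nomic-embed-text", url=OLLAMA_URL):
    """Build the POST request for Ollama's /api/embed endpoint."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def extract_vector(response_body):
    """Return the first embedding vector from the JSON response."""
    return json.loads(response_body)["embeddings"][0]

def embed(text, model="nomic-embed-text"):
    """Send text to the Ollama server and return its embedding."""
    with urllib.request.urlopen(build_request(text, model)) as resp:
        return extract_vector(resp.read())
```

With the server running, embed("What is Artificial Intelligence?") should return a list of floats, and len() of that list gives the number of dimensions of the embedding.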
Read more about Ollama:
Previous topic:
Building a Technical Customer Support Model Using .NET
Next topic:
How Ollama Generates Responses