What are embeddings, and why are they important in semantic search?
1 Answer
Embedding
Embeddings are how a Large Language Model (LLM) represents the meaning of its input. A transformer cannot interpret token IDs directly: to the model, a token ID is only an arbitrary integer index that carries no meaning by itself.
An embedding is a dense numerical vector that represents the meaning of a token. The embedding layer is a large lookup table, known as the embedding matrix, that stores one vector for each token in the vocabulary. Tokens with similar meanings end up with vectors that are close together in the vector space (nearby positions, not identical ones).
Embedding matrix:- The embedding matrix is a large table that maps each token ID to its vector.
example:-
Vocabulary - 50000 tokens
embedding size - 4096
matrix shape = 50000 x 4096 (204,800,000 values)
where each row is the vector for one token
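The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny vocabulary, the toy sizes, and the `embed` helper are hypothetical, and random numbers stand in for the values a real model would learn during training.

```python
import random

# Toy sizes so the example runs instantly; the sizes from the text
# (50,000 tokens x 4,096 dimensions) would work the same way.
VOCAB_SIZE = 6
EMBEDDING_SIZE = 4

# Hypothetical vocabulary: token string -> token ID (the row index).
vocab = {"how": 0, "can": 1, "i": 2, "learn": 3, "ai": 4, "?": 5}

# Embedding matrix: one row per token. Random values are placeholders
# for weights that a real model learns during training.
rng = random.Random(0)
embedding_matrix = [
    [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_SIZE)]
    for _ in range(VOCAB_SIZE)
]

def embed(token_ids):
    """Return the embedding matrix row for each token ID."""
    return [embedding_matrix[token_id] for token_id in token_ids]

token_ids = [vocab[t] for t in ["how", "can", "i", "learn", "ai", "?"]]
vectors = embed(token_ids)

print(len(embedding_matrix), "rows x", len(embedding_matrix[0]), "columns")
print("values in the full-size matrix from the text:", 50000 * 4096)
```

The token ID is used only as a row index, which is why the ID itself does not need to carry any meaning.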
Key concepts when working with embeddings:-
- Vector representation
- Vector length
- Vector distance
- Vector similarity
- Vector relationships
- Semantic search
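Several items in this list (length, distance, similarity) are simple calculations on the vectors. A minimal sketch follows, assuming hand-picked 3-dimensional vectors for "learn", "study" and "banana"; the values are hypothetical and chosen only to make the comparison visible, since real embeddings have hundreds or thousands of dimensions.

```python
import math

def length(v):
    """Vector length (Euclidean norm)."""
    return math.sqrt(sum(x * x for x in v))

def euclidean_distance(a, b):
    """Straight-line distance between two vectors; smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (length(a) * length(b))

# Hypothetical toy embeddings, not taken from any real model.
learn = [0.9, 0.1, 0.0]
study = [0.8, 0.2, 0.1]
banana = [0.0, 0.1, 0.9]

print("similarity(learn, study): ", cosine_similarity(learn, study))
print("similarity(learn, banana):", cosine_similarity(learn, banana))
print("distance(learn, study):   ", euclidean_distance(learn, study))
print("distance(learn, banana):  ", euclidean_distance(learn, banana))
```

With these toy values, "learn" and "study" score a higher similarity and a smaller distance than "learn" and "banana", which is the behaviour expected from tokens with related meanings.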
Why are embeddings important in semantic search?
Embeddings matter for semantic search because they convert the meaning of text into numerical vectors. Instead of matching exact keywords, semantic search compares the meaning of the query with the meaning of each document. With embeddings, the system can identify queries and documents that have similar meanings even when they share no words, which tends to give more accurate and relevant search results.
example:-
Query: How can I learn AI?
Document: Guide to studying Artificial Intelligence
Here, "studying" is similar in meaning to "learn", and "Artificial Intelligence" is the expanded form of "AI".
A keyword search may fail here because the query and the document share no words, but an embedding model maps both sentences to similar vectors, so semantic search can still retrieve the correct document.
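The query and document above can be turned into a small comparison of keyword matching and embedding-based ranking. This is a sketch under assumptions: the sentence vectors are hand-assigned and hypothetical (a real system would get them from an embedding model), and the second document about baking bread is added only to give the search something to get wrong.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def keyword_overlap(query, document):
    """Number of words the query and the document have in common."""
    return len(tokenize(query) & tokenize(document))

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

query = "How can I learn AI?"
query_vector = [0.8, 0.9, 0.2]  # hypothetical embedding of the query

# Hypothetical sentence embeddings; a real embedding model would produce these.
documents = {
    "Guide to studying Artificial Intelligence": [0.9, 0.8, 0.1],
    "How can I bake bread at home?": [0.1, 0.2, 0.9],
}

best_by_keywords = max(documents, key=lambda d: keyword_overlap(query, d))
best_by_embedding = max(
    documents, key=lambda d: cosine_similarity(query_vector, documents[d])
)

print("keyword search picks: ", best_by_keywords)
print("semantic search picks:", best_by_embedding)
```

Keyword overlap favours the bread document because it shares "how", "can" and "I" with the query, while the comparison of vectors favours the Artificial Intelligence guide even though it shares no words with the query.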