Understanding Quantization and Ollama Model Storage

Posted 06 Jun 2026 | Updated 08 Jun 2026

What is Quantization?

Quantization is a technique for reducing the size of AI models so that they can run on ordinary computers and laptops. In simple words, quantization is the process of making an AI model smaller and more lightweight.

AI models contain billions of parameters, which are numerical values normally stored in high precision. This makes the model large, so it needs more RAM and more powerful hardware to run. Quantization lowers the precision of these values, which shrinks the model and reduces its hardware requirements.
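The saving follows from simple arithmetic: model size is roughly the number of parameters multiplied by the bits used for each one. The sketch below is only a rough estimate. The function name estimate_size_gb and the 7-billion-parameter figure are illustrative assumptions, and real model files are somewhat larger because they also store metadata and scaling factors.

```python
def estimate_size_gb(num_params: int, bits_per_param: float) -> float:
    """Rough model size in gigabytes (10**9 bytes), ignoring file overhead."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical 7-billion-parameter model
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: about {estimate_size_gb(params, bits):.1f} GB")
```

Under these assumptions, going from 16 bits to 4 bits per parameter cuts the estimated size to a quarter.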

Example:

Suppose an AI model stores a value like this:

0.987654321

After quantization, the value becomes something like this:

0.99

This causes some loss of precision, but storage and memory usage are significantly reduced. That is why quantized models use less storage, consume less RAM, load faster, and can run on ordinary laptops.
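The decimal rounding above is a simplification. In practice, a quantizer usually maps each value to a small integer and keeps one shared scale factor to convert back. Here is a minimal sketch of symmetric 8-bit quantization in plain Python. The function names are hypothetical, and this is not the exact scheme used in Ollama's model files, which quantize weights in small blocks.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    # Avoid dividing by zero when every value is 0.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(v / scale) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.987654321, -0.5, 0.123456789, 0.0]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    print("stored integers:", q)
    for original, approx in zip(weights, restored):
        print(f"{original:+.9f} -> {approx:+.9f}")
```

Each restored value differs from the original by at most half of the scale factor, which is the precision loss traded for storing one byte per value instead of several.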

Why do we need Quantization?

Original AI models are very large, and running them requires very powerful hardware.

Example:

    Original Model
          ↓
Large Storage Requirement
          ↓
    More RAM Usage
          ↓
 High Hardware Requirement

These models cannot run on every system. Quantization is used to solve this problem.

Common Quantization Types

When you download a model for Ollama or any other AI tool, you may come across terms like:

Q2
Q3
Q4
Q5
Q6
Q8

These represent the quantization level of the model. In simple words, the number tells you roughly how many bits are used to store each value in the model. Generally, the quantization level determines the size of the model, and the difference shows up in accuracy and response quality.

Q2 Quantization

This is a very aggressive quantization. Q2 models are very small and consume the least RAM, but there can be a noticeable reduction in accuracy.

Q2 Quantization is Suitable for:

Very low-end systems
Testing purposes

Q4 Quantization

This is one of the most popular quantization levels. It maintains a good balance between quality and performance, consumes less RAM, and gives faster inference. Many Ollama models are available in Q4 by default.

Q4 Quantization is Suitable for:

Most beginners
Normal laptops
Everyday usage

Q5 Quantization

It sits between Q4 and Q8. It provides better accuracy than Q4 while keeping hardware requirements manageable.

Q5 Quantization is Suitable for:

Users who want slightly better quality
Mid-range systems

Q6 Quantization

Q6 provides higher precision and better response quality, but it consumes more RAM than Q4 and Q5.

Q6 Quantization is Suitable for:

Advanced users
Systems with sufficient RAM

Q8 Quantization

It is the highest of the commonly used quantization levels. It provides better response quality and more accurate outputs that are closer to the original model, but the models are large and require more RAM than the other quantization levels.

Q8 Quantization is Suitable for:

Powerful systems
Users who prioritize quality over speed

Comparison Table

Quantization	Model Size	RAM Usage	Speed	Accuracy
Q2	Very Small	Very Low	Very Fast	Low
Q4	Small	Low	Fast	Good
Q5	Medium	Medium	Good	Better
Q6	Larger	Higher	Moderate	Very Good
Q8	Largest	Highest	Slower	Best
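The trade-off in the table can be turned into a rough rule of thumb: pick the highest level whose estimated size still fits in memory. The sketch below assumes the nominal bit count in each level's name, plus a hypothetical usable_fraction of RAM left over for the model. The function and its defaults are illustrative only, and real files need somewhat more memory than this estimate.

```python
# Nominal bits per parameter for each level; real files use slightly more.
NOMINAL_BITS = {"Q2": 2, "Q3": 3, "Q4": 4, "Q5": 5, "Q6": 6, "Q8": 8}


def pick_quantization(num_params, ram_gb, usable_fraction=0.7):
    """Return the highest level whose estimated size fits the RAM budget.

    usable_fraction is an assumed share of RAM available to the model after
    the operating system and other programs. Returns None if nothing fits.
    """
    budget_bytes = ram_gb * 1e9 * usable_fraction
    best = None
    # Walk from the smallest level to the largest, keeping the last that fits.
    for level, bits in sorted(NOMINAL_BITS.items(), key=lambda kv: kv[1]):
        if num_params * bits / 8 <= budget_bytes:
            best = level
    return best


if __name__ == "__main__":
    for ram in (4, 8, 16):
        choice = pick_quantization(7_000_000_000, ram)
        print(f"{ram} GB RAM -> {choice}")
```

Treat the result as a starting point for experiments, not as a guarantee that a model will run well.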

Understanding Ollama Model Storage

Now let us look at where a model is actually stored after we download it. When we run the command

ollama pull llama3

Ollama downloads the model from the internet and stores it on the local system.

Where Does Ollama Store Models?

Windows

C:\Users\<username>\.ollama\models

Linux

~/.ollama/models

macOS

~/.ollama/models

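These locations are the default Ollama storage folders. On Linux, when Ollama runs as a system service set up by the official install script, models are stored under /usr/share/ollama/.ollama/models instead. The OLLAMA_MODELS environment variable can be set to use a different folder.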
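To find the folder from a script, a small helper like the one below can be used. It is a sketch that assumes the per-user default of .ollama/models under the home directory and checks the OLLAMA_MODELS environment variable, which Ollama reads to override the storage location. The function name ollama_models_dir and its parameters are hypothetical.

```python
import os
from pathlib import Path


def ollama_models_dir(env=None, home=None):
    """Return the directory where Ollama is expected to keep its models.

    Honors the OLLAMA_MODELS environment variable if it is set; otherwise
    falls back to <home>/.ollama/models, the assumed per-user default.
    """
    env = os.environ if env is None else env
    override = env.get("OLLAMA_MODELS")
    if override:
        return Path(override)
    home = Path.home() if home is None else Path(home)
    return home / ".ollama" / "models"


if __name__ == "__main__":
    path = ollama_models_dir()
    print(f"Models directory: {path}")
    print(f"Exists: {path.is_dir()}")
```

The env and home parameters exist only to make the helper easy to test; calling it with no arguments uses the real environment.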

Understanding the Folder Structure

The models folder consists mainly of two important folders:

models
│
├── blobs
├── manifests

What is the Blobs Folder?

The blobs folder stores the actual data of the model. It contains the model files themselves, such as the weights, and each file is named after its SHA-256 digest.

What is the Manifests Folder?

This folder stores information about the model, such as its name, version, configuration, and tags.
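The two folders are connected: a manifest lists the blobs that make up a model. The sketch below reads one manifest and shows which blob files it points to. It assumes the manifest is a JSON file with a "layers" list whose entries have "mediaType", "digest" and "size" keys, and that a digest such as sha256:abc is stored as a blob file named sha256-abc. This layout is an assumption based on current Ollama releases, not a documented stable format, and the function name is hypothetical.

```python
import json
from pathlib import Path


def list_model_layers(manifest_path, blobs_dir):
    """Return (media_type, size_in_bytes, blob_path) for each manifest layer.

    Assumes the JSON layout described above; adjust the keys if your
    Ollama version writes manifests differently.
    """
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    layers = []
    for layer in manifest.get("layers", []):
        # Blob file names use "-" where the digest uses ":" (assumed).
        blob_name = layer["digest"].replace(":", "-", 1)
        layers.append(
            (layer.get("mediaType", ""), layer.get("size", 0),
             Path(blobs_dir) / blob_name)
        )
    return layers
```

Pass it the path of a manifest file from inside the manifests folder together with the path of the blobs folder. The largest layer is normally the model weights.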

Understanding storage is important because the C drive can fill up when multiple large models are downloaded onto a system.
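To see how much space the models are taking, the folder size can be summed with a few lines of Python. This is a minimal sketch. It assumes the default per-user location, so change models_dir if your models live elsewhere. The function name directory_size_bytes is illustrative.

```python
from pathlib import Path


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root (0 if it is missing)."""
    root = Path(root)
    if not root.is_dir():
        return 0
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    models_dir = Path.home() / ".ollama" / "models"  # assumed default location
    size_gb = directory_size_bytes(models_dir) / 1e9
    print(f"{models_dir} uses about {size_gb:.2f} GB")
```

If the folder has grown too large, ollama list shows the installed models and ollama rm followed by a model name removes one.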