Different Types of Vectors

Any data, including human-readable text, can be represented as vectors so that machines can work with it. Before text can be processed, it must be converted into a machine-readable format, typically dense or sparse vectors. This section explores these two vector types, their applications in text processing, and their significance in Retrieval-Augmented Generation (RAG) systems.

1. Dense Vectors

Dense vectors are used when the data has no inherent sparsity. A dense representation is closely packed: most of its elements hold meaningful, non-zero values, so nearly every dimension carries information. Dense vectors are used to compare the semantics, or meaning, of text.

1.1 Dense Vector Embedding Example Code
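As a minimal sketch of how dense vectors are compared, the snippet below uses hand-written four-dimensional vectors in place of real model output. The sentences, the vector values, and the `cosine_similarity` helper are all assumptions made for illustration; an actual embedding model would typically produce hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: the values are invented, not produced by any model.
# Every dimension is non-zero, which is what makes the vectors "dense".
dense_vectors = {
    "The kangaroo sprinted towards the highway.": [0.81, 0.42, -0.15, 0.33],
    "A marsupial ran to the road.": [0.78, 0.47, -0.10, 0.29],
    "Interest rates rose again this quarter.": [-0.22, 0.05, 0.91, -0.34],
}

query, paraphrase, unrelated = dense_vectors.values()

similar_score = cosine_similarity(query, paraphrase)
unrelated_score = cosine_similarity(query, unrelated)

print(f"query vs paraphrase: {similar_score:.3f}")
print(f"query vs unrelated:  {unrelated_score:.3f}")
```

The first two sentences share almost no words, yet their (made-up) vectors point in nearly the same direction, which is the behaviour a dense embedding is meant to provide.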

1.2 Dense Vector Embedding Result

2. Sparse Vectors

Sparse vectors are used when the data being represented is inherently sparse. A sparse representation is scattered: most of its elements are zero or missing, so the vector contains many empty or non-informative values. Sparse vectors are used to compare the words that appear in text; they capture lexical overlap rather than the semantics of the text.

For example, sparse vectors would produce a perfect (or near-perfect) match for the two sentences below, because both contain exactly the same words, even though their meanings are very different.

The kangaroo sprinted from the bush in dawn towards the highway.

In dawn the kangaroo sprinted from the highway towards the bush.

2.1 Sparse Vector Embedding Example Code
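As a minimal sketch, the two kangaroo sentences can be turned into bag-of-words count vectors and compared. Bag-of-words is only one possible sparse encoding and is an assumption here, as are the `sparse_bow` and `cosine_similarity` helper names. Each vector is stored as a mapping from word to count, so every word of the wider vocabulary that is absent from a sentence is an implicit zero.

```python
import math
import re
from collections import Counter


def sparse_bow(text):
    """Sparse bag-of-words vector: only words that occur are stored."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {word: count} vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)


sentence_1 = "The kangaroo sprinted from the bush in dawn towards the highway."
sentence_2 = "In dawn the kangaroo sprinted from the highway towards the bush."

vector_1 = sparse_bow(sentence_1)
vector_2 = sparse_bow(sentence_2)

score = cosine_similarity(vector_1, vector_2)
print(dict(vector_1))
print(f"cosine similarity: {score:.3f}")
```

Because word order is discarded, both sentences map to the same counts, so the comparison cannot tell which direction the kangaroo ran.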

2.2 Sparse Vector Embedding Result
