
Re: vector search support

From:	"Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
To:	Oliver Rice <oliver(at)oliverrice(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, mail(at)joeconway(dot)com
Subject:	Re: vector search support
Date:	2023-05-26 14:32:18
Message-ID:	e083ced8-83a0-9b73-156b-da968b83ac9c@postgresql.org
Lists:	pgsql-hackers

On 5/25/23 1:48 PM, Oliver Rice wrote:

> A nice side effect of using the float8[] to represent vectors is that it
> allows for vectors of different sizes to coexist in the same column.
>
> We most frequently see (pgvector) vector columns being used for storing
> ML embeddings. Given that different models produce embeddings with a
> different number of dimensions, the need to specify a vector’s size in
> DDL tightly couples the schema to a single model. Support for variable
> length vectors would be a great way to decouple those concepts. It would
> also be a differentiating feature from existing vector stores.

I hadn't thought of that, given that most of what I've seen (or at least
my personal bias in designing systems) is that you keep vectors of a
single dimensionality in a column. But this sounds like a case where
native support for variable-length arrays would help.

> One drawback is that variable length vectors complicate indexing for
> similarity search because similarity measures require vectors of
> consistent length. Partial indexes are a possible solution to that challenge.

Yeah, that presents a challenge. This may also be an argument for a
vector data type, since that would eliminate the need to check for
consistent dimensionality when indexing.
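To make the indexing concern concrete, here is a toy Python sketch. It is not pgvector or PostgreSQL code, and the names `cosine_distance` and `DimensionPartitionedIndex` are hypothetical. It shows that a similarity measure is only defined between vectors of equal length, and that grouping rows by dimensionality (loosely analogous to one partial index per array length) lets a query search only the vectors it can actually be compared with.

```python
import math
from collections import defaultdict


def cosine_distance(a, b):
    """Cosine distance between two vectors; only defined for equal lengths."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


class DimensionPartitionedIndex:
    """Hypothetical brute-force 'index' that keeps one partition per
    dimensionality, roughly like one partial index per array length."""

    def __init__(self):
        # Maps dimensionality -> list of (row_id, vector).
        self._partitions = defaultdict(list)

    def add(self, row_id, vector):
        # Vectors of different sizes coexist, but land in separate partitions.
        self._partitions[len(vector)].append((row_id, list(vector)))

    def search(self, query, k=1):
        # Only the partition matching the query's dimensionality is scanned,
        # so every comparison is between vectors of consistent length.
        candidates = self._partitions.get(len(query), [])
        ranked = sorted(candidates, key=lambda item: cosine_distance(query, item[1]))
        return [row_id for row_id, _ in ranked[:k]]
```

For example, embeddings from a 3-dimensional and a 2-dimensional model can be added side by side, and a 3-dimensional query is never compared against the 2-dimensional rows. A fixed-dimension vector type would make the partitioning step unnecessary, because every value in the column would already have the same length.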

Jonathan

In response to

Re: vector search support at 2023-05-25 17:48:02 from Oliver Rice
