Re: vector search support

From: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
To: Giuseppe Broccolo <g(dot)broccolo(dot)7(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, mail(at)joeconway(dot)com
Subject: Re: vector search support
Date: 2023-05-26 14:37:57
Message-ID: 49c7ba52-818a-6d0b-b8fd-eadef8e195a1@postgresql.org
Lists: pgsql-hackers

On 4/26/23 9:31 AM, Giuseppe Broccolo wrote:
> Hi Nathan,
>
> I find the patches really interesting. Personally, as a Data/MLOps
> Engineer, I'm involved in a project where we use embedding techniques to
> generate vectors from documents, and use clustering and kNN searches to
> find similar documents based on the spatial neighbourhood of the
> generated vectors.

Thanks! This seems to be a pretty common use-case these days.

> We finally opted for ElasticSearch as search engine, considering that it
> was providing what we needed:
>
> * support to store dense vectors
> * support for kNN searches (last version of ElasticSearch allows this)

I do want to note that GiST indexes can already perform K-NN searches
through the "distance" support function[1], so adding the fundamental
functions for known vector search techniques could provide this
functionality. We already have this today with "cube", but as Nathan
mentioned, it's limited to 100 dims.
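
To make the K-NN point concrete, here is a minimal sketch of the result
an ordered nearest-neighbour search is expected to return, computed by
brute force. It is only an illustration: the names "euclidean", "knn"
and the sample rows are hypothetical, Euclidean distance is assumed as
the metric, and nothing here reflects how GiST or "cube" is implemented
internally. An index with a distance support function aims to return
the same ordering without scanning every row.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(rows, query, k):
    """Return the k (row_id, distance) pairs closest to query, nearest first.

    rows is an iterable of (row_id, vector) pairs. Every row is scanned,
    which is the cost a K-NN-capable index is meant to avoid.
    """
    scored = ((euclidean(vec, query), row_id) for row_id, vec in rows)
    return [(row_id, dist) for dist, row_id in heapq.nsmallest(k, scored)]


# Hypothetical sample data: four 2-dimensional vectors.
rows = [
    ("a", (0.0, 0.0)),
    ("b", (3.0, 4.0)),
    ("c", (1.0, 1.0)),
    ("d", (6.0, 8.0)),
]

nearest = knn(rows, (0.0, 0.0), 2)
```

In SQL terms this corresponds to ordering by a distance operator and
applying a LIMIT, which is the query shape a GiST "distance" function
lets the index answer directly.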

> An internal benchmark showed us that we were able to achieve the
> expected performance, although we are still lacking some points:
>
> * clustering of vectors (this has to be done outside the search engine,
> using DBScan for our use case)

From your experience, have you found any particular clustering
algorithms better at driving a good performance/recall tradeoff?

> * concurrency in updating the ElasticSearch indexes storing the dense
> vectors

I do think concurrent updates of vector-based indexes are one area
PostgreSQL can ultimately be pretty good at, whether in core or in an
extension.

> I found these patches really interesting, considering that they would
> solve some of the open issues when storing dense vectors. Index support
> would help a lot with searches though.

Great -- thanks for the feedback,

Jonathan

[1] https://www.postgresql.org/docs/devel/gist-extensibility.html