Re: GiST limits on contrib/cube with dimension > 100?

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Siarhei Siniak <siarheisiniak(at)yahoo(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST limits on contrib/cube with dimension > 100?
Date: 2019-06-12 15:14:34
Message-ID: B360AFBA-A8A9-46AB-9DA6-A94BF9FDB989@yandex-team.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> 12 июня 2019 г., в 15:11, Siarhei Siniak <siarheisiniak(at)yahoo(dot)com> написал(а):
>
> A uniform set of points with a dimension 128 and type cube. That has a size of 50 ** 3. Can be queried for a nearest neighbor at a speed of 10 queries per second with limit varying from 1 to 25.
> It works better than when no index used at all. So gist gives here a speed up.

Then, I think, data is correlated. I believe it is possible to implement something better for high dimensional KNN in GiST than cube.

>
> The documentation of postgresql doesn't mention complexity of such an index. I've got confused as to its speed.
>
> Does postgresql allow for an approximate nearest neighbor search?
> https://github.com/erikbern/ann-benchmarks has a lot of efficient implementations.

ANN is beyond concepts of SQL standard: database with index must return same results as without index.
I can add ANN to github.com/x4m/ags which is a fork of GiST.

But PostgreSQL adds a lot of OLTP overhead:
1. it is crash-safe
2. it supports concurrent operations
2a. in a very unexpected way, for example in serializable mode it guaranties you that if you were looking for nearest neighbor there will not appear any new closer neighbor until the end of your transaction.
3. it allows extensibility and has abstraction layers
4. it has declarative querying language
etcetc

All this comes at a cost of database that can do anything and be anything. It its very hard to compete with specialized indexes that only do ANN.

Yet, as far as I know, no one really pursued the idea of fast high dimensional ANN in PG.

Best regards, Andrey Borodin.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Verite 2019-06-12 15:20:59 RE: proposal: pg_restore --convert-to-text
Previous Message Oleksii Kliukin 2019-06-12 15:14:13 Re: upgrades in row-level locks can deadlock