Re: CUBE_MAX_DIM

From: Darafei "Komяpa" Praliaskouski <me(at)komzpa(dot)net>
To: Devrim Gündüz <devrim(at)gunduz(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: CUBE_MAX_DIM
Date: 2020-06-25 13:31:36
Message-ID: CAC8Q8tLA8jO5nj20CqUXe+m4HqSrBLoRS2aFsuQKdXpmhJh5OQ@mail.gmail.com
Lists: pgsql-hackers

Hello,

The problem with higher-dimensional cubes is that, starting at a
dimensionality of ~52, the "distance" metrics computed in 64-bit floats have
less than a single bit of mantissa per dimension, making cubes
indistinguishable. Developers of facial recognition software had a chat about
this in the Russian Postgres Telegram group, https://t.me/pgsql. They had
128-dimensional points and recompiled Postgres to allow them, but the
distances weren't helpful and GiST KNN degraded severely, to almost full
scans. They had to reduce the number of facial features to make KNN search
work.
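
To put rough numbers on the mantissa argument, here is a minimal Python
sketch. It is only a back-of-the-envelope illustration: the helper names are
made up for this example, and the naive sum-of-squares distance is an assumed
stand-in, not the code contrib/cube actually runs. It prints the mantissa
budget per dimension, then shows two different 128-dimensional points ending
up at exactly the same distance from the origin.

```python
import math
import sys

# Explicitly stored mantissa bits of a double: 53 significant bits, one implicit.
STORED_MANTISSA_BITS = sys.float_info.mant_dig - 1  # 52 on IEEE 754 platforms


def mantissa_bits_per_dimension(ndim):
    """Rough budget: mantissa bits of a single distance value, split per dimension."""
    return STORED_MANTISSA_BITS / ndim


def euclidean_distance(a, b):
    """Naive sum-of-squares distance (hypothetical stand-in, not the cube code)."""
    total = 0.0
    for x, y in zip(a, b):
        total += (x - y) ** 2
    return math.sqrt(total)


# The budget drops below one bit per dimension once ndim exceeds 52.
for ndim in (3, 52, 100, 128, 500):
    print(ndim, mantissa_bits_per_dimension(ndim))

# Two points that differ only in the last coordinate.
origin = [0.0] * 128
p = [1.0] * 127 + [0.0]
q = [1.0] * 127 + [1e-8]

# The 1e-16 contribution of the last coordinate is absorbed by the running
# sum of 127.0, so both distances come out identical.
print(p != q, euclidean_distance(origin, p) == euclidean_distance(origin, q))
```

The second part is one way the loss can show up, under the stated
assumptions; real feature vectors and the real distance code may behave
differently in detail.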

Floating point overflow isn't that much of a risk per se: in the worst
case the result becomes Infinity or 0, which is usually acceptable in
those contexts.

While higher-dimension cubes are mathematically possible, they come with
implementation issues. I'm OK with raising the limit if such nuances get a
mention in the docs.

On Thu, Jun 25, 2020 at 1:01 PM Devrim Gündüz <devrim(at)gunduz(dot)org> wrote:

>
> Hi,
>
> Someone contacted me about increasing CUBE_MAX_DIM
> in contrib/cube/cubedata.h (in the community RPMs). The current value
> is 100 with the following comment:
>
> * This limit is pretty arbitrary, but don't make it so large that you
> * risk overflow in sizing calculations.
>
>
> They said they use 500, and never had a problem. I never added such
> patches to the RPMS, and will not -- but wanted to ask if we can safely
> increase it in upstream?
>
> Regards,
>
> --
> Devrim Gündüz
> Open Source Solution Architect, Red Hat Certified Engineer
> Twitter: @DevrimGunduz , @DevrimGunduzTR
>

--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
