Re: WIP: BRIN multi-range indexes

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP: BRIN multi-range indexes
Date: 2021-01-26 18:52:53
Message-ID: CAFBsxsFudhzy1gUMp6fyj7xDXqZf5VPGC3krqsz42_0QGwcBBQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 22, 2021 at 10:59 PM Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
wrote:
>
>
> On 1/23/21 12:27 AM, John Naylor wrote:

> > Still, it would be great if multi-minmax can be a drop-in replacement. I
> > know there was a sticking point of a distance function not being
> > available on all types, but I wonder if that can be remedied or worked
> > around somehow.
> >
>
> Hmm. I think Alvaro also mentioned he'd like to use this as a drop-in
> replacement for minmax (essentially, using these opclasses as the
> default ones, with the option to switch back to plain minmax). I'm not
> convinced we should do that, though. Imagine you have minmax indexes in
> your existing DB, it's working perfectly fine, and then we come and just
> silently change that during dump/restore. Is there some past example
> when we did something similar and it turned out to be OK?

I was assuming pg_dump can be taught to insert explicit opclasses for
minmax indexes, so that upgrade would not cause surprises. If that's true,
only new indexes would have the different default opclass.

> As for the distance functions, I'm pretty sure there are data types
> without "natural" distance - like most strings, for example. We could
> probably invent something, but the question is how much we can rely on
> it working well enough in practice.
>
> Of course, is minmax even the right index type for such data types?
> Strings are usually "labels" and not queried using range queries,
> although sometimes people encode stuff as strings (but then it's very
> unlikely we'll define the distance function well). So maybe for those
> types a hash / bloom would be a better fit anyway.

Right.

> But I do have an idea - maybe we can do without distances, in those
> cases. Essentially, the primary issue with minmax indexes is outliers, so
> what if we simply sort the values, keep one range in the middle and as
> many single points on each tail?

That's an interesting idea. I think it would be a nice bonus to try to do
something along these lines. On the other hand, I'm not the one
volunteering to do the work, and the patch is useful as is.
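
For illustration only, here is a minimal Python sketch of the idea quoted
above: sort the values, keep single points on each tail, and cover whatever
is left with one range in the middle. It is not taken from the patch (which
is C), and the names summarize, might_contain and max_points_per_tail are
hypothetical. The point it tries to show is that only comparisons are
needed, no distance function.

```python
from typing import List, Optional, Tuple

Summary = Tuple[List[int], Optional[Tuple[int, int]]]


def summarize(values: List[int], max_points_per_tail: int) -> Summary:
    """Build a (points, middle_range) summary using only ordering.

    Hypothetical helper: keeps up to max_points_per_tail distinct values
    on each tail as single points and one [min, max] range for the rest.
    """
    vals = sorted(set(values))
    n, k = len(vals), max_points_per_tail
    if n <= 2 * k:
        # Few enough distinct values: store them all as points.
        return vals, None
    low = vals[:k]
    high = vals[n - k:]          # vals[n:] is empty when k == 0
    middle = (vals[k], vals[n - k - 1])
    return low + high, middle


def might_contain(summary: Summary, lo: int, hi: int) -> bool:
    """Return False only if no summarized value can fall in [lo, hi]."""
    points, middle = summary
    if any(lo <= p <= hi for p in points):
        return True
    return middle is not None and middle[0] <= hi and lo <= middle[1]
```

With the values 1, 50, 51, 52, 53, 1000 and one point per tail, this keeps
the points 1 and 1000 plus the range [50, 53], so a query for [100, 900]
can be skipped, whereas a single min/max range of [1, 1000] would match it.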

--
John Naylor
EDB: http://www.enterprisedb.com
