Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)
Date: 2024-01-14 23:15:42
Message-ID: 4d910260-f22c-4d95-a40d-c177fcc9419c@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 1/14/24 12:18, vignesh C wrote:
> On Fri, 14 Jul 2023 at 20:17, Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> On 7/9/23 23:44, Tomas Vondra wrote:
>>> ...
>>>>> Yes, my previous message was mostly about backwards compatibility, and
>>>>> this may seem a bit like an argument against it. But that message was
>>>>> more a question "If we do this, is it actually backwards compatible the
>>>>> way we want/need?")
>>>>>
>>>>> Anyway, I think the BrinDesc scratch space is a neat idea, I'll try
>>>>> doing it that way and report back in a couple days.
>>>>
>>>> Cool. In 0005-Support-SK_SEARCHARRAY-in-BRIN-bloom-20230702.patch, you
>>>> used the preprocess function to pre-calculate the scankey's hash, even
>>>> for scalars. You could use the scratch space in BrinDesc for that,
>>>> before doing anything with SEARCHARRAYs.
>>>>
>>>
>>> Yeah, that's a good idea.
>>>
>>
>> I started looking at this (the scratch space in BrinDesc), and it's not
>> as straightforward. The trouble is BrinDesc is "per attribute" but the
>> scratch space is "per scankey" (because we'd like to sort values from
>> the scankey array).
>>
>> With the "new" consistent functions (that get all scan keys at once)
>> this probably is not an issue, because we know which scan key we're
>> processing and so we can map it to the scratch space. But with the old
>> consistent function that's not the case. Maybe we should support this
>> only with the "new" consistent function variant?
>>
>> This would however conflict with the idea to have a separate consistent
>> function for arrays, which "splits" the scankeys into multiple groups
>> again. There could be multiple SAOP scan keys, and then what?
>>
>> I wonder if the scratch space should be in the ScanKey instead?
>
> Are we planning to post an updated patch for this? If the interest has
> gone down and if there are no plans to handle this I'm thinking of
> returning this commitfest entry in this commitfest and can be opened
> when there is more interest.
>

I still think the patch is a good idea and plan to get back to it, but
probably not in this CF. Given that the last update if from July, it's
fair to bump it - either RWF or just move to the next CF. Up to you.

As for the patch, I wonder if Heikki has some idea what to do about the
scratch space? I got stuck on thinking about how to do this with the two
types of consistent functions we support/allow.

To articulate the issue more clearly - the scratch space is "per index"
but we need scratch space "per index key". That's fine - we can simply
have pointers to multiple scratch spaces, I think.

But we have two ways to do consistent functions - the "old" gets scan
keys one attribute at a time, "new" gets all at once. For the "new" it's
not a problem, it's simple to identify the right scratch space. But for
the "old" one, how would that happen? The consistent function has no
idea which index key it's operating on, and how to identify the correct
scratch space.

I can think of two ways to deal with this:

1) Only allow SK_SEARCHARRAY for indexes supporting new-style consistent
functions (but I'm not sure how, considering amsearcharray is set way
before we know what the opclass does, or whether it implements the old
or new consistent function).

2) Allow SK_SEARCHARRAY even with old consistent function, but do some
dance in bringetbitmap() so to set the scratch space accordingly before
the call.

Now that I read Heikki's messages again, I see he suggested assigning a
new procnum to a consistent function supporting SK_SEARCHARRAY, which
seems to be very close to (1). Except that we'd now have 3 ways to
define a consistent function, and that sounds a bit ... too much?

Anyway, thinking about (1), I'm still not sure what to do about existing
opclasses - it'd be nice to have some backwards compatible solution,
without breaking everything and forcing everyone to implement all the
new stuff. Which is kinda why we already have two ways to do consistent
functions. Presumably we'd have to implement some "default" handling by
translating the SK_SEARCHARRAY key into simple equality keys ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Maciek Sakrejda 2024-01-14 23:20:48 Re: Printing backtrace of postgres processes
Previous Message Tomas Vondra 2024-01-14 21:47:45 Re: Custom explain options