Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: BRIN indexes vs. SK_SEARCHARRAY (and preprocessing scan keys)
Date: 2023-07-08 21:57:14
Message-ID: 2484032b-3e6d-ae6a-61e5-732e2136ad18@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 02/07/2023 19:09, Tomas Vondra wrote:
> Here's an updated version of the patch series.
>
> I've polished and pushed the first three patches with cleanup, tests to
> improve test coverage and so on. I chose not to backpatch those - I
> planned to do that to make future backpatches simpler, but the changes
> ended up less disruptive than expected.
>
> The remaining patches are just about adding SK_SEARCHARRAY to BRIN.
>
> 0001 - adds the optional preprocess procedure, calls it from brinrescan
>
> 0002 to 0005 - adds the support to the existing BRIN opclasses

Could you implement this completely in the consistent-function, by
caching the sorted array in fn_extra, without adding the new preprocess
procedure? On first call, when fn_extra == NULL, sort the array and
stash it in fn_extra.

I don't think that works, because fn_extra isn't reset when the scan
keys change on rescan. We could reset it, and document that you can use
fn_extra for per-scankey caching. There's some precedence for not
resetting it though, see commit d22a09dc70f. But we could provide an
opaque per-scankey scratch space like that somewhere else. In BrinDesc,
perhaps.

The new preprocess support function feels a bit too inflexible to me.
True, you can store whatever you want in the ScanKey that it returns,
but since that's the case, why not just make it void * ?It seems that
the constraint here was that you need to pass a ScanKey to the
consistent function, because the consistent function's signature is what
it is. But we can change the signature, if we introduce a new support
amproc number for it.

> The main open question I have is what exactly does it mean that the
> procedure is optional. In particular, should it be supported to have a
> BRIN opclass without the "preprocess" procedure but using the other
> built-in support procedures?
>
> For example, imagine you have a custom BRIN opclass in an extension (for
> a custom data type or something). This does not need to implement any
> procedures, it can just call the existing built-in ones. Of course, this
> won't get the "preprocess" procedure automatically.
>
> Should we support such opclasses or should we force the extension to be
> updated by adding a preprocess procedure? I'd say "optional" means we
> should support (otherwise it'd not really optional).
>
> The reason why this matters is that "amsearcharray" is AM-level flag,
> but the support procedure is defined by the opclass. So the consistent
> function needs to handle SK_SEARCHARRAY keys both with and without
> preprocessing.
>
> That's mostly what I did for all existing BRIN opclasses (it's a bit
> confusing that opclass may refer to both the "generic" minmax or the
> opclass defined for a particular data type). All the opclasses now
> handle three cases:
>
> 1) scalar keys (just like before, with amsearcharray=fase)
>
> 2) array keys with preprocessing (sorted array, array of hashes, ...)
>
> 3) array keys without preprocessing (for compatibility with old
> opclasses missing the optional preprocess procedure)
>
> The current code is a bit ugly, because it duplicates a bunch of code,
> because the option (3) mostly does (1) in a loop. I'm confident this can
> be reduced by refactoring and reusing some of the "shared" code.
>
> The question is if my interpretation of what "optional" procedure means
> is reasonable. Thoughts?
>
> The other thing is how to test this "compatibility" code. I assume we
> want to have the procedure for all built-in opclasses, so that won't
> exercise it. I did test it by temporarily removing the procedure from a
> couple pg_amproc.dat entries. I guess creating a custom opclass in the
> regression test is the right solution.

It would be unpleasant to force all BRIN opclasses to immediately
implement the searcharray-logic. If we don't want to do that, we need to
implement generic SK_SEARCHARRAY handling in BRIN AM itself. That would
be pretty easy, right? Just call the regular consistent function for
every element in the array.

If an opclass wants to provide a faster/better implementation, it can
provide a new consistent support procedure that supports that. Let's
assign a new amproc number for new-style consistent function, which must
support SK_SEARCHARRAY, and pass it some scratch space where it can
cache whatever per-scankey data. Because it gets a new amproc number, we
can define the arguments as we wish. We can pass a pointer to the
per-scankey scratch space as a new argument, for example.

We did this backwards-compatibility dance with the 3/4-argument variants
of the current consistent functions. But I think assigning a whole new
procedure number is better than looking at the number of arguments.

--
Heikki Linnakangas
Neon (https://neon.tech)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-07-08 22:09:04 Re: Preventing non-superusers from altering session authorization
Previous Message Nathan Bossart 2023-07-08 21:43:54 Re: Reducing connection overhead in pg_upgrade compat check phase