Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, benoit <benoit(at)hopsandfork(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Date: 2024-03-18 13:24:46
Message-ID: CAEze2Wi4U5TDPSsWA=kSsN7kcdV7wT-0UO-7kjY2EZSmeRFY9w@mail.gmail.com
Lists: pgsql-hackers

On Sat, 16 Mar 2024 at 01:12, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Fri, Mar 8, 2024 at 9:00 AM Matthias van de Meent
> <boekewurm+postgres(at)gmail(dot)com> wrote:
> > I've attached v14, where 0001 is v13, 0002 is a patch with small
> > changes + some large comment suggestions, and 0003 which contains
> > sorted merge join code for _bt_merge_arrays.
>
> I have accepted your changes from 0003. Agree that it's better that
> way. It's at least a little faster, but not meaningfully more
> complicated.

Thanks.
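
For reference, the sorted-merge idea from 0003 boils down to something like this (a loose Python sketch of the C logic only; the real _bt_merge_arrays compares Datums through the opclass support functions, and the name here mirrors it loosely):

```python
def merge_arrays(a, b):
    """Intersect two sorted, duplicate-free arrays in O(len(a) + len(b)),
    the same way a sorted merge join matches rows: advance whichever
    side has the smaller element, emit on equality."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out
```

This is faster than the earlier approach of binary-searching one array for each element of the other whenever both arrays are of similar size.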

> > I'll try to work a bit on v13/14's _bt_preprocess_keys, and see what I
> > can make of it.
>
> That's been the big focus of this new v15, which now goes all out with
> teaching _bt_preprocess_keys with how to deal with arrays. We now do
> comprehensive normalization of even very complicated combinations of
> arrays and non-array scan keys in this version.

I was thinking about a more unified processing model, where
_bt_preprocess_keys would iterate over all keys, processing array keys
(by merging them and reducing them to normal keys) if and when they
are found. This would also reduce the effort expended on contradictory
scan keys: array preprocessing is relatively expensive compared to
that of other scankeys, so contradictions would be found before later
keys are processed.
As I wasn't very far into that work yet, it seems I can reuse a lot of
yours here.
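
To make the idea concrete, here is a loose Python sketch of that model (all names and the key representation are hypothetical, not nbtree's actual data structures):

```python
def preprocess_keys(keys):
    """Sketch of a single-pass model: scan keys are grouped per attribute,
    cheap contradiction checks run first, and the relatively expensive
    array merging runs only for quals that are still satisfiable.
    Ops are limited to '=' and 'in' for brevity; returns None for a
    contradictory qual (empty scan), else the normalized key list."""
    by_attr = {}
    for attr, op, val in keys:
        by_attr.setdefault(attr, []).append((op, val))
    out = []
    for attr, ks in sorted(by_attr.items()):
        eqs = [v for op, v in ks if op == '=']
        # Cheap check first: two different '=' constants contradict, so
        # a contradiction on an early column skips all later array work.
        if len(set(eqs)) > 1:
            return None
        arrays = [set(v) for op, v in ks if op == 'in']
        if arrays:
            elems = set.intersection(*arrays)
            if eqs:
                elems &= {eqs[0]}
            if not elems:
                return None
            if len(elems) == 1:
                # The array degenerates into a plain equality key.
                out.append((attr, '=', elems.pop()))
            else:
                out.append((attr, 'in', sorted(elems)))
        elif eqs:
            out.append((attr, '=', eqs[0]))
    return out
```

The point is only the ordering of the work: contradictions surface before any merging is paid for on later columns.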

> For example, consider this query:
[...]
> This has a total of 6 input scankeys -- 3 of which are arrays. But by
> the time v15's _bt_preprocess_keys is done with it, it'll have only 1
> scan key -- which doesn't even have an array (not anymore). And so we
> won't even need to advance the array keys one single time -- there'll
> simply be no array left to advance. In other words, it'll be just as
> if the query was written this way from the start:
>
> select *
> from multi_test
> where
> a = 3::int;
>
> (Though of course the original query will spend more cycles on
> preprocessing, compared to this manually simplified variant.)

That's a good improvement, much closer to an optimal access pattern.

> It turned out to not be terribly difficult to teach
> _bt_preprocess_keys everything it could possibly need to know about
> arrays, so that it can operate on them directly, as a variant of the
> standard equality strategy (we do still need _bt_preprocess_array_keys
> for basic preprocessing of arrays, mostly just merging them). This is
> better overall (in that it gets every little subtlety right), but it
> also simplified a number of related issues. For example, there is no
> longer any need to maintain a mapping between so->keyData[]-wise scan
> keys (output scan keys), and scan->keyData[]-wise scan keys (input
> scan keys). We can just add a step to fix-up the references to the end
> of _bt_preprocess_keys, to make life easier within
> _bt_advance_array_keys.
>
> This preprocessing work should all be happening during planning, not
> during query execution -- that's the design that makes the most sense.
> This is something we've discussed in the past in the context of skip
> scan (see my original email to this thread for the reference).

Yes, but IIRC we also agreed that it's impossible to do this fully
during planning, among other reasons because of joins on array fields.

> It
> would be especially useful for the very fancy kinds of preprocessing
> that are described by the MDAM paper, like using an index scan for a
> NOT IN() list/array (this can actually make sense with a low
> cardinality index).

Yes, indexes such as those on enums. Though in those cases the NOT IN
() could be transformed into an IN() list by the planner, but not by
the index code.
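
A toy sketch of that planner-side rewrite, assuming the column's full (low-cardinality) domain is known, as it is for an enum:

```python
def not_in_to_in(domain, excluded):
    """For a low-cardinality column (e.g. an enum) whose full domain is
    known, NOT IN (excluded) is equivalent to IN (domain - excluded),
    which an index scan can then treat as an ordinary array qual."""
    return sorted(set(domain) - set(excluded))
```

For example, with an enum of three labels, NOT IN ('green') becomes IN ('blue', 'red'), which the index machinery already knows how to handle.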

> The structure for preprocessing that I'm working towards (especially
> in v15) sets the groundwork for making those shifts in the planner,
> because we'll no longer treat each array constant as its own primitive
> index scan during preprocessing.

I hope that's going to be a fully separate patch. I don't think I can
handle much more complexity in this one.

> Right now, on HEAD, preprocessing
> with arrays kinda treats each array constant like the parameter of an
> imaginary inner index scan, from an imaginary nested loop join. But
> the planner really needs to operate on the whole qual, all at once,
> including any arrays. An actual nestloop join's inner index scan
> naturally cannot do that, and so might still require runtime/dynamic
> preprocessing in a world where that mostly happens in the planner --
> but that clearly not appropriate for arrays ("WHERE a = 5" and "WHERE
> a in(4, 5)" are almost the same thing, and so should be handled in
> almost the same way by preprocessing).

Yeah, if the planner could handle some of this that'd be great. At the
same time, I think the more expensive planner-time deductions might
need to be gated behind a GUC.

> > > + * We generally permit primitive index scans to continue onto the next
> > > + * sibling page when the page's finaltup satisfies all required scan keys
> > > + * at the point where we're between pages.
> >
> > This should probably describe that we permit primitive scans with
> > array keys to continue until we get to the sibling page, rather than
> > this rather obvious and generic statement that would cover even the
> > index scan for id > 0 ORDER BY id asc; or this paragraph can be
> > removed.
>
> It's not quite obvious.
[...]
> Separately, there is also the potential for undesirable interactions
> between 1 and 2, which is why we don't let them mix. (We have the "if
> (so->scanBehind && has_required_opposite_direction_only) goto
> new_prim_scan" gating condition.)

I see.

> > Further notes:
> >
> > I have yet to fully grasp what so->scanBehind is supposed to mean. "/*
> > Scan might be behind arrays? */" doesn't give me enough hints here.
>
> Yes, it is complicated. The best explanation is the one I've added to
> _bt_readpage, next to the precheck. But that does need more work.

Yeah. The _bt_readpage comment doesn't actually contain the search
term scanBehind, so I wasn't expecting that to be documented there.

> > I find it weird that we call _bt_advance_array_keys for non-required
> > sktrig. Shouldn't that be as easy as doing a binary search through the
> > array? Why does this need to hit the difficult path?
>
> What difficult path?

"Expensive" would probably have been a better wording: we do a
comparative lot of processing in the !_bt_check_compare() +
!continuescan path; much more than the binary searches you'd need for
non-required array key checks.

> Advancing non-required arrays isn't that different to advancing
> required ones. We will never advance required arrays when called just
> to advance a non-required one, obviously. But we can do the opposite.
> In fact, we absolutely have to advance non-required arrays (as best we
> can) when advancing a required one (or when the call was triggered by
> a non-array required scan key).

I think it's a lot more expensive to do the non-required array key
increment for non-required triggers. What are we protecting against
(or improving) by always doing advance_array_keys on non-required
trigger keys?

I mean that for non-required array keys we should just do the binary
search inside _bt_check_compare, as that would skip a lot of the
rather expensive other array key infrastructure; only when we're
outside the minimum or maximum bounds of the non-required scankeys
should we trigger advance_array_keys (unless the scan direction
changed).
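
A sketch of what I mean, in Python rather than C and with hypothetical names (nbtree's real check operates on scan keys and Datums, not Python lists):

```python
import bisect

def check_non_required_array_key(arr, value):
    """Check a non-required array key with a plain binary search over the
    sorted array. Returns (matches, out_of_bounds): only when the value
    falls outside the array's min/max bounds would the caller escalate
    to the full (comparatively expensive) array-advancement machinery."""
    if not arr or value < arr[0] or value > arr[-1]:
        return (False, True)      # outside bounds: consider advancing
    i = bisect.bisect_left(arr, value)
    return (arr[i] == value, False)
```

So a value that merely falls between two array elements is just a non-match, with no call into advance_array_keys at all.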

A full review of the updated patch will follow soon.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
