From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Making Row Comparison NULL row member handling more robust during skip scans
Date: 2025-06-20 00:44:35
Message-ID: CAH2-Wz=WFSX9+KURgxg2Wfj2weiugMn7dywRabnEsnnY-4+H_Q@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 18, 2025 at 8:41 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Attached patch shows how this could work. It refines the rules around
> NULL row comparison members in _bt_check_rowcompare, and in _bt_first.
> The fundamental idea here is to make _bt_check_rowcompare *consistent*
> with _bt_first. That way _bt_set_startikey doesn't have to know
> anything about row compares.

I was dissatisfied with the v1 approach in _bt_check_rowcompare, where
I check whether the previous subkey/row compare member is on the
column immediately before the current subkey/row compare member's
column (in the case where the current member is marked ISNULL, making
it unsatisfiable).

Why was it ever necessary to look at the previous subkey like this?
Why can't we simply check if an unsatisfiable ISNULL subkey/row
compare is itself marked required in the current scan direction --
just like in every other case where we set continuescan=false? One
possible answer is this: we'll never mark a row compare member after
the first row compare member as required, no matter the details.
But...why not?

Attached is v2, which avoids the v1 business with checking the prior
row member's attribute number. In fact, it doesn't even access the
prior row member at all. All that matters is whether or not the
unsatisfiable ISNULL subkey/row compare is itself marked required in
the current scan direction (there's no more of that "--subkey"
business that you see on HEAD). This makes _bt_check_rowcompare
clearer, and even more in line with what code in places like
_bt_advance_array_keys expects.

This also has significant performance advantages (see later test
case). I now find myself in the awkward position of proposing a useful
performance enhancement well after feature freeze for Postgres 18. I
think that this is actually the best option, on balance, since it
makes row comparisons so much easier to understand from a distance --
without the v1 "check the attribute number for a
not-required-but-kinda-still-required row member key" kludge.

Relevant background information concerning _bt_mark_scankey_required,
the function that deals with marking row compare members required (why
it's okay to change the rules in _bt_mark_scankey_required like this):

There was a 10-year period (2006 to 2016) during which it was
possible for multiple/all row members to be marked required (in one
scan direction). This changed in bugfix commit a298a1e0, which dealt
with incorrect handling of NULL index entries in indexed ROW()
comparison. That bugfix simply taught _bt_mark_scankey_required to not
mark any row member beyond the first one as required. But that seems
like overkill to me. My v2 goes back to the old pre-2016
_bt_mark_scankey_required behavior, and fixes the same 2016 problem by
following a more principled approach -- that's what makes it safe to
use required-ness marking, instead of using the v1 "check the
attribute number for a not required but kinda still required row
member key" kludge.

It seems to me that the actual problem that 2016 commit a298a1e0 fixed
is best understood as a problem with the basic rules for stopping the
scan when we encounter a NULL index tuple value in the context of row
compares; once we do that, we should then be able to mark multiple row
members required, based on the old pre-a298a1e0 rules. In other words,
I think that it would have made more sense to blame the original 2016
bug report problem on "the fix for bug #6278" (which happened in
commit 6980f817 and follow-up commit 882368e8, both from 2011).

If I revert 2016 a298a1e0 (i.e. make _bt_mark_scankey_required mark
multiple row compare members required per the old rules), which is a
part of what I'm doing in this v2, I can easily reproduce the bug that
Tom was concerned about back in 2016. However, I find that it's
possible to fix that same bug in a more principled way: by further
refining the basic rules for stopping the scan when we encounter a
NULL (more or less a fixed version of the logic added in 2011 by those
2 commits). So v2 does that as well.

We now only need to be slightly more careful with a
second-or-subsequent row member marked required when we encounter a
NULL: it's only okay to terminate the scan when the row compare member
is marked required in the current scan direction. It is not good
enough for it to be marked required in the opposite direction --
unless we're dealing with the first row compare member (so there's no
change in requiredness-marking behavior for the first row member,
compared to HEAD).

If you think about why it's *ever* okay to terminate the scan using a
key not marked required in the *current* scan direction (when dealing
with NULLs), you'll understand why it cannot safely be applied to
lower-order row compare members. That in itself doesn't make it
generally unsafe to treat lower-order row compare members as required
to continue the scan. After all, there weren't any problems with it
for many years -- until the rules with terminating the scan on NULL
tuple values changed/were optimized in 2011 or so.

Comments in v2 try to explain this as clearly as possible.

> I would like to commit this to Postgres 18, treating it as a bugfix.
> It seems like a much more solid fix than what I came up with in bugfix
> commit 5f4d98d4. Does anybody else have an opinion on that question?
>
> Note that this patch effectively *adds a new optimization* to
> _bt_first around row comparison keys with NULL row members. I don't
> actually care about the performance of such row comparisons.

v2 actually does add an optimization that people would probably find
valuable -- which is a logical consequence of where I've taken things
in v2. Improving performance still isn't really the goal, but I want
to be clear about the fact that this patch could reasonably also be
understood as an optimization (that was much less true with v1, since
that only changed things with ISNULL-marked scan keys, not with tuples
that contain NULLs).

Currently, on master/18, the following query scans an index with
plenty of NULLs in all columns:

select a, b, c
from fuzz_skip_scan
where (a, b, c) >= (11, 1, 71) and (a, b, c) <= (12, 1, 1)
order by a desc, b desc, c desc, d desc;

We get an index-only scan backwards, with 37 buffer hits. However, the
equivalent forwards scan does a lot worse:

select a, b, c
from fuzz_skip_scan
where (a, b, c) >= (11, 1, 71) and (a, b, c) <= (12, 1, 1)
order by a, b, c, d;

That gets an index-only scan forwards, with 69 buffer hits.

There's no need for this inconsistency. With v2 applied, we get it
down to 35 buffer hits for both variants. If it's safe to only read 37
or 35 pages in one direction, then why wouldn't it also be safe to get
about that same number of buffer hits when scanning in the opposite
direction?

This improved performance profile is a consequence of v2 restoring
_bt_mark_scankey_required to its pre-2016 state, adding back logic
that is symmetric with existing logic in _bt_first (logic that wasn't
removed in 2016). One could argue that v2 is just fixing a regression
in 2016 commit a298a1e0.

--
Peter Geoghegan

Attachment Content-Type Size
v2-0001-Simplify-and-improve-row-compare-NULL-handling.patch application/octet-stream 13.3 KB
