Re: Improving worst-case merge join performance with often-null foreign key

From: Steinar Kaldager <steinar(dot)kaldager(at)oda(dot)com>
To: Richard Guo <guofenglinux(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Improving worst-case merge join performance with often-null foreign key
Date: 2023-04-25 12:51:00
Message-ID: CANcDffeUuU5OezYPMPb4ypYFpxxWs5pgVwGGEAEe4hD1+1PP1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Apr 23, 2023 at 11:30 AM Richard Guo <guofenglinux(at)gmail(dot)com> wrote:
> On Sat, Apr 22, 2023 at 11:21 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Hmm. I don't entirely understand why the existing stop-at-nulls logic
>> in nodeMergejoin.c didn't fix this for you. Maybe somebody has broken
>> that? See the commentary for MJEvalOuterValues/MJEvalInnerValues.
>
>
> I think it's just because the MergeJoin didn't see a NULL foo_id value
> from test_bar tuples because all such tuples are removed by the filter
> 'test_bar.active', thus it does not have a chance to stop at nulls.

Agreed, this is also my understanding.

Note that this isn't just a contrived test case, it's also the
situation we ran into in prod. (We had a table with a lot of old
inactive rows with null values for Historical Reasons, essentially
kept for accounting/archival purposes. Newer, active, rows all had the
foreign key column set to non-null.)

I had initially considered whether this could be fixed in the
merge-join execution code instead of by altering the plan, but at
first glance that feels like it might be a more awkward fit. It's easy
enough to stop the merge join if a null actually appears, but because
of the filter, no null will ever appear. You'd have to somehow break
the "stream of values" abstraction and look at where the values are
actually coming from and/or which values would have appeared if they
weren't filtered out. I don't know the codebase well, but to me that
feels fairly hacky compared to altering the plan for the index scan.

Steinar

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2023-04-25 13:06:05 RE: [PoC] pg_upgrade: allow to upgrade publisher node
Previous Message Daniel Gustafsson 2023-04-25 12:39:39 Re: Should vacuum process config file reload more often