Re: [PATCH] Support SK_SEARCHNULL / SK_SEARCHNOTNULL for heap-only scans

From: Jacob Champion <jchampion(at)timescale(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: [PATCH] Support SK_SEARCHNULL / SK_SEARCHNOTNULL for heap-only scans
Date: 2023-07-19 23:44:31
Message-ID: CAAWbhmjmjqgmaHHcu5rmx3sgdXBJT_t5bt1zW8rjMyeis=5mtg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Feb 27, 2023 at 12:24 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 22/02/2023 15:03, Aleksander Alekseev wrote:
> > If memory serves I noticed that WHERE ... IS NULL queries don't even
> > hit HeapKeyTest() and I was curious where the check for NULLs is
> > actually made. As I understand, SeqNext() in nodeSeqscan.c simply
> > iterates over all the tuples it can find and pushes them to the parent
> > node. We could get a slightly better performance for certain queries
> > if SeqNext() did the check internally.
>
> Right, it might be faster to perform the NULL-checks before checking
> visibility, for example. Arbitrary quals cannot be evaluated before
> checking visibility, but NULL checks could be.

Hi Heikki,

There's quite a bit of work left to do, but I wanted to check if the
attached patch (0002, based on top of Aleks' 0001 from upthread) was
going in the direction you were thinking. This patch pushes down any
forced-null and not-null Vars as ScanKeys. It doesn't remove the
redundant quals after turning them into ScanKeys, so it's needlessly
inefficient, but there's still a decent speedup for some of the basic
benchmarks in 0003.
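To make it concrete, the keys 0002 builds presumably look much like the
ones nodeIndexscan.c already constructs for NullTest quals. A rough
sketch (the helper name and its placement are hypothetical, not the
actual patch code):

#include "postgres.h"

#include "access/skey.h"
#include "access/stratnum.h"

/*
 * Hypothetical helper: build a ScanKey asking the heap AM to filter on
 * the nullness of a single attribute, mirroring what nodeIndexscan.c
 * does for IS NULL / IS NOT NULL quals.
 */
static void
build_nulltest_scankey(ScanKey key, AttrNumber attno, bool want_null)
{
    ScanKeyEntryInitialize(key,
                           SK_ISNULL |
                           (want_null ? SK_SEARCHNULL : SK_SEARCHNOTNULL),
                           attno,
                           InvalidStrategy,    /* no operator strategy */
                           InvalidOid,         /* no subtype */
                           InvalidOid,         /* no collation */
                           InvalidOid,         /* no comparison proc */
                           (Datum) 0);         /* no argument */
}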

Plans look something like this:

# EXPLAIN SELECT * FROM t WHERE i IS NULL;
                        QUERY PLAN
---------------------------------------------------------
 Seq Scan on t  (cost=0.00..1393.00 rows=49530 width=4)
   Scan Cond: (i IS NULL)
   Filter: (i IS NULL)
(3 rows)

# EXPLAIN SELECT * FROM t WHERE i = 3;
                      QUERY PLAN
-----------------------------------------------------
 Seq Scan on t  (cost=0.00..1643.00 rows=1 width=4)
   Scan Cond: (i IS NOT NULL)
   Filter: (i = 3)
(3 rows)
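
The Scan Cond line is the pushed-down key. Evaluating it only needs the
tuple's null bitmap, so -- as you noted upthread -- it can run before the
visibility check. Illustrative only; this helper isn't in the patch:

#include "postgres.h"

#include "access/htup_details.h"
#include "access/skey.h"

/*
 * Illustrative per-tuple test for pushed-down NULL-test keys.  Unlike an
 * arbitrary qual, this needs only the null bitmap, so in principle it can
 * be evaluated before the (more expensive) visibility check.
 */
static bool
nulltest_keys_match(HeapTuple tuple, TupleDesc tupdesc,
                    ScanKey keys, int nkeys)
{
    for (int i = 0; i < nkeys; i++)
    {
        ScanKey key = &keys[i];
        bool    isnull = heap_attisnull(tuple, key->sk_attno, tupdesc);

        if (key->sk_flags & SK_SEARCHNULL)
        {
            if (!isnull)
                return false;
        }
        else if (key->sk_flags & SK_SEARCHNOTNULL)
        {
            if (isnull)
                return false;
        }
    }
    return true;
}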

The non-nullable case worries me a bit because so many quals imply IS
NOT NULL. I think I need to do some sort of cost analysis using the
null_frac statistics -- it probably only makes sense to push an
implicit SK_SEARCHNOTNULL down to the AM layer if a meaningful fraction
of rows would actually be filtered out -- but I'm not sure how to
choose that threshold.
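
One shape that analysis could take is to look at stanullfrac from
pg_statistic for the Var and only push the key down when it clears some
cutoff. A sketch only -- both the helper and the threshold constant are
made up:

#include "postgres.h"

#include "access/htup_details.h"
#include "catalog/pg_statistic.h"
#include "utils/syscache.h"

/* Made-up cutoff: only bother pushing SK_SEARCHNOTNULL down if at least
 * this fraction of the column is NULL. */
#define NOTNULL_PUSHDOWN_MIN_NULLFRAC 0.01

/*
 * Sketch of a planner-side check: is an implicit IS NOT NULL key on
 * (relid, attnum) likely to filter anything, according to stanullfrac?
 */
static bool
notnull_pushdown_worthwhile(Oid relid, AttrNumber attnum)
{
    HeapTuple   statstup;
    bool        result = false;

    statstup = SearchSysCache3(STATRELATTINH,
                               ObjectIdGetDatum(relid),
                               Int16GetDatum(attnum),
                               BoolGetDatum(false));    /* not inherited */
    if (HeapTupleIsValid(statstup))
    {
        Form_pg_statistic stats = (Form_pg_statistic) GETSTRUCT(statstup);

        result = (stats->stanullfrac >= NOTNULL_PUSHDOWN_MIN_NULLFRAC);
        ReleaseSysCache(statstup);
    }

    return result;
}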

It would also be neat if `COUNT(col)` could push down
SK_SEARCHNOTNULL, since it only counts non-null inputs, but I think
that would require a new support function to rewrite the plan for an
aggregate.

Am I on the right track?

Thanks,
--Jacob

Attachment Content-Type Size
0003-WIP-naive-benchmarks.patch text/x-patch 3.1 KB
0001-Support-SK_SEARCHNULL-SK_SEARCHNOTNULL-for-heap-only.patch text/x-patch 10.2 KB
0002-WIP-create-ScanKeys-from-derived-null-tests.patch text/x-patch 8.7 KB
