From: | Julien Tachoires <julien(at)tachoires(dot)me> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Qual push down to table AM |
Date: | 2025-08-29 08:38:32 |
Message-ID: | 20250829083832.f2clwltkdzsfd2ns@poseidon.home.virt |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Wed, Aug 27, 2025 at 05:50:01PM -0400, Andres Freund wrote:
> On 2025-08-27 22:27:37 +0200, Julien Tachoires wrote:
> > Please find attached a patch set proposal intended to implement WHERE
> > clauses (qual) push down to the underlying table AM during
> > table/sequential scan execution.
> >
> > The primary goal of this project is to convert quals to ScanKeys and
> > pass them to the table AMs. Table AMs are then allowed to apply early
> > tuple filtering during table (sequential) scans. Applying filtering at
> > the table storage level is something necessary for non row-oriented
> > table storage like columnar storage. Index organized table is another
> > table storage that would need quals push down.
> >
> > AFAIK, CustomScan is the one and only way to go for having table scan
> > using quals pushed down, but each table AM must implement its own
> > mechanism. IMHO, having this feature available in core would help the
> > development of new table AMs. About Heap, some performance testing
> > (detailed at the end of this message) shows between 45% and 60%
> > improvement in seq scan execution time when only one tuple is returned
> > from the table.
>
> One problem with doing that in the case of heapam is that you're evaluating
> scan keys with the buffer lock held - with basically arbitrary expressions
> being evaluated. That's an easy path to undetected deadlocks. You'd have to
> redesign the relevant mechanism to filter outside of the lock...
Thank you for this quick feedback.
One potential approach to solve this in heapgettup() would be:
1. hold the buffer lock
2. get the tuple from the buffer
3. if the tuple is not visible, move to the next tuple, back to 2.
4. release the buffer lock
5. if the tuple does not satisfy the scan keys, take the buffer lock,
move to the next tuple, back to 2.
6. return the tuple
Do you see something fundamentally wrong here?
In practice, I might be wrong, but I think this problem affects
heapgettup() only, heapgettup_pagemode() does not hold the buffer lock
when HeapKeyTest() is called. To reach this problematic part we need two
conditions: non-NULL ScanKey array and not to be in the page-at-a-time
mode (pagemode). The only table_beginscan_something() function able to
meet these conditions before calling the scan_begin() API is
table_beginscan_sampling(): the caller can choose to use pagemode. The
only place where table_beginscan_sampling() is called is in
tablesample_init(), in this case the ScanKey array is NULL, so we cannot
reach the problematic part of heapgettup(). There is only one other case
where we disable the pagemode in heapam: when the snapshot is non-MVCC.
Do you know any other code path/scenario I missed that can lead to reach
this problematic part?
Best regards,
--
Julien Tachoires
From | Date | Subject | |
---|---|---|---|
Next Message | Matthias van de Meent | 2025-08-29 08:38:39 | Re: Adding skip scan (including MDAM style range skip scan) to nbtree |
Previous Message | Peter Eisentraut | 2025-08-29 08:34:17 | Re: fixing tsearch locale support |