Re: Qual push down to table AM

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Julien Tachoires <julien(at)tachoires(dot)me>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Qual push down to table AM
Date: 2025-12-09 21:40:17
Message-ID: CA+TgmoZhzRxZ9RWsTpUQiw+zNGUtNAFsFQDqr+W1dn2BJom7TQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 29, 2025 at 4:38 AM Julien Tachoires <julien(at)tachoires(dot)me> wrote:
> Thank you for this quick feedback.
>
> One potential approach to solve this in heapgettup() would be:
> 1. hold the buffer lock
> 2. get the tuple from the buffer
> 3. if the tuple is not visible, move to the next tuple, back to 2.
> 4. release the buffer lock
> 5. if the tuple does not satisfy the scan keys, take the buffer lock,
> move to the next tuple, back to 2.
> 6. return the tuple
>
> Do you see something fundamentally wrong here?

I spent a bit of time this afternoon looking at v4-0001. I noticed a
few spelling mistakes (abritrary x2, statisfied x1). As far as the
basic approach is concerned, I don't see how there can be a safety
problem here. If it's safe to release the buffer lock when we find a
tuple that matches the quals, for the purposes of returning that tuple
to the caller, then it seems like it must also be safe to release it
to evaluate a proposed qual.

Potentially, there could be a performance problem. Imagine that we
have some code right now that uses this code path and it's safe
because the qual that we're evaluating is something super-simple like
the integer less-than operator, so calling it under the buffer lock
doesn't create a stability hazard. Well, with the patch, we'd
potentially take and release the buffer lock a lot more times than we
do right now. Imagine that there are lots of tuples on each page but
only one or a few of them satisfy the qual: then we lock and unlock
the buffer roughly once per tuple instead of just once per page.

However, I don't think this really happens in practice. I believe it's
possible to take this code path if you set ignore_system_indexes=on,
because that turns index scans --- which, not surprisingly, have
scankeys --- into sequential scans which then end up also having
scankeys. Many of those scans use catalog snapshots so there's no
issue, but a little bit of debugging code seems to show that
systable_beginscan() can also be called with snapshot->snapshot_type
set to SNAPSHOT_ANY or SNAPSHOT_DIRTY. For example, see
GetNewOidWithIndex(). However, even if ignore_system_indexes=on gets a
little slower as a result of this or some other patch, I don't think
we really care, and without that setting, this code doesn't seem to
get exercised at all.

So, somewhat to my surprise, I think that v4-0001 might be basically
fine. I wonder if anyone else sees a problem that I'm missing?

--
Robert Haas
EDB: http://www.enterprisedb.com
