| From: | Tomas Vondra <tomas(at)vondra(dot)me> |
|---|---|
| To: | David Rowley <dgrowleyml(at)gmail(dot)com>, Radim Marek <radim(at)boringsql(dot)com> |
| Cc: | pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Non-deterministic buffer counts reported in execution with EXPLAIN ANALYZE BUFFERS |
| Date: | 2026-02-04 15:59:31 |
| Message-ID: | 94758ba1-9387-4b1d-a13a-48ea4ecbe8e9@vondra.me |
| Lists: | pgsql-hackers |
On 2/3/26 22:53, David Rowley wrote:
> On Tue, 3 Feb 2026 at 20:21, Radim Marek <radim(at)boringsql(dot)com> wrote:
>> Apologies, "bug" might be a bit harsh, and I didn't want to imply those buffers won't be reported. My assumption was that catalogue lookups for resolving sort operator metadata would be reported as planning buffers rather than execution buffers, which is already what's reported for other lookups in the very same query above.
>>
>> The executor re-resolves sort operator metadata that the planner already looked up. The first lookup counts as planning buffers, the second as execution buffers.
>
> If it was looking up the same thing, then there shouldn't be a cache
> miss, and the buffers wouldn't be accessed.
>
Right.
I think there's a bit of confusion because the planning phase resolves
only some of the information, leaving additional pieces to the
execution phase. Those lookups are not counted as "planning" buffers,
which may seem surprising to users.
>> Would passing that already resolved opfamily from the planner be a reasonable direction? My reasoning in this case is that I'm building on the hypothesis that buffers (whether shared hit or read) should be deterministic for the same query on the same data, and the same plan. This re-resolution on first execution breaks it.
>
> I think offloading more work into the planner for looking up things
> that are always the same for every invocation of a plan is generally
> good. For anything that moves things more in that direction,
> you'd need to ensure that the plan is correctly invalidated when
> something changes about the extra thing you looked up in planning.
> I've not looked into the specifics of this instance, other than that
> the function whose call causes the buffer usage is
> get_opfamily_proc(). I don't see that called during planning for this
> query, so your statement about "already resolved opfamily from the
> planner" doesn't seem true.
>
Yes, that's essentially what I explained yesterday.
I'm not sure about moving all these lookups from the executor to the
planner. Couldn't that easily hurt some cases? We'd need to do this for
all fields the final plan might need, i.e. we'd need to initialize all
fields for all paths we construct - not just the metadata needed by the
final cheapest one. That could be a lot of additional lookups, which
would make the first execution more expensive. Later executions of a
cached query would avoid the lookups, but we'd still be paying the
(admittedly cheap) cost up front.
Of course, there are a lot of such fields - this amproc is just one
example. I don't think it's practical to rework all of that to
initialize everything early.
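To illustrate, here's roughly what the executor resolves at startup - a
simplified sketch based on my reading of FinishSortSupportFunction() in
sortsupport.c, abbreviated and not the verbatim source. The
get_opfamily_proc() syscache lookup is the part that reads pg_amproc
pages on a cold cache, and those reads get charged to execution
buffers:

    #include "postgres.h"
    #include "fmgr.h"
    #include "access/nbtree.h"
    #include "utils/lsyscache.h"
    #include "utils/sortsupport.h"

    /*
     * Sketch of the executor-startup lookup (abbreviated; see
     * FinishSortSupportFunction() for the real thing). A cold-cache
     * miss here reads pg_amproc buffers during *execution*.
     */
    static void
    finish_sort_support_sketch(Oid opfamily, Oid opcintype, SortSupport ssup)
    {
        Oid     sortSupportFunction;

        /* prefer a sort support function, if the opfamily provides one */
        sortSupportFunction = get_opfamily_proc(opfamily, opcintype,
                                                opcintype,
                                                BTSORTSUPPORT_PROC);
        if (OidIsValid(sortSupportFunction))
            OidFunctionCall1(sortSupportFunction, PointerGetDatum(ssup));
        else
        {
            /* fall back to the plain comparison proc (another lookup) */
            PrepareSortSupportComparisonShim(
                get_opfamily_proc(opfamily, opcintype,
                                  opcintype, BTORDER_PROC),
                ssup);
        }
    }

None of this is known at plan time today, so the first execution pays
for it.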
> However, I imagine this is just one of many things and if you're
> aiming to stabilise your tests doing this, then you'll likely be
> fixing things for a long time to come. I imagine a better way if
> you're just doing this for running test suites would be to invent some
> way to prepopulate the sys and cat caches. Maybe that could be a
> contrib module. I don't think there's a way to get a relcache miss
> during execution, but I guess it could change with prepared statements
> if we ever decided to start evicting long unused relcache entries one
> day.
>
I'm not sure there's a good way to warm up the sys/cat caches, short of
loading everything - how would you know what the execution plan will
need? It seems very complex.
One option would be to run each query twice - the first run would warm
up the caches, the second would be the measured one. But that makes the
tool 2x as expensive.
But what if we had a way to *invalidate* all the caches? That'd also
make the behavior deterministic - there would be no hits initially. And
it seems much simpler to implement.
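For example, a hypothetical contrib function (an untested sketch; the
name pg_discard_caches is made up) could simply expose
InvalidateSystemCaches(), the backend routine that resets the
syscache/relcache state:

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/inval.h"

    PG_MODULE_MAGIC;

    PG_FUNCTION_INFO_V1(pg_discard_caches);

    /*
     * Hypothetical SQL-callable cache reset (untested sketch). Calling
     * this before each measured execution would make every run start
     * from the same cold-cache state, so the buffer counts reported by
     * EXPLAIN (ANALYZE, BUFFERS) become deterministic.
     */
    Datum
    pg_discard_caches(PG_FUNCTION_ARGS)
    {
        InvalidateSystemCaches();
        PG_RETURN_VOID();
    }

The tool would then call it once before each measured query, instead of
running everything twice.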
regards
--
Tomas Vondra