| From: | Radim Marek <radim(at)boringsql(dot)com> |
|---|---|
| To: | pgsql-hackers(at)postgresql(dot)org |
| Subject: | Non-deterministic buffer counts reported in execution with EXPLAIN ANALYZE BUFFERS |
| Date: | 2026-02-02 22:54:24 |
| Message-ID: | CAJgoLkLCCKy1hAFpoUzwaRq91cA=m=-J03MBpUo-d7-oXy6MzA@mail.gmail.com |
| Lists: | pgsql-hackers |
Hello,
First, let me say this is my first bug report and my first attempt at
patching PostgreSQL, fresh after the hacking webinar. Please let me know if
I need to clarify anything.
TL;DR: EXPLAIN (ANALYZE, BUFFERS) reports different buffer counts for the
Sort node between the first and subsequent executions in a session.
Buffers: shared hit=4
-> Seq Scan on sort_buffer_test (cost=0.00..2.50 rows=1
width=32) (actual time=0.042..0.126 rows=1.00 loops=1)
Filter: (lower(email) = 'email_1(at)example(dot)com'::text)
Rows Removed by Filter: 99
Buffers: shared hit=1
vs
Buffers: shared hit=1
-> Seq Scan on sort_buffer_test (cost=0.00..2.50 rows=1
width=32) (actual time=0.021..0.057 rows=1.00 loops=1)
Filter: (lower(email) = 'email_1(at)example(dot)com'::text)
Rows Removed by Filter: 99
Buffers: shared hit=1
I found this while working on a SQL regression tool (regresql) that uses
the total buffer counts reported by EXPLAIN (ANALYZE, BUFFERS) to detect
deviations in query performance.
The issue surfaced on smaller queries, where test cases showed a consistent
"improvement" in performance (fewer buffers) when comparing baseline and
actual test runs. I had not noticed it earlier on larger data sets because
the difference in buffers was negligible relative to the overall operations.
The example below is the test case I managed to isolate.
--- how to reproduce
CREATE TABLE sort_buffer_test (id serial PRIMARY KEY, val int);
INSERT INTO sort_buffer_test (val) SELECT generate_series(1, 100);
ANALYZE sort_buffer_test;
-- run twice in a new psql session and observe the reported buffer counts change
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF, TIMING OFF)
SELECT * FROM sort_buffer_test ORDER BY val DESC LIMIT 1;
---
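For context, the counters can also be pulled programmatically via EXPLAIN
(FORMAT JSON); below is a minimal server-side sketch (regresql itself
presumably reads the plan client-side; the field name assumes the JSON
output produced with BUFFERS enabled):

```sql
DO $$
DECLARE plan json;
BEGIN
  EXECUTE 'EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
           SELECT * FROM sort_buffer_test ORDER BY val DESC LIMIT 1'
    INTO plan;
  -- the top-level plan node carries totals inclusive of its children
  RAISE NOTICE 'shared hit blocks: %',
    plan -> 0 -> 'Plan' ->> 'Shared Hit Blocks';
END $$;
```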
This behaviour is session specific. Tested on 17.x, 18.x and the current
development version (just FYI, all verified on both amd64 and aarch64).
In the ORDER BY case, I traced the likely cause down to pg_amop/pg_amproc
lookups. The specific lookups vary by operation, but the pattern is the
same: the first execution incurs catalog reads that get cached for
subsequent runs. The same applies to DISTINCT (sorted, not hashed), GROUP
BY, window functions, etc., which seems to isolate it to the Sort node.
All of these are, I believe, really just metadata resolution (i.e. the
planner already knows what to sort by).
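If that hypothesis is right, warming the same metadata through an unrelated
query should make the counts stable from the very first run. A sketch
(assumption: sorting any int4 value hits the same syscache entries as the
test query):

```sql
-- fresh psql session: warm the int4 sort operator/support function
-- lookups with an unrelated query first
SELECT g FROM generate_series(1, 2) AS g ORDER BY g DESC;

-- if the cache is the culprit, this first run should now report the
-- same buffer counts as subsequent runs
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF, TIMING OFF)
SELECT * FROM sort_buffer_test ORDER BY val DESC LIMIT 1;
```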
Having said that, I'm happy to try to fix it. I'm just wondering which
direction is the right one to take:
1. Pre-warm the syscache during planning (tried locally and it works, but
it does not feel conceptually right and is rather hacky).
2. Pre-compute the metadata and store it in the plan node struct of each
affected node type (lots of work). I believe that's what MergeJoin already
does.
3. Or this is 'works as expected', which I don't think is the case, and
which would force me to do the pre-warming on the regresql side.
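For completeness, the pre-warming under option 3 would amount to something
like this per session (sketch): execute the statement once and discard the
result, then record only the second run.

```sql
-- per-session warm-up: the first execution is discarded, it exists
-- only to populate the catalog caches
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF, TIMING OFF)
SELECT * FROM sort_buffer_test ORDER BY val DESC LIMIT 1;

-- only this second execution is compared against the baseline
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF, TIMING OFF)
SELECT * FROM sort_buffer_test ORDER BY val DESC LIMIT 1;
```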
Kind regards,
Radim