| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | Lukas Fittl <lukas(at)fittl(dot)com> |
| Cc: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(at)vondra(dot)me>, Peter Smith <smithpb2250(at)gmail(dot)com>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com> |
| Subject: | Re: Stack-based tracking of per-node WAL/buffer usage |
| Date: | 2026-04-07 22:19:06 |
| Message-ID: | rhah5kx5ftfztu532zck4mhc4hxtfis4z552laisd53sf6ycrn@wnbste4hleyg |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
On 2026-04-07 13:30:11 -0700, Lukas Fittl wrote:
> 0001 is the change to make queryDesc->totaltime be allocated by
> ExecutorStart instead of plugins themselves, and adds a
> queryDesc->totaltime_options to have plugins request which level of
> summary instrumentation they need. This change is pretty simple, and
> could still make sense to get into 19. Because of the earlier
> Instrumentation refactoring that was pushed (thanks!) we're already
> asking extensions allocating queryDesc->totaltime to modify their use
> of InstrAlloc, so I think we might as well clean this up now.
Hm. That's a fair argument. They indeed would have to again change next
release
It's not a complicated change and removes more lines than it adds.
I guess one thing I'm not sure is whether the fields shouldn't be renamed at
the same time:
a) To prevent extensions from continuing to set it, most of them do not test
against assert enabled builds. With a different name they would get a
compiler error.
b) "totaltime" and "totaltime_options" are pretty poor descriptors of tracking
query level statistics. If everyone has to change anyway, this is a good
occasion.
'query_instr[_options]'?
Any opinions?
> 0002 is just ExecProcNodeInstr moved to instrument.c, as Andres had
> suggested previously. We still get some quick performance wins from
> doing that (see end of email), and again, its a simple change, so
> could be considered if someone has bandwidth remaining. I've added a
> later patch that then does the more complex inlining and gets us the
> full speed up.
Here it needs a few more inlines to get the full performance, otherwise it
doesn't inline all the helpers. I think on balance I didn't like the
prototype in instrument.h, that's too widely included, and it might even cause
some circularity issues. It seems better to do the somewhat ugly thing and
have the prototype be in executor.h.
> 0002 measurements (with current master and TSC clock source used for
> timing, best of three):
>
> CREATE TABLE lotsarows(key int not null);
> INSERT INTO lotsarows SELECT generate_series(1, 50000000);
> VACUUM FREEZE lotsarows;
With the somewhat more extreme benchmark I used in the rdtsc thread and the
added inline mentioned above I see a bit bigger wins. See the attached
explainbench.sql - it doesn't quite cover all the combinations, but I think it
gives a good enough overview.
c=1 pgbench -f ~/tmp/explainbench.sql -P5 -r -t 10
master:
statement latencies in milliseconds and failures:
200.800 0 SELECT pg_prewarm('pgbench_accounts');
0.098 0 PREPARE query AS SELECT * FROM pgbench_accounts OFFSET 5000000 LIMIT 1;
212.010 0 EXPLAIN (ANALYZE, BUFFERS OFF, WAL OFF, TIMING OFF)
268.648 0 EXPLAIN (ANALYZE, BUFFERS OFF, WAL OFF, TIMING ON)
232.421 0 EXPLAIN (ANALYZE, BUFFERS ON, WAL ON, TIMING OFF)
283.531 0 EXPLAIN (ANALYZE, BUFFERS ON, WAL ON, TIMING ON)
0.030 0 DEALLOCATE query;
0002:
statement latencies in milliseconds and failures:
201.558 0 SELECT pg_prewarm('pgbench_accounts');
0.103 0 PREPARE query AS SELECT * FROM pgbench_accounts OFFSET 5000000 LIMIT 1;
188.696 0 EXPLAIN (ANALYZE, BUFFERS OFF, WAL OFF, TIMING OFF)
244.479 0 EXPLAIN (ANALYZE, BUFFERS OFF, WAL OFF, TIMING ON)
223.773 0 EXPLAIN (ANALYZE, BUFFERS ON, WAL ON, TIMING OFF)
266.947 0 EXPLAIN (ANALYZE, BUFFERS ON, WAL ON, TIMING ON)
0.034 0 DEALLOCATE query;
That's something like 4-12%.
Pretty nice for a patch that just adds a few lines around and adds a few
inlines.
> At this point I'd say its safe to say that we should push out later
> changes to PG20, because it needs another good look over, and I don't
> think Andres or Heikki have the capacity for that today (but I really
> appreciate all the effort put in by both of you!).
Indeed.
> @@ -334,6 +334,9 @@ explain_ExecutorStart(QueryDesc *queryDesc, int eflags)
>
> if (auto_explain_enabled())
> {
> + /* We're always interested in runtime */
> + queryDesc->totaltime_options |= INSTRUMENT_TIMER;
> - queryDesc->totaltime = InstrAlloc(INSTRUMENT_ALL);
Not that it's going to make a significant difference, but it is nice that this
now would need to track less.
Kinda wonder about having
EXPLAIN (ANALYZE BUFFERS totals_only, WAL totals_only) ...;
in plenty cases that'd be all one needs, at substantially lower cost.
Greetings,
Andres Freund
| Attachment | Content-Type | Size |
|---|---|---|
| explainbench.sql | application/sql | 397 bytes |
| v17a-0001-instrumentation-Move-ExecProcNodeInstr-to-allow.patch | text/x-diff | 5.3 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andreas Karlsson | 2026-04-07 22:26:05 | Re: updates for handling optional argument in system functions |
| Previous Message | Tom Lane | 2026-04-07 22:05:47 | Re: pg_plan_advice |