Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Guillaume Lelarge <guillaume(at)lelarge(dot)info>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add parallel columns for seq scan and index scan on pg_stat_all_tables and _indexes
Date: 2026-01-19 00:25:59
Message-ID: aW16FyZsIE2jeE6H@paquier.xyz
Lists: pgsql-hackers

On Fri, Jan 09, 2026 at 07:53:08AM +0000, Bertrand Drouvot wrote:
> While working on flushing stats outside of transaction boundaries (patch not
> shared yet but linked to [1]), I realized that parallel workers could lead to
> incomplete and misleading statistics. Indeed, they update "their" relation
> stats during their shutdown regardless of the "main" transaction status.
>
> It means that, for example, stats like seq_scan, last_seq_scan and seq_tup_read
> are updated by the parallel workers during their shutdown while the main
> transaction has not finished. The stats are then somehow incomplete because the main
> worker has not updated its stats yet. I think that could lead to misleading stats
> that a patch like this one could help to address. For example, parallel workers
> could update parallel_* dedicated stats and leave the non parallel_* stats update
> responsibility to the main worker when the transaction finishes. That would make
> the non parallel_* stats consistent whether parallel workers are used or not.
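The timing issue described above can be sketched with a toy model (plain Python, not PostgreSQL code; all counter values are made up for illustration): workers flush their per-relation counters as soon as they shut down, while the leader's share only lands at transaction end, so a reader in between sees partial numbers.

```python
# Toy model of cumulative relation stats: parallel workers flush their
# counters at worker shutdown, the leader flushes at transaction commit.
class RelStats:
    def __init__(self):
        self.seq_scan = 0
        self.seq_tup_read = 0

    def flush(self, scans, tuples):
        # Cumulative counters, as in pg_stat_all_tables.
        self.seq_scan += scans
        self.seq_tup_read += tuples

stats = RelStats()

# Three workers finish early and flush immediately, mid-transaction.
for worker_tuples in (1000, 1000, 1000):
    stats.flush(1, worker_tuples)

mid_txn = (stats.seq_scan, stats.seq_tup_read)   # what a reader sees now

# The leader's contribution only arrives at commit time.
stats.flush(1, 500)
after_commit = (stats.seq_scan, stats.seq_tup_read)

print(mid_txn)        # (3, 3000) -- incomplete, workers only
print(after_commit)   # (4, 3500) -- complete, after commit
```

In this sketch, a monitoring query between worker shutdown and commit reads the mid-transaction values, which is the "incomplete and misleading" window being discussed.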

(Re-reading the thread to remember the context..)

It depends, I guess.  I still doubt that adding parallel worker data
at table and index level is the right move on top of all the
information we already have on HEAD, because this extra information is
not actionable in terms of tuning GUCs or reloptions.

Now, do you think that the extra noise of data flushed by the parallel
workers shutting down before the main transaction has committed in the
"main" backend process could really impact the tuning decisions users
may want to take?  Stats are not about precision; they are about
offering trends that help in taking better decisions to drive the
server in the direction its administrator wants to lead it.  If the
noise is high enough to drive incorrect tuning decisions, the system
could go crazy and that would be an issue.  My question is then: does
this extra data flushed by the parallel workers before transaction
end, which you are qualifying as noise, really matter when it comes to
the tuning decisions one needs to take?  That stance would apply
mostly to analytical queries, of course, where parallel workers have
more data to flush.  Parallel workers may have a lot of data to
report, but the transaction commit just delays the availability of the
rest of this information.

When it comes to what you are describing as a problem, my intuition
tells me that we don't have a problem to solve here at all, but I'm
OK with being proved wrong, as well.
--
Michael
