Re: relfilenode statistics

From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Kirill Reshke <reshkekirill(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: relfilenode statistics
Date: 2026-05-18 16:28:26
Message-ID: ags+KgFrtnZhKX7x@bdtpg
Lists: pgsql-hackers

Hi,

On Tue, Mar 31, 2026 at 10:45:50AM +0000, Bertrand Drouvot wrote:
> Hi,
>
> On Wed, Mar 25, 2026 at 03:25:07AM +0000, Bertrand Drouvot wrote:
> > Hi,
> >
> > On Wed, Mar 18, 2026 at 03:57:48AM +0000, Bertrand Drouvot wrote:
> > > Hi,
> > >
> > > PFA, new rebase due to fba4233c832.
> >
> > Another rebase, due to 2102ebb1953 this time.
>
> It's more than probably too late for v19 but it needs another rebase due to
> d7965d65fc5b this time.

PFA v16, a rebase due to 775fe51daae, 71ff232a5bc and c0b53ec0630.

While at it, let's sum up the current state:

Regarding Michael's question [1] about whether we should copy stats across
rewrites: I still believe we should. Not doing so would produce user-visible
regressions. The complexity is contained in patch 0002 and the approach is
tested (including 2PC, subtransaction abort, and rewrite chains).

Regarding Michael's suggestion [2] to split PgStat_StatTabEntry into three kinds
(table/index/relfilenode) from the start: I think this patch is the right
incremental step that doesn't preclude a future split. Here's my reasoning:

As Andres pointed out [3], we'd want to populate more than just
dead_tuples/ins_since_vacuum/mod_since_analyze during recovery. The right
boundary for a split won't be clear until we actually implement
WAL-replay-based stat population.

I think that splitting now would be a much larger change with the risk of drawing
the boundaries wrong.

The current approach (key PGSTAT_KIND_RELATION by locator, keep the unified structure)
is a contained change that unblocks future work. Once we have WAL replay populating
stats, we'll have a much better understanding of what a split should look like,
if one is still needed.
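
As an illustration of "key by locator, keep the unified structure", here is a
minimal self-contained sketch, not code from the patch. The types and functions
(DemoLocator, DemoTabEntry, demo_get_entry, demo_report_dead_tuples) are
hypothetical stand-ins, and the fixed-size table is only an assumption made to
keep the example short. The three counters are the ones named above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a relfile locator (not the PostgreSQL type). */
typedef struct DemoLocator
{
    uint32_t spc_oid;        /* tablespace */
    uint32_t db_oid;         /* database */
    uint32_t relfilenumber;  /* on-disk file number */
} DemoLocator;

/* One unified entry, regardless of whether it is a table or an index. */
typedef struct DemoTabEntry
{
    int64_t dead_tuples;
    int64_t ins_since_vacuum;
    int64_t mod_since_analyze;
} DemoTabEntry;

#define DEMO_SLOTS 64

typedef struct DemoSlot
{
    int          used;
    DemoLocator  key;
    DemoTabEntry stats;
} DemoSlot;

static DemoSlot demo_table[DEMO_SLOTS];

static int
demo_locator_eq(const DemoLocator *a, const DemoLocator *b)
{
    return a->spc_oid == b->spc_oid &&
           a->db_oid == b->db_oid &&
           a->relfilenumber == b->relfilenumber;
}

static uint32_t
demo_hash(const DemoLocator *k)
{
    uint32_t h = k->spc_oid;

    h = h * 31u + k->db_oid;
    h = h * 31u + k->relfilenumber;
    return h;
}

/*
 * Look up the entry for a locator using linear probing.
 * Returns NULL if it does not exist and create is 0, or if the table is full.
 */
DemoTabEntry *
demo_get_entry(DemoLocator key, int create)
{
    uint32_t start = demo_hash(&key) % DEMO_SLOTS;

    for (uint32_t i = 0; i < DEMO_SLOTS; i++)
    {
        DemoSlot *slot = &demo_table[(start + i) % DEMO_SLOTS];

        if (!slot->used)
        {
            if (!create)
                return NULL;
            slot->used = 1;
            slot->key = key;
            memset(&slot->stats, 0, sizeof(slot->stats));
            return &slot->stats;
        }
        if (demo_locator_eq(&slot->key, &key))
            return &slot->stats;
    }
    return NULL;
}

/* Add to dead_tuples knowing only the locator, with no relation OID lookup. */
void
demo_report_dead_tuples(DemoLocator key, int64_t delta)
{
    DemoTabEntry *entry = demo_get_entry(key, 1);

    assert(entry != NULL);
    entry->dead_tuples += delta;
}
```

The point of the sketch is that demo_report_dead_tuples() needs nothing but
the locator, which is the kind of caller a future WAL-replay-based population
would be, while the entry layout stays the single unified structure.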

I think we should do this incremental step first, then split later if/when the
need becomes clearer.

I believe we have consensus on the core approach ("use the relfilenumber instead
of the relation OID, without changing the user experience"). The implementation
addresses all the technical concerns raised so far (no new hash key field,
PSEUDO_PARTITION_TABLE_SPCOID for partitioned tables, pgstat_fetch_stat_tabentry_by_locator()
to avoid extra syscache lookups in do_autovacuum()).

Andres, would you be willing to drive this toward commit once we've iterated
on any remaining review feedback?

Michael, I understand this isn't the design you'd prefer. Would you be open to
reviewing the implementation nonetheless, or do you have a hard objection that
would block this path?

I'm happy to address any further concerns.

[1]: https://postgr.es/m/aRGoGcOdutTHQfpn%40paquier.xyz
[2]: https://postgr.es/m/aUELPdhdcyzTM_8K%40paquier.xyz
[3]: https://postgr.es/m/zferux2jlbhqymubzhpubfrkjzhzxzguq4eprtycojtif5vbqh%402t7cu2teyqmi

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment                                                        Content-Type  Size
v16-0001-Key-PGSTAT_KIND_RELATION-by-relfile-locator.patch        text/x-diff   26.9 KB
v16-0002-handle-relation-statistics-correctly-during-rewr.patch   text/x-diff   25.4 KB
