From: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz>, vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, wangsh(dot)fnst(at)fujitsu(dot)com, osumi(dot)takamichi(at)fujitsu(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, Andres Freund <andres(at)anarazel(dot)de> |
Subject: | Re: "unexpected duplicate for tablespace" problem in logical replication |
Date: | 2025-08-21 14:24:55 |
Message-ID: | CAExHW5vojKfVMYij=bU3NgPS8ajx7udB1CiibqJruYj71JcoNw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Thu, Aug 21, 2025 at 11:42 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Wed, Aug 13, 2025 at 03:29:50PM +0530, Ashutosh Bapat wrote:
>
> Sorry for the delay. This has been on my TODO list for some time.
> Back to it now.
>
> > On Wed, Aug 13, 2025 at 3:18 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> >> On Thu, Jun 19, 2025 at 2:17 PM Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> >>> The description of UNLOGGED mentions "permanent" so it looks like your
> >>> use of "permanent" covers both regular and unlogged tables. However
> >>> looking purely at the RELPERSISTENCE_ names, it' s possible that one
> >>> will associate the word "permanent" with RELPERSISTENCE_PERMANENT
> >>> only, and not with RELPERSISTENCE_UNLOGGED. Should we use
> >>> "non-temporary" instead to avoid confusion?
>
> FWIW, I see a pretty good point in backpatching what we have here, as
> active users of logical replication can be impacted by this issue
> randomly. Even if this requires a wraparound, for some users it may
> be a window that can open in a couple of days, mostly for workloads
> with a high temp table exhaustion, I guess. In short, let's not touch
> pg_proc.dat, and I'm planning for a backpatch all the way down due to
> the possible random intrusions with the logirep paths.
The only change in pg_proc.dat is change in the function description.
We can revert it for the benefit of backpatching. Its usefulness is
arguable. +1 for not touching pg_proc.dat.
>
> The changes at the top of RelidByRelfilenumber() describing the
> problem with temporary tables feels a bit bloated. I see no need to
> mention directly autoprewarm and logical replication, two of the three
> callers of this routine, with the explanation coming down to the fact
> that this is only a temp relation problem because we cannot pinpoint
> to an OID without knowing a proc number.
How about the following:
---
Map a relation's (tablespace, relfilenumber) to a relation's oid and
cache the result.
A temporary relation may have the same relfilenumber as a permanent
relation, or as other temporary relations. In order to uniquely
identify a temporary relation we also need the
proc number of the backend which created it. The proc number is not
available to this function. Hence ignore temporary relations.
Returns InvalidOid if no permanent relation matching the criteria
could be found.
---
> So let's cut a few lines
> here at the top of RelidByRelfilenumber(), let's add a simpler line in
> func-admin.sgml to say directly and only that "temporary relations are
> not supported", passing on the "permanent" vs "no permanent" business.
> I don't see a direct need to mention that at the top of
> pg_relation_filenode(), as long as RelidByRelfilenumber(), but OK by
> me to add a simpler "temporary relations are not supported", it looks
> like most prefer than based on the contents of the thread.
If we just say that temporary relations are not supported, the user
may wonder as to what will happen if they pass tablespace and
relfilenode of a temporary relation. Instead it's better to clarify
that behaviour. How about adding following line to the current
description of the function in func-admin.sgml
"Also, returns <literal>NULL</literal> if a temporary relation in the
current database is associated with the given values.".
Said that if you insist on adding just "Temporary relations are not
supported", we can go ahead with it.
>
> >>> Logical replication and autoprewarm may not cause such a large bloat
> >>> but there is no limit to passing invalid combinations of reltablespace
> >>> and relfilenumber to pg_filenode_relation(). Do we want to prohibit
> >>> that case by passing a flag from logical pg_filenode_relation() to not
> >>> cache invalid entries?
>
> Fun. Yes, I agree that we could do better here. It does not strike
> me as an issue as invasive as the original report, direct calls of
> pg_filenode_relation() are much rarer than the code paths of
> autoprewarm and logical replication touched by RelidByRelfilenumber().
> That's a potential HEAD improvement to me.
Ok. I will make a patch to ignore negative entries when
RelidByRelfilenumber() is invoked from pg_filenode_relation(). I am ok
with head-only improvement. If we find reports from the field that
this behaviour creates problems in the previous releases, we will be
able to back patch it. I will try to come up with a back-patchable
patch.
Vignesh, since you are the original author of the patch, do you want
to take care of addressing the comments? Otherwise I can.
--
Best Wishes,
Ashutosh Bapat
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2025-08-21 21:36:56 | Re: "unexpected duplicate for tablespace" problem in logical replication |
Previous Message | Fujii Masao | 2025-08-21 13:22:02 | Re: Unexpected Standby Shutdown on sync_replication_slots change |