Re: "unexpected duplicate for tablespace" problem in logical replication

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, wangsh(dot)fnst(at)fujitsu(dot)com, osumi(dot)takamichi(at)fujitsu(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: "unexpected duplicate for tablespace" problem in logical replication
Date: 2024-01-08 05:02:03
Message-ID: CALDaNm1NjZDJGEoYaRPMGsXrOMQwJ2bK4EXC-wvoSSM77XX95A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, 19 Apr 2022 at 11:54, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, Apr 12, 2022 at 11:45 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> >
> > At Mon, 11 Apr 2022 10:24:26 +0000, "wangsh(dot)fnst(at)fujitsu(dot)com" <wangsh(dot)fnst(at)fujitsu(dot)com> wrote in
> > > Please see the attachment
> >
> > Sorry for being late, but this seems to be in the wrong direction.
> >
> > In the problem case, as you know, GetNewRelFileNode does *not* check
> > oid uniqueness in pg_class. The on-catalog duplicate check is done
> > only when the result is to be used as the *table oid* at creation. In
> > other cases it chooses a relfilenode that does not duplicate in
> > storage file names irrelevantly to the relation oid. As the result
> > Non-temporary and temporary tables have separate relfilenode oid
> > spaces, either intentional or not..
> >
> > Thus, I think we don't want GetNewRelFileNode to check for duplicate
> > oid on pg_class if pg_class is not passed since that significantly
> > narrow the oid space for relfilenode. If we did something like that,
> > we might do check the existence of file names of the both non-temp and
> > temp in GetNewRelFileNode(), but that also narrows the relfilenode oid
> > space.
>
> Agreed.
>
> >
> > RelidByRelfilenode is called by autoprewarm, logical replication, and
> > pg_filenode_relation. In the autoprewarm and logical replication
> > cases, it is called only for non-temprary relations so letting the
> > function ignore (oid duplicating) temp tables works.
> > pg_relfilienode_relation is somewhat dubious. It is affected by this
> > bug. In the attached PoC, to avoid API change (in case for
> > back-patching), RelidByRelfilenode ignores duplcates of differernt
> > persistence. Also the PoC prioritize on persistent tables to
> > temporary ones but I'm not sure it's good decision, but otherwise we
> > need to change the signature of pg_filenode_relation.
> >
> > The attached is an PoC of that. The first one is a reproduction-aid
> > code. With it applied, the following steps create duplicate
> > relfilenode relations.
> >
> > What do you think about this direction?
>
> Sounds good to me. If we go in this direction, probably we also need
> to change the cache checks in RelidByRelfilenode() so that we
> prioritize non-temporary relations. Otherwise, we will end up
> returning the OID of temporary relation if it's already cached.
>
> Another idea I came up with is that we have RelidByRelfilenode() check
> only non-temporary relations by adding relpersistence !=
> RELPERSISTENCE_TEMP to the scan key. If we want to support temporary
> relations too, I think that, in v16, we can add relpersistence to
> RelidByRelfilenode() as a function argument (and a flag to
> pg_filenode_relation() accordingly) so that the user can get the name
> of the temporary relation by like pg_filenode_relation(0, 16386,
> true). On the other hand, in back branches, if an application is using
> pg_filenode_relation() to get the name of temporary relations, it
> would break it.

I have changed the status to "Waiting on Author" as the points raised
by Masahiko Sawada-san is pending.

Regards,
Vignesh

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Dilip Kumar 2024-01-08 05:13:52 Re: BUG #18267: Logical replication bug: data is not synchronized after Alter Publication.
Previous Message tender wang 2024-01-08 02:33:26 Re: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition