RE: "unexpected duplicate for tablespace" problem in logical replication

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: "wangsh(dot)fnst(at)fujitsu(dot)com" <wangsh(dot)fnst(at)fujitsu(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: RE: "unexpected duplicate for tablespace" problem in logical replication
Date: 2022-04-11 07:40:11
Message-ID: TYCPR01MB8373616AF8BA3C819535B998EDEA9@TYCPR01MB8373.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Friday, April 8, 2022 6:44 PM I wrote:
> On Wednesday, April 6, 2022 11:14 AM wangsh(dot)fnst(at)fujitsu(dot)com
> <wangsh(dot)fnst(at)fujitsu(dot)com> wrote:
> > I met a problem while using logical replication in PG11 and I think
> > all the PG version have this problem.
> >
> >
> > The log looks like:
> > > ERROR: unexpected duplicate for tablespace 0, relfilenode xxxxxxx
> > Someone also reported this problem in [1], but no one has responded to it.
> >
> >
> >
> > I did some investigation, and found a way to reproduce this problem.
> > The steps are:
> >
> >
> > 1. create a table (call it tableX) and truncate it.
> >
> >
> > 2. cycle through 2^32 OIDs.
> >
> >
> > 3. restart the database to clear all the cache.
> >
> >
> > 4. create a temp table which make the temp table's OID equals to the
> > tableX's relfilenode and insert any data into tableX.
> >
> >
> > The attachment(run.sh) can reproduce this problem in PG10 and PG11with
> > the help of option 'WITH OIDS'. I don't find any way to cycle the OIDs
> > quickly in branch master, but I use the gdb to reproduce this problem too.
> >
> >
> >
> > Now, function GetNewRelFileNode() only checks:
> >
> >
> > 1. duplicated OIDs in pg_class.
> >
> >
> > 2. relpath(rnode) is exists in disk.
> >
> >
> > However, the result of relpath(temp table) and relpath(non-temp table)
> > are different, temp table's relpath() has a prefix "t%d". That means,
> > if there is a table that value of relfilenode is 20000(but the value
> > of oid isn't 20000), it's possible to create a temp table that value
> > of relfilenode is also 20000.
> >
> >
> > I think function GetNewRelFileNode() should always check the
> > duplicated relfilenode, see the patch(a simple to way to fix this
> > problem is master branch).
> >
> >
> > Any comment?
> Hi, thank you for your report.
>
>
> It seems correct that there's room that wraparounded oid can be used for temp
> table, and we get duplicate result when we retrieve it and face the error.
>
> I reproduced your issue with HEAD and gdb, by replacing rnode.node.relNode
> with an existing relfilenode in GetNewRelFileNode(), immediately before the
> call of relpath().
One thing I forgot to note is that this bug is not unique to the logical replication.
There is other path to hit it for example, pg_filenode_relation
in the same procedures with gdb.

In the below output, I created tempa table with the same filenode with gdb
without having a pair of logical replication and got the same error you reported.

postgres=# select oid, relname, relfilenode, reltablespace from pg_class where relname in ('c', 'tempa');
oid | relname | relfilenode | reltablespace
-------+---------+-------------+---------------
16387 | c | 16390 | 0
16390 | tempa | 16390 | 0
(2 rows)

postgres=# select pg_filenode_relation(0, 16390);
ERROR: unexpected duplicate for tablespace 0, relfilenode 16390

Best Regards,
Takamichi Osumi

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2022-04-11 07:46:41 BUG #17462: Invalid memory access in heapam_tuple_lock
Previous Message osumi.takamichi@fujitsu.com 2022-04-08 09:44:07 RE: "unexpected duplicate for tablespace" problem in logical replication