RE: Fix for segfault in logical replication on master

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Mark Dilger' <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: "akapila(at)postgresql(dot)org" <akapila(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Fix for segfault in logical replication on master
Date: 2021-06-17 10:39:32
Message-ID: OSBPR01MB4888BA75D9B4CCCA0D78D55FED0E9@OSBPR01MB4888.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday, June 17, 2021 2:43 PM Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> > On Jun 16, 2021, at 10:19 PM, osumi(dot)takamichi(at)fujitsu(dot)com wrote:
> >
> > I started to analyze your report and
> > will reply after my idea to your modification is settled.
>
> Thank you.
I'll share my first analysis.

> In commit e7eea52b2d, you introduced a new function,
> RelationGetIdentityKeyBitmap(), which uses some odd logic for determining
> if a relation has a replica identity index. That code segfaults under certain
> conditions. A test case to demonstrate that is attached. Prior to patching
> the code, this new test gets stuck waiting for replication to finish, which never
> happens. You have to break out of the test and check
> tmp_check/log/021_no_replica_identity_publisher.log.
>
> I believe this bit of logic in src/backend/utils/cache/relcache.c:
>
> indexDesc = RelationIdGetRelation(relation->rd_replidindex);
> for (i = 0; i < indexDesc->rd_index->indnatts; i++)
>
> is unsafe without further checks, also attached.
You are absolutely right.
I checked the crash scenario and reproduced the core,
which has a null indexDesc. Also, rd_replidindex must be checked beforehand
as you included in your patch, because having an index does not necessarily
mean to have a replica identity index. As the proof of this, the oid of
rd_replidindex in the scenario is 0. OTOH, I've confirmed your new test
has passed with your fix.

Also, your test looks essentially minimum(suitable for the problem) to me.

* RelationGetIdentityKeyBitmap
+ /*
+ * Fall out if the description is not for an index, suggesting
+ * affairs have changed since we looked. XXX Should we log a
+ * complaint here?
+ */
+ if (!indexDesc)
+ return NULL;
+ if (!indexDesc->rd_index)
+ {
+ RelationClose(indexDesc);
+ return NULL;
+ }
For the 1st check, isn't it better to use RelationIsValid() ?
I agree with having the check itself of course, though.

Additionally, In what kind of actual scenario, did you think that
we come to the part to "log a complaint" ?

I'm going to spend some time to analyze RelationGetIndexAttrBitmap next
to know if similar hazards can happen, because RelationGetIdentityKeyBitmap's logic
comes from the function mainly.

Best Regards,
Takamichi Osumi

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-06-17 10:57:24 Re: locking [user] catalog tables vs 2pc vs logical rep
Previous Message Fabien COELHO 2021-06-17 09:52:04 Re: pgbench bug candidate: negative "initial connection time"