Re: Fix for segfault in logical replication on master

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
Cc: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "akapila(at)postgresql(dot)org" <akapila(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Fix for segfault in logical replication on master
Date: 2021-06-17 11:33:25
Message-ID: CAA4eK1K3_JcTeFKf6zmh2_C-x2mmPp+o52yHwDwXq0fgc1-1dg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 17, 2021 at 4:09 PM osumi(dot)takamichi(at)fujitsu(dot)com
<osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
>
> On Thursday, June 17, 2021 2:43 PM Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> > > On Jun 16, 2021, at 10:19 PM, osumi(dot)takamichi(at)fujitsu(dot)com wrote:
> > >
> > > I started to analyze your report and
> > > will reply after my idea to your modification is settled.
> >
> > Thank you.
> I'll share my first analysis.
>
> > In commit e7eea52b2d, you introduced a new function,
> > RelationGetIdentityKeyBitmap(), which uses some odd logic for determining
> > if a relation has a replica identity index. That code segfaults under certain
> > conditions. A test case to demonstrate that is attached. Prior to patching
> > the code, this new test gets stuck waiting for replication to finish, which never
> > happens. You have to break out of the test and check
> > tmp_check/log/021_no_replica_identity_publisher.log.
> >
> > I believe this bit of logic in src/backend/utils/cache/relcache.c:
> >
> > indexDesc = RelationIdGetRelation(relation->rd_replidindex);
> > for (i = 0; i < indexDesc->rd_index->indnatts; i++)
> >
> > is unsafe without further checks, also attached.
> You are absolutely right.
> I checked the crash scenario and reproduced the core,
> which has a null indexDesc. Also, rd_replidindex must be checked beforehand
> as you included in your patch, because having an index does not necessarily
> mean to have a replica identity index. As the proof of this, the oid of
> rd_replidindex in the scenario is 0. OTOH, I've confirmed your new test
> has passed with your fix.
>
> Also, your test looks essentially minimum(suitable for the problem) to me.
>
> * RelationGetIdentityKeyBitmap
> + /*
> + * Fall out if the description is not for an index, suggesting
> + * affairs have changed since we looked. XXX Should we log a
> + * complaint here?
> + */
> + if (!indexDesc)
> + return NULL;
> + if (!indexDesc->rd_index)
> + {
> + RelationClose(indexDesc);
> + return NULL;
> + }
> For the 1st check, isn't it better to use RelationIsValid() ?
> I agree with having the check itself of course, though.
>
> Additionally, In what kind of actual scenario, did you think that
> we come to the part to "log a complaint" ?
>

Yeah, I think that part is not required unless there is some case
where it can happen. I guess we might want to have an elog at that
place with a check like:
if (!RelationIsValid(relation))
elog(ERROR, "could not open relation with OID %u", relid);

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-06-17 11:37:01 Re: Unresolved repliaction hang and stop problem.
Previous Message Tomas Vondra 2021-06-17 11:29:23 Re: PoC: Using Count-Min Sketch for join cardinality estimation