Re: REL_12_STABLE crashing with assertion failure in ExtractReplicaIdentity

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Hadi Moshayedi <hadi(at)moshayedi(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REL_12_STABLE crashing with assertion failure in ExtractReplicaIdentity
Date: 2019-08-20 23:06:46
Message-ID: 20190820230646.bf7vtsizlusdgwaz@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-08-17 01:43:45 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On 2019-08-16 09:44:15 -0700, Hadi Moshayedi wrote:
> >> It seems that sometimes when DELETE cascades to referencing tables we fail
> >> to acquire locks on replica identity index.
>
> > I suspect this "always" has been broken, it's just that we previously
> > didn't have checks in place that detect these cases. I don't think it's
> > likely to cause actual harm, due to the locking on the table itself when
> > dropping indexes etc. But we still should fix it.
>
> Yeah ... see the discussion leading up to 9c703c169,
>
> https://www.postgresql.org/message-id/flat/19465.1541636036%40sss.pgh.pa.us
>
> We didn't pull the trigger on removing the CMD_DELETE exception here,
> but I think the handwriting has been on the wall for some time.

ISTM there's a few different options here:

1a) We build all index infos, unconditionally. As argued in the thread
you reference, future tableams may eventually require that anyway,
by doing more proactive index maintenance somehow. Currently there's
however no support for such AMs via tableam (mostly because I wasn't
sure how exactly that'd look, and none of the already in-development
AMs needed it).

2a) We separate acquisition of index locks from ExecOpenIndices(), and
acquire index locks even for CMD_DELETE. Do so either during
executor startup, or as part of AcquireExecutorLocks() (the latter
on the basis that parsing/planning would have required the locks
already).

There's also corresponding *b) options, where we only do additional work
for CMD_DELETE if wal_level = logical, and the table has a replica
identity requiring use of the index during deleteions. But I think
that's clearly enough a bad idea that we can just dismiss it out of
hand.

3) Remove the CheckRelationLockedByMe() assert from
ExtractReplicaIdentity(), at least for 12. I don't think this is an
all that convicing option, but it'd reduce churn relatively late in
beta.

4) Add a index_open(RowExclusiveLock) to ExtractReplicaIdentity(). That
seems very unconvincing to me, because we'd do so for every row.

I think there's some appeal in going towards 2), because batching lock
acquisition into a more central place has the chance to yield some
speedups on its own, but more importantly would allow for batched
operations one day. Centralizing lock acquisition also seems like it
might make things easier to understand than today, where a lot of
different parts of the system acquire the locks, even just for
execution. But it also seems to be likely too invasive for 12 - making
me think that 1a) is the way to go for now.

Comments?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alex 2019-08-21 01:21:48 Re: Serialization questions
Previous Message Andres Freund 2019-08-20 21:16:50 Re: Inadequate executor locking of indexes