回复:回复:回复:Bug about drop index concurrently

From: 李杰(慎追) <adger(dot)lj(at)alibaba-inc(dot)com>
To: "Tomas Vondra" <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: 回复:回复:回复:Bug about drop index concurrently
Date: 2019-10-24 03:10:16
Message-ID: bd16c875-9fe1-4c34-805c-e0a2ea756015.adger.lj@alibaba-inc.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>That suggests you're doing a lot of 'drop index concurrently', right?

Not completely, In the actual scene, in fact, I didn't perform too much 'drop index concurrently' on the master. I just just execute a lot of queries on the standby. You know, it will have a certain probability every time you execute ‘drop index concurrently’ on the master. Although it is small, it may appear.
>Yes, but we can't really do that, I'm afraid.
>
>We certainly can't do that on the master because we simply don't have
>the necessary information about locks from the standby, and we really
>don't want to have it, because with a busy standby that might be quite a
>bit of data (plust the standby would have to wait for the master to
>confirm each lock acquisition, I think which seems pretty terrible).
>
yeah, we can't do this and will lose too much in order to achieve this.

>On the standby, we don't really have an idea that the there's a drop
>index running - we only get information about AE locks, and a bunch of
>catalog updates. I don't think we have a way to determine this is a drop
>index in concurrent mode :-(
>
>More preciresly, the master sends information about AccessExclusiveLock
>in XLOG_STANDBY_LOCK wal record (in xl_standby_locks struct). And when
>the standby replays that, it should acquire those locks.
>
>For regular DROP INDEX we send this:
>
>rmgr: Standby ... desc: LOCK xid 8888 db 16384 rel 16385
>rmgr: Standby ... desc: LOCK xid 8888 db 16384 rel 20573
>... catalog changes ...
>rmgr: Transaction ... desc: COMMIT 2019-10-23 22:42:27.950995 CEST;
> rels: base/16384/20573; inval msgs: catcache 32 catcache 7 catcache
> 6 catcache 50 catcache 49 relcache 20573 relcache 16385 snapshot 2608
>
>while for DROP IDNEX CONCURRENTLY we send this
>
>rmgr: Heap ... desc: INPLACE ... catalog update
>rmgr: Standby ... desc: INVALIDATIONS ; inval msgs: catcache 32
>relcache 21288 relcache 16385
>rmgr: Heap ... desc: INPLACE ... catalog update
>rmgr: Standby ... desc: INVALIDATIONS ; inval msgs: catcache 32
> relcache 21288 relcache 16385
>rmgr: Standby ... desc: LOCK xid 10326 db 16384 rel 21288
>... catalog updates ...
>rmgr: Transaction ... desc: COMMIT 2019-10-23 23:47:10.042568 CEST;
> rels: base/16384/21288; inval msgs: catcache 32 catcache 7 catcache 6
> catcache 50 catcache 49 relcache 21288 relcache 16385 snapshot 2608
>
yeah, you are right, I got this .

>So just a single lock on the index, but not the lock on the relation
>itself (which makes sense, because for DROP INDEX CONCURRENTLY we don'>t get an exclusive lock on the table).
>
>I'm not quite familiar with this part of the code, but the SELECT
>backends are clearly getting a stale list of indexes from relcache, and
>try to open an index which was already removed by the redo. We do
>acquire the lock on the index itself, but that's not sufficient :-(
>
>Not sure how to fix this. I wonder if we could invalidate the relcache
>for the relation at some point.
--method one --
>Or maybe we could add additional
>information to the WAL to make the redo wait for all lock waiters, just
>like on the master. But that might be tricky because of deadlocks, and
>because the redo could easily get "stuck" waiting for a long-running
>queries.
--method two --

for the method one , I don't think this can solve the problem at all.

Because we can't predict on the standby when the master will 'drop index concurrently'. I also have a way to manually reproduce this bug in order to make it easier for you to understand the problem. You can try to follow the steps below:
1. We build a connection on the standby, then use gdb to attach to the backend process
2. We set a breakpoint at plancat.c:196 in get_relation_info
3. We execute 'explain select ' on this backend, in gdb you will see that the breakpoint has hit, and the query will hang.
4.We execute 'drop index concurrently' on the master. (If 'drop index' will be blocked, because there is a query on our standby, the master can not get EXlock).
5. on the standby ,let gdb continue.
We will see an error(ERROR: could not open relation with OIDXXX) in the standby client.

Therefore, we can see that the query is executed first by the standby, and then the 'drop index concurrently' executed by the master. So no matter how fast we can make relcache invalid, there is no way to prevent this error from happening, because there is always a moment when the index has been dropped by the master. In other words, there is no way to predict the master's 'drop index concurrently' on standby.

for the method two, you are right, deadlock is possible, and the most unbearable is that it will bring delay to the log apply on standb, resulting in inconsistent data between the master and the standby.

All in all, I think this bug is a flaw in postgres design. We need to think carefully about how to handle it better. Even we can learn from other database products. I hope I can help you.

Thank you very much for your attention.

Regards.

adger.

------------------------------------------------------------------
发件人:Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
发送时间:2019年10月24日(星期四) 06:04
收件人:李杰(慎追) <adger(dot)lj(at)alibaba-inc(dot)com>
抄 送:pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
主 题:Re: 回复:回复:Bug about drop index concurrently

On Wed, Oct 23, 2019 at 02:38:45PM +0800, 李杰(慎追) wrote:
>>
>>I'm a bit confused. You shouldn't see any crashes and/or core files in
>>this scenario, for two reasons. Firstly, I assume you're running a
>>regular build without asserts. Secondly, I had to add an extra assert
>>to trigger the failure. So what core are you talking about?
>>
>Sorry, I should explain it more clearly. I saw the core file because I
>modified the postgres source code and added Assert to it.
>>

OK

>>Also, it's not clear to me what do you mean by "bug in the standby" or
>>no lock in the drop index concurrently. Can you explain?
>>
>"bug in the standby" means that we built a master-slave instance, when
>we executed a large number of queries on the standby, we executed 'drop
>index concurrently' on the master so that get ‘error’ in the standby.
>Although it is not 100%, it will appear. no lock in the drop index
>concurrently ::: I think this is because there are not enough advanced
>locks when executing ‘ drop index concurrently’.
>

OK, thanks for the clarification. Yes, it won't appear every time, it's
likely timing-sensitive (I'll explain in a minute).

>>Hmm, so you observe the issue with regular queries, not just EXPLAIN
>>ANALYZE?
>
>yeah, we have seen this error frequently.
>

That suggests you're doing a lot of 'drop index concurrently', right?

>>>Of course, we considered applying the method of waiting to detect the
>>>query lock on the master to the standby, but worried about affecting
>>>the standby application log delay, so we gave up that.
>>>
>>I don't understand? What method?
>>
>
>I analyzed this problem, I used to find out the cause of this problem,
>I also executed 'drop index concurrently' and ‘explain select * from
>xxx’ on the master, but the bug did not appear as expected. So I went
>to analyze the source code. I found that there is such a mechanism on
>the master that when the 'drop index concurrently' is execute, it wait
>will every transaction that saw the old index state has finished.
>source code is as follows follow as:
>
>WaitForLockers(heaplocktag, AccessExclusiveLock);
>
>Therefore, I think that if this method is also available in standby,
>then the error will not appear. but I worried about affecting the
>standby application log delay, so we gave up that.
>

Yes, but we can't really do that, I'm afraid.

We certainly can't do that on the master because we simply don't have
the necessary information about locks from the standby, and we really
don't want to have it, because with a busy standby that might be quite a
bit of data (plust the standby would have to wait for the master to
confirm each lock acquisition, I think which seems pretty terrible).

On the standby, we don't really have an idea that the there's a drop
index running - we only get information about AE locks, and a bunch of
catalog updates. I don't think we have a way to determine this is a drop
index in concurrent mode :-(

More preciresly, the master sends information about AccessExclusiveLock
in XLOG_STANDBY_LOCK wal record (in xl_standby_locks struct). And when
the standby replays that, it should acquire those locks.

For regular DROP INDEX we send this:

rmgr: Standby ... desc: LOCK xid 8888 db 16384 rel 16385
rmgr: Standby ... desc: LOCK xid 8888 db 16384 rel 20573
... catalog changes ...
rmgr: Transaction ... desc: COMMIT 2019-10-23 22:42:27.950995 CEST;
rels: base/16384/20573; inval msgs: catcache 32 catcache 7 catcache
6 catcache 50 catcache 49 relcache 20573 relcache 16385 snapshot 2608

while for DROP IDNEX CONCURRENTLY we send this

rmgr: Heap ... desc: INPLACE ... catalog update
rmgr: Standby ... desc: INVALIDATIONS ; inval msgs: catcache 32
relcache 21288 relcache 16385
rmgr: Heap ... desc: INPLACE ... catalog update
rmgr: Standby ... desc: INVALIDATIONS ; inval msgs: catcache 32
relcache 21288 relcache 16385
rmgr: Standby ... desc: LOCK xid 10326 db 16384 rel 21288
... catalog updates ...
rmgr: Transaction ... desc: COMMIT 2019-10-23 23:47:10.042568 CEST;
rels: base/16384/21288; inval msgs: catcache 32 catcache 7 catcache 6
catcache 50 catcache 49 relcache 21288 relcache 16385 snapshot 2608

So just a single lock on the index, but not the lock on the relation
itself (which makes sense, because for DROP INDEX CONCURRENTLY we don't
get an exclusive lock on the table).

I'm not quite familiar with this part of the code, but the SELECT
backends are clearly getting a stale list of indexes from relcache, and
try to open an index which was already removed by the redo. We do
acquire the lock on the index itself, but that's not sufficient :-(

Not sure how to fix this. I wonder if we could invalidate the relcache
for the relation at some point. Or maybe we could add additional
information to the WAL to make the redo wait for all lock waiters, just
like on the master. But that might be tricky because of deadlocks, and
because the redo could easily get "stuck" waiting for a long-running
queries.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2019-10-24 04:14:00 Re: Fix of fake unlogged LSN initialization
Previous Message Justin Pryzby 2019-10-24 03:08:21 Re: v12.0: reindex CONCURRENTLY: lock ShareUpdateExclusiveLock on object 14185/39327/0 is already held