Re: BUG #13657: Some kind of undetected deadlock between query and "startup process" on replica.

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Sergey Burladyan <eshkinkot(at)gmail(dot)com>
Cc: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: BUG #13657: Some kind of undetected deadlock between query and "startup process" on replica.
Date: 2015-10-24 00:00:57
Message-ID: CAB7nPqSknHmw9EcW79PQWH9uoT8-eDoT4Z_mCqZvyroAgNZp2A@mail.gmail.com
Lists: pgsql-bugs

On Fri, Oct 23, 2015 at 11:36 PM, Sergey Burladyan <eshkinkot(at)gmail(dot)com> wrote:
> On Wed, Oct 21, 2015 at 5:42 AM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> Maxim, did you get the occasion to test your setup with 9.4.5? It
>> seems that your environment is more likely to reproduce this deadlock.
>
> I think I also have this issue, but with 9.2.13, today I see dozens of
> waiting requests at standby
> (via pgadmin Tools-Server Status). All of it is waiting for PID 48205,
> it is startup process:
> 48205 ? Ss 6289:29 postgres: startup process recovering
> 0000001000024C5E000000DF waiting
>
> and startup process hold AccessExclusiveLock for table:
> 'virtualxid' <NULL> <NULL> <NULL> <NULL> '1/1' <NULL> <NULL> <NULL>
> <NULL> '1/0' 48205 'ExclusiveLock' t t
> 'relation' 16444 16993 <NULL> <NULL> '<NULL>' <NULL> <NULL> <NULL>
> <NULL> '1/0' 48205 'AccessExclusiveLock' t f
>
> I have 'ALTER tblname RENAME xxx TO yyy' at master, before this lock.
> And I am using recovery command, without streaming replication.

Thanks! This blows away my previous assumption that this was limited
to 9.4~. If you still have the standby in a frozen state, where is the
startup process stuck? Could you get a backtrace from it? Perhaps it
is replaying an XLOG_HEAP2_CLEAN record. Did you run any queries on the
standby that took locks on the renamed relation 16993?

Also, segment 0000001000024C5E000000DF may be an important piece of
the puzzle, as the records may have been replayed in an unexpected order.
Would it be possible to get a dump of it using xlogdump (Postgres core
includes it as pg_xlogdump from 9.3~) around the point where the startup
process has been waiting?
--
Michael
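
For anyone trying to narrow down the dump requested above, here is a minimal
sketch (not part of the original message) that splits a WAL segment file name
into its timeline, log id and segment number, and derives the LSN at which the
segment starts. It assumes the default 16 MB segment size and the usual
24-hex-digit naming; the function name decode_wal_segment_name is hypothetical.

```python
import string

# Assumption: the server was built with the default 16 MB WAL segment size.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def decode_wal_segment_name(name, segment_size=WAL_SEGMENT_SIZE):
    """Split a 24-hex-digit WAL segment name into its components.

    Returns (timeline, log_id, seg_no, start_lsn), where start_lsn is the
    position of the first byte of the segment in the usual "X/X" notation.
    """
    if len(name) != 24 or any(c not in string.hexdigits for c in name):
        raise ValueError("expected 24 hex digits, got %r" % (name,))

    timeline = int(name[0:8], 16)    # first 8 digits: timeline id
    log_id = int(name[8:16], 16)     # middle 8 digits: high 32 bits of the LSN
    seg_no = int(name[16:24], 16)    # last 8 digits: segment within that log id

    # The byte offset of the segment gives the low 32 bits of its start LSN.
    start_lsn = "%X/%X" % (log_id, seg_no * segment_size)
    return timeline, log_id, seg_no, start_lsn


if __name__ == "__main__":
    # Segment name taken from the ps output quoted in this thread.
    print(decode_wal_segment_name("0000001000024C5E000000DF"))
    # -> (16, 150622, 223, '24C5E/DF000000')
```

The resulting start LSN could then serve as a lower bound when dumping records,
for example with pg_xlogdump's --start option, assuming a version of the tool
that can read this segment is available.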
