Re: Assertion failure on hot standby

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Assertion failure on hot standby
Date: 2010-11-27 00:02:27
Message-ID: AANLkTi=ytwfhJg2yXobstkn4PQiDQ=xCr4fs6vPz9LWo@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 26, 2010 at 6:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Nov 26, 2010 at 2:06 PM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> If you go down that path, you're going to spend a lot of time thinking
>>> through every single case that uses an AccessExclusiveLock, ensuring that
>>> the standby has enough information, and tinkering with the replay code to
>>> acquire locks at the right moment. And gain what, exactly?
>
>> Well, fewer useless locks on the standby, for one thing, in all
>> likelihood, and less WAL traffic.
>
> I think it's not only useless from a performance standpoint, but
> probably actually dangerous, to not take AccessExclusiveLock on the
> standby when it's taken on the master.  If you try to delay taking the
> lock, then locks will be taken in a different order on master and
> standby, which is quite likely to lead to deadlock situations.

As far as I can see, that's complete nonsense. Deadlocks between what
and what? To get a deadlock on the standby, you have to have a cycle
in the lock-wait graph; and since recovery is single-threaded, all
locks from the master are held by the startup process. The only
possible cycle is between the startup process and some HS backend; and
why should we assume that the HS backend will acquire locks in the
same order as the transactions running on the master rather than any
other order? Perhaps that could be accomplished by very careful
application design, but in normal usage it doesn't sound very likely.
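To make the lock-wait-graph argument concrete, here is a minimal sketch in Python (not PostgreSQL code). The process names and the find_cycle() helper are hypothetical, invented purely for illustration. It models who waits for whom and searches for a cycle. If every lock replayed from the master is held by the single startup process, then any cycle has to pass through that process.

```python
from typing import Dict, List, Optional, Set


def find_cycle(waits_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph as a list of nodes, or None.

    waits_for maps a waiting process to the set of processes holding
    a lock it needs. This is a plain depth-first search, not the
    algorithm PostgreSQL's deadlock detector actually uses.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for holder in sorted(waits_for.get(node, ())):
            state = color.get(holder, WHITE)
            if state == GREY:
                # Back edge: the holder is already on the current path.
                return path[path.index(holder):] + [holder]
            if state == WHITE:
                found = visit(holder)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in sorted(waits_for):
        if color.get(start, WHITE) == WHITE:
            found = visit(start)
            if found:
                return found
    return None


# Hypothetical scenario: replay needs a lock on a table the backend is
# reading, while the backend waits for a table the startup process has
# already locked.
deadlocked = {"startup": {"hs_backend_1"}, "hs_backend_1": {"startup"}}

# Hypothetical scenario: two backends both wait for the startup process,
# which waits for nobody. There is no cycle, so no deadlock.
no_deadlock = {"hs_backend_1": {"startup"}, "hs_backend_2": {"startup"}}

print(find_cycle(deadlocked))
print(find_cycle(no_deadlock))
```

In the first graph the cycle runs through the startup process; in the second, however many backends queue up behind it, no cycle forms because the startup process waits for no one.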

In fact, now that I think about it, what I'm proposing would actually
substantially REDUCE the risk of deadlock on the standby, because the
standby would only ever need to lock a backing file long enough to drop
or truncate it, whereas under the present system the startup process
might need to hold many locks at once. It'd help reduce the chance of
lock table overflow, too.

> Speaking of which, is there any code in there to ensure that a deadlock
> in the standby is resolved by killing HS queries and not the replay
> process?  Because deadlocks are certainly going to be possible no matter
> what.

I believe the place to look is ResolveRecoveryConflictWithLock().

>> All somebody has to do is introduce a
>> mechanism that drops or rewrites a relation file without an access
>> exclusive lock, and this whole approach snaps right off
>
> ... as would queries on the master, so that's not ever happening.

There might be a problem with what I wrote in the part you didn't
quote here, but if there is, an explanation would be a lot more
helpful than a categorical statement unsupported by any argument.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
