Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Дмитрий Дегтярёв <degtyaryov(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.
Date: 2013-11-21 15:08:05
Message-ID: CAHyXU0wvqrVeBuVh46PJrcBGRmzzLhsgUhFJw3_e9m_HUZ6y1A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 21, 2013 at 9:02 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-11-21 16:25:02 +0200, Heikki Linnakangas wrote:
>> Hmm. All callers of RecoveryInProgress() must be prepared to handle the case
>> that RecoveryInProgress() returns true, but the system is no longer in
>> recovery. No matter what locking we do in RecoveryInProgress(), the startup
>> process might finish recovery just after RecoveryInProgress() has returned.
>
> True.
>
>> What about the attached? It reads the shared variable without a lock or
>> barrier. If it returns 'true', but the system in fact just exited recovery,
>> that's OK. As explained above, all the callers must tolerate that anyway.
>> But if it returns 'false', then it performs a full memory barrier, which
>> should ensure that it sees any other shared variables as it is after the
>> startup process cleared SharedRecoveryInProgress (notably,
>> XLogCtl->ThisTimeLineID).
>
> I'd argue that we should also remove the spinlock in StartupXLOG and
> replace it with a write barrier. Obviously not for performance reasons,
> but because somebody might add more code to run under that spinlock.
>
> Looks good otherwise, although a read memory barrier ought to suffice.

This code is in a very hot code path. Are we *sure* that the read
barrier is cheap enough that we don't need an alternate function
that returns only the local flag? I don't know enough about memory
barriers to say either way.
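
For reference, here is a minimal sketch of the pattern under discussion, written with C11 atomics rather than PostgreSQL's own barrier primitives. It is not the attached patch: all names below (shared_recovery_in_progress, shared_timeline_id, local_recovery_in_progress, finish_recovery, recovery_in_progress) are hypothetical stand-ins. It assumes the writer publishes its data before clearing the flag with a write (release) barrier, and the reader issues a read (acquire) barrier only when it sees 'false'.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the shared-memory state discussed above. */
static atomic_bool shared_recovery_in_progress = true;
static int shared_timeline_id = 1;      /* plain data published by the writer */

/* Per-process cache: once recovery is seen to be over, it stays over. */
static bool local_recovery_in_progress = true;

/* Writer side (startup process): publish the data, then clear the flag. */
static void
finish_recovery(int new_timeline_id)
{
    shared_timeline_id = new_timeline_id;

    /* Write barrier: the store above becomes visible before the flag flips. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&shared_recovery_in_progress, false,
                          memory_order_relaxed);
}

/* Reader side: no lock, and no barrier unless the answer is 'false'. */
static bool
recovery_in_progress(void)
{
    /* Fast path: only the local flag is consulted, no shared access. */
    if (!local_recovery_in_progress)
        return false;

    /* 'true' may already be stale; callers have to tolerate that anyway. */
    if (atomic_load_explicit(&shared_recovery_in_progress,
                             memory_order_relaxed))
        return true;

    /*
     * Read barrier: later reads of shared_timeline_id see the value stored
     * before the flag was cleared.
     */
    atomic_thread_fence(memory_order_acquire);
    local_recovery_in_progress = false;
    return false;
}
```

In this sketch the barrier cost is paid at most once per process, on the first call that observes 'false'. Every later call returns from the local flag alone, and calls made while recovery is still running cost one relaxed load.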

merlin
