Re: Fix performance degradation of contended LWLock on NUMA

From: Andres Freund <andres(at)anarazel(dot)de>
To: Sokolov Yura <funny(dot)falcon(at)postgrespro(dot)ru>
Cc: jesper(dot)pedersen(at)redhat(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fix performance degradation of contended LWLock on NUMA
Date: 2017-10-19 16:46:36
Message-ID: 20171019164636.zfjm242caowascem@alap3.anarazel.de
Lists: pgsql-hackers

On 2017-10-19 14:36:56 +0300, Sokolov Yura wrote:
> > > + init_local_spin_delay(&delayStatus);
> >
> > The way you moved this around has the disadvantage that we now do this -
> > a number of writes - even in the very common case where the lwlock can
> > be acquired directly.
>
> Excuse me, I don't quite understand.
> Do you object to init_local_spin_delay being placed here?

Yes.

> Placing it anywhere else would complicate the code.

> Or do you object to setting `mask` and `add`?

That seems right.

> In both cases, I think the simpler version should be accepted first. It
> acts as the algorithm's definition, and it already gives a measurable
> improvement.

Well, it improves scalability, yes. I'm less sure about uncontended
performance.
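
To make that concern concrete: the delay state can be initialized
lazily, so the common, immediately-successful CAS does no extra writes.
A rough sketch (simplified, not the patch itself; mask/add as in your
version, and the real code would queue onto the wait list instead of
spinning forever):

#include "postgres.h"
#include "storage/lwlock.h"
#include "storage/s_lock.h"

static bool
lwlock_attempt_lock_sketch(LWLock *lock, uint32 mask, uint32 add)
{
    SpinDelayStatus delayStatus;
    bool        delay_inited = false;
    uint32      old_state = pg_atomic_read_u32(&lock->state);

    for (;;)
    {
        if ((old_state & mask) == 0)
        {
            /* uncontended fast path: delayStatus is never touched */
            if (pg_atomic_compare_exchange_u32(&lock->state, &old_state,
                                               old_state + add))
            {
                if (delay_inited)
                    finish_spin_delay(&delayStatus);
                return true;
            }
            /* CAS failure refreshed old_state; just retry */
        }
        else
        {
            /* contended: only now pay for the delay-state writes */
            if (!delay_inited)
            {
                init_local_spin_delay(&delayStatus);
                delay_inited = true;
            }
            perform_spin_delay(&delayStatus);
            old_state = pg_atomic_read_u32(&lock->state);
        }
    }
}

That keeps all the writes off the path that matters for uncontended
performance, at the cost of one flag on the stack.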

> > > + * We intentionally do not call finish_spin_delay here, because the
> > > + * loop above usually finishes by queuing into the wait list on
> > > + * contention, and doesn't reach spins_per_delay, thereby doesn't
> > > + * sleep inside of perform_spin_delay. Also, different LWLocks have
> > > + * very different contention patterns, and it is wrong to update
> > > + * spin-lock statistics based on LWLock contention.
> > > + */
> >
> > Huh? This seems entirely unconvincing. Without adjusting this here,
> > we'll just spin the same way every iteration. Especially for the case
> > where somebody else holds LW_FLAG_LOCKED, that's quite bad.
>
> LWLocks are very different. Some of them are always held short-term
> (BufferLock); others are locked for a long time.

That doesn't seem particularly relevant. The same is true for
spinlocks. The relevant question isn't how long the lwlock is held, it's
how long LW_FLAG_LOCKED is held - and that should depend only on
contention (i.e. bus speed, the number of times a holder gets put to
sleep while holding the lock, etc.), not on how long the lock itself is
held.
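
For concreteness, this is roughly all finish_spin_delay does
(paraphrased from src/backend/storage/lmgr/s_lock.c;
MIN/MAX_SPINS_PER_DELAY are 10 and 1000 there):

void
finish_spin_delay(SpinDelayStatus *status)
{
    if (status->cur_delay == 0)
    {
        /* we never had to sleep: spinning paid off, allow more spins */
        if (spins_per_delay < MAX_SPINS_PER_DELAY)
            spins_per_delay = Min(spins_per_delay + 100, MAX_SPINS_PER_DELAY);
    }
    else
    {
        /* we did sleep: spinning was wasted work, back off */
        if (spins_per_delay > MIN_SPINS_PER_DELAY)
            spins_per_delay = Max(spins_per_delay - 1, MIN_SPINS_PER_DELAY);
    }
}

Skip that call and the loop never learns anything - whatever
spins_per_delay the backend started out with is what it keeps using.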

> I've tried to place this delay into the lock itself (it has 2 free
> bytes), but that attempt performed worse.

That seems unsurprising - there are a *lot* of locks, and we'd have to
tune all of them. Additionally, there are a bunch of platforms where we
do *not* have free bytes (consider the embedding in BufferDesc).

> Now I understand that delays should be stored in an array indexed by
> tranche. But I have no time to test this idea, and I doubt it would give
> radically better results (i.e. > 5%), so I think it is better to accept
> the patch as is, and then experiment with per-tranche delays afterwards.
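
If I understand the idea correctly, that would be something like the
following (hypothetical, names invented, seeding of the array e.g. to
DEFAULT_SPINS_PER_DELAY omitted):

#include "postgres.h"
#include "storage/lwlock.h"
#include "storage/s_lock.h"

#define TRANCHE_MIN_SPINS   10      /* MIN_SPINS_PER_DELAY in s_lock.c */
#define TRANCHE_MAX_SPINS   1000    /* MAX_SPINS_PER_DELAY in s_lock.c */

/* one adaptive spin count per built-in tranche, instead of the
 * process-global spins_per_delay */
static int  tranche_spins_per_delay[LWTRANCHE_FIRST_USER_DEFINED];

static void
finish_tranche_spin_delay(SpinDelayStatus *status, LWLock *lock)
{
    int     tranche = Min(lock->tranche, LWTRANCHE_FIRST_USER_DEFINED - 1);
    int    *spd = &tranche_spins_per_delay[tranche];

    if (status->cur_delay == 0)
        *spd = Min(*spd + 100, TRANCHE_MAX_SPINS);  /* never slept */
    else
        *spd = Max(*spd - 1, TRANCHE_MIN_SPINS);    /* had to sleep */
}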

I don't think tranches have any decent predictive power here.

Greetings,

Andres Freund
