Re: Reduce ProcArrayLock contention

From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reduce ProcArrayLock contention
Date: 2015-08-20 10:19:40
Message-ID: 20150820101940.GA5070@alap3.anarazel.de
Lists: pgsql-hackers

On 2015-08-20 15:38:36 +0530, Amit Kapila wrote:
> On Wed, Aug 19, 2015 at 9:09 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I spent some time today reviewing the committed patch. So far my only
> > major complaint is that I think the comments only insufficiently
> > document the approach taken:
> > Stuff like avoiding ABA type problems by clearing the list entirely and
> > it being impossible that entries end up on the list too early absolutely
> > needs to be documented explicitly.
> >
>
> I think more comments can be added to explain such behaviour if it is
> not clear from the current code and comments.

It's not mentioned at all, so yes.
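
Something along these lines, placed near the list-clearing code, would
already help (just a sketch, wording obviously up for discussion):

/*
 * The leader detaches the entire list of waiting PGPROCs by atomically
 * swapping the list head with INVALID_PGPROCNO.  A follower only returns
 * (and thus can only re-enqueue itself for a later transaction) after the
 * leader has woken it up, so entries can never show up on the list too
 * early and the leader can never chase a stale (ABA) list link.
 */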

> I think you are right, and here we need to use something like what you
> have suggested below. Originally the code was similar to what you have
> written below, but it used a different (new) variable to achieve what
> you have achieved with lwWaiting, and to avoid the new variable the code
> was refactored into its current form. I think we should make this change
> (I can write a patch) unless Robert feels otherwise.

I think we can just rename lwWaiting to something more generic.
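
I.e. on the follower side something roughly like this (sketch only; the
flag is left under its current name here, and the semaphore wait is
written the way the other wait loops in the backend do it, details to
taste):

	/* Mark ourselves as waiting before we become visible on the list. */
	proc->lwWaiting = true;

	/* ... push ourselves onto the group XID clearing list ... */

	/* Sleep until the leader has cleared our XID and reset the flag. */
	for (;;)
	{
		PGSemaphoreLock(&proc->sem);
		if (!proc->lwWaiting)
			break;
		extraWaits++;
	}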

> > Consider what happens if such a follower enqueues in another
> > transaction. It is not, as far as I could find out, guaranteed on all
> > types of cpus that a third backend can actually see nextClearXidElem
> > as INVALID_PGPROCNO. That'd likely require SMT/HT cores and multiple
> > sockets. If the write to nextClearXidElem is entered into the local
> > store buffer (leader #1) a hyper-threaded process (leader #2) can
> > possibly see it (store forwarding) while another backend doesn't
> > yet.
> >
> > I think this is very unlikely to be an actual problem due to
> > independent barriers until enqueued again, but I don't want to rely
> > on it without documenting it. It seems safer to replace
> > +	wakeidx = pg_atomic_read_u32(&proc->nextClearXidElem);
> > +	pg_atomic_write_u32(&proc->nextClearXidElem, INVALID_PGPROCNO);
> > with a pg_atomic_exchange_u32().
> >
>
> I didn't follow this point; if we ensure that the follower can never
> return before the leader wakes it up, then why will it be a problem to
> update nextClearXidElem like above?

Because it doesn't generally enforce that *other* backends have seen the
write as there's no memory barrier.
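
Concretely, something like this (same logic, but pg_atomic_exchange_u32()
implies a full barrier, so we'd no longer depend on the plain write
becoming visible to other backends in time):

	/* Atomically fetch the wakeup list head and reset it in one step. */
	wakeidx = pg_atomic_exchange_u32(&proc->nextClearXidElem,
									 INVALID_PGPROCNO);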

> > +/*
> > + * ProcArrayGroupClearXid -- group XID clearing
> > + *
> > + * When we cannot immediately acquire ProcArrayLock in exclusive mode at
> > + * commit time, add ourselves to a list of processes that need their XIDs
> > + * cleared.  The first process to add itself to the list will acquire
> > + * ProcArrayLock in exclusive mode and perform ProcArrayEndTransactionInternal
> > + * on behalf of all group members.  This avoids a great deal of context
> > + * switching when many processes are trying to commit at once, since the lock
> > + * only needs to be handed from the last share-locker to one process waiting
> > + * for the exclusive lock, rather than to each one in turn.
> > + */
> > +static void
> > +ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid)
> > +{
> >
> > This comment, in my opinion, is rather misleading. If you have a
> > workload that primarily is slowed down due to transaction commits, this
> > patch doesn't actually change the number of context switches very
> > much. Previously all backends enqueued in the lwlock and got woken up
> > one-by-one. Save for backends 'jumping' the queue while the lock has
> > been released but the woken-up backend isn't running yet, there'll be
> > exactly as many context switches as today.
> >
>
> I think this is debatable, as in the previous mechanism all the backends
> cleared their transaction ids one-by-one (which is nothing but context
> switching), which led to contention not only with read lockers but with
> write lockers as well.

Huh? You can benchmark it, there's barely any change in the number of
context switches.

I am *not* saying that there's no benefit to the patch, I am saying that
context switches are the wrong explanation. Reduced contention (due to
shorter lock holding times, fewer cacheline moves etc.) is the reason
this is beneficial.

> > I don't think it's a good idea to use the same variable name in
> > PROC_HDR and PGPROC, it's confusing.

> What do you mean by this, are you not happy with the variable name?

Yes. I think it's a bad idea to have the same variable name in PROC_HDR
and PGPROC.

struct PGPROC
{
...
/* Support for group XID clearing. */
volatile pg_atomic_uint32 nextClearXidElem;
...
}

typedef struct PROC_HDR
{
...
/* First pgproc waiting for group XID clear */
volatile pg_atomic_uint32 nextClearXidElem;
...
}

PROC_HDR's variable imo isn't well named.
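
E.g. something like the following; 'firstClearXidElem' is just a
placeholder, the point being that the name should make clear it's the
list head in shared state, not the per-PGPROC link:

typedef struct PROC_HDR
{
...
/* First pgproc waiting for group XID clear */
volatile pg_atomic_uint32 firstClearXidElem;
...
}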

Greetings,

Andres Freund
