Re: Reduce ProcArrayLock contention

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reduce ProcArrayLock contention
Date: 2015-08-25 11:51:31
Message-ID: CAA4eK1KibL2kL_ygPDS7QBVc0J=UGaFXYajkqjRF+CkQu8Pbww@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 20, 2015 at 3:49 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> On 2015-08-20 15:38:36 +0530, Amit Kapila wrote:
> > On Wed, Aug 19, 2015 at 9:09 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > I spent some time today reviewing the committed patch. So far my only
> > > major complaint is that I think the comments only insufficiently
> > > document the approach taken:
> > > Stuff like avoiding ABA-type problems by clearing the list entirely and
> > > it being impossible that entries end up on the list too early absolutely
> > > needs to be documented explicitly.
> > >
> >
> > I think more comments can be added to explain such behaviour if it is
> > not clear from the current code and comments.
>
> It's not mentioned at all, so yes.
>

I have updated the comments in the patch as you suggested. We could also
add a link to a wiki or some other page explaining the ABA problem, or
explain the problem inline, but since this is a well-known problem that can
occur while implementing a lock-free data structure, not adding a full
explanation also seems okay.
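
To make the ABA reasoning concrete, here is a minimal sketch (illustrative
only, not the committed code; the helper name is made up) of how the leader
detaches the whole pending list in one atomic step. Because the shared head
is reset to INVALID_PGPROCNO whenever the list is claimed, a PGPROC can only
reappear on the list after the leader is completely done with it, which is
what rules out the ABA hazard:

static uint32
claim_group_clear_list(PROC_HDR *procglobal)
{
	uint32		nextidx;

	nextidx = pg_atomic_read_u32(&procglobal->nextClearXidElem);

	/*
	 * CAS loop: on failure, nextidx is overwritten with the current
	 * head, so we simply retry until we swap the whole list out.
	 */
	while (!pg_atomic_compare_exchange_u32(&procglobal->nextClearXidElem,
										   &nextidx, INVALID_PGPROCNO))
		;

	/* First element of the now-private list, or INVALID_PGPROCNO. */
	return nextidx;
}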

> > I think you are right and here we need to use something like what is
> > suggested below by you. Originally the code was similar to what you
> > have written below, but it was using a different (new) variable to achieve
> > what you have achieved with lwWaiting, and to avoid the use of a new
> > variable the code was refactored into its current form. I think we should
> > do this change (I can write a patch) unless Robert feels otherwise.
>
> I think we can just rename lwWaiting to something more generic.
>

I think that can create a problem, considering we have to set it in
ProcArrayGroupClearXid() before adding the proc to the wait list. That
means it will be set for the leader as well, which is problematic because
the leader needs to acquire the LWLock, and the LWLock code uses lwWaiting.

The problem I see with setting lwWaiting after adding the proc to the list
is that the leader might already have cleared the proc by the time we try
to set lwWaiting for a follower.

For now, I have added a separate variable.
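
Roughly, the follower-side logic with the separate variable looks like the
sketch below (the flag name xidClearPending and the push helper are
hypothetical; the attached patch may use different names). The important
point is that the flag is set before the proc becomes visible on the list,
so the leader can always reset it, and the follower re-checks it after each
semaphore wakeup:

	bool	became_leader;
	int		extraWaits = 0;

	/* Mark ourselves before becoming visible on the list. */
	MyProc->xidClearPending = true;
	pg_write_barrier();

	/*
	 * CAS push (hypothetical helper): the proc that finds the list
	 * empty becomes the leader and clears XIDs for the whole group.
	 */
	became_leader = push_onto_group_clear_list(MyProc);

	if (!became_leader)
	{
		for (;;)
		{
			PGSemaphoreLock(&MyProc->sem, false);	/* sleep */
			if (!MyProc->xidClearPending)
				break;			/* leader has cleared our XID */
			extraWaits++;		/* unrelated wakeup; fix up below */
		}
		pg_read_barrier();		/* pairs with the leader's write barrier */

		/* Absorb any extra semaphore wakeups we consumed. */
		while (extraWaits-- > 0)
			PGSemaphoreUnlock(&MyProc->sem);
		return;
	}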

>
> > > Consider what happens if such a follower enqueues in another
> > > transaction. It is not, as far as I could find out, guaranteed on all
> > > types of cpus that a third backend can actually see nextClearXidElem
> > > as INVALID_PGPROCNO. That'd likely require SMT/HT cores and multiple
> > > sockets. If the write to nextClearXidElem is entered into the local
> > > store buffer (leader #1) a hyper-threaded process (leader #2) can
> > > possibly see it (store forwarding) while another backend doesn't
> > > yet.
> > >
> > > I think this is very unlikely to be an actual problem due to
> > > independent barriers until enqueued again, but I don't want to rely
> > > on it undocumentedly. It seems safer to replace
> > > + wakeidx = pg_atomic_read_u32(&proc->nextClearXidElem);
> > > + pg_atomic_write_u32(&proc->nextClearXidElem, INVALID_PGPROCNO);
> > > with a pg_atomic_exchange_u32().
> > >
> >
> > I didn't follow this point; if we ensure that a follower can never return
> > before the leader wakes it up, then why would it be a problem to update
> > nextClearXidElem like above?
>
> Because it doesn't generally enforce that *other* backends have seen the
> write as there's no memory barrier.
>

After changing the code to use a separate variable to indicate that the
XID has been cleared, and changing the logic to use barriers, I don't think
this problem can occur. Can you please look at the latest attached patch
and let me know if you still see it?
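
For reference, the leader-side wakeup loop with such a barrier would look
roughly like the sketch below (again illustrative, reusing the hypothetical
xidClearPending flag from the earlier sketch). The write barrier guarantees
that a follower which observes its flag as reset also observes its cleared
XID and its reset list link:

	while (wakeidx != INVALID_PGPROCNO)
	{
		PGPROC	   *proc = &ProcGlobal->allProcs[wakeidx];

		/* Read the next link before releasing this follower. */
		wakeidx = pg_atomic_read_u32(&proc->nextClearXidElem);
		pg_atomic_write_u32(&proc->nextClearXidElem, INVALID_PGPROCNO);

		/*
		 * All earlier writes (the cleared XID, the reset link) must be
		 * visible to the follower before we release it.
		 */
		pg_write_barrier();

		proc->xidClearPending = false;

		if (proc != MyProc)
			PGSemaphoreUnlock(&proc->sem);
	}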

>
>
> > > I don't think it's a good idea to use the same variable name in
> > > PROC_HDR and PGPROC, it's confusing.
>
> > What do you mean by this? Are you not happy with the variable name?
>
> Yes. I think it's a bad idea to have the same variable name in PROC_HDR
> and PGPROC.
>
> struct PGPROC
> {
> ...
> /* Support for group XID clearing. */
> volatile pg_atomic_uint32 nextClearXidElem;
> ...
> }
>
> typedef struct PROC_HDR
> {
> ...
> /* First pgproc waiting for group XID clear */
> volatile pg_atomic_uint32 nextClearXidElem;
> ...
> }
>
> PROC_HDR's variable imo isn't well named.
>

Changed the variable name in PROC_HDR.

> How hard did you try checking whether this causes regressions? This
> increases the number of atomics in the commit path a fair bit. I doubt
> it's really bad, but it seems like a good idea to benchmark something
> like a single full-throttle writer and a large number of readers.

One way to test this is to run a pgbench read-only load (with 100 clients)
and a write load (tpc-b, with one client) simultaneously and check the
results. I have tried this, and there is a lot of variation (more than 50%)
in the write-load tps across runs, so I am not sure this is the right way
to benchmark it.

Another possible way is to hack the pgbench code so that one thread runs
write transactions and the other threads run read transactions.

Do you have any other ideas, or are you aware of any previously written
test with which this can be tested?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
fix_procarraylock_issues_v1.patch application/octet-stream 5.7 KB
