Re: modeling parallel contention (was: Parallel Append implementation)

From: Andres Freund <andres(at)anarazel(dot)de>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: modeling parallel contention (was: Parallel Append implementation)
Date: 2017-05-05 02:36:46
Message-ID: 20170505023646.3uhnmf2hbwtm63lc@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2017-05-05 14:20:48 +1200, David Rowley wrote:
> Yeah, I did get some time to look over the contention in Parallel Seq
> Scan a while back, and I discovered that on the machine I was testing
> on, the lock obtained in heap_parallelscan_nextpage() was causing
> workers to have to wait for other workers to fetch their next task to
> work on.

Oh, if it's "just" that, it should be easy enough to address. Two
approaches:
1) use atomic ops for increment, modulo afterwards to deal with
wraparound in the synchronous scan
2) batching
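
For (1), I'm imagining something very roughly like the below. This is
only an untested sketch: phs_nallocated is a hypothetical pg_atomic_uint64
that would replace the mutex-protected phs_cblock, the other fields are the
existing ones from ParallelHeapScanDescData.

/*
 * Untested sketch of approach (1): claim pages with an atomic
 * fetch-and-add instead of holding phs_mutex, applying the modulo
 * afterwards so synchronized scans can still start at phs_startblock
 * and wrap around at the end of the relation.
 */
static BlockNumber
heap_parallelscan_nextpage_atomic(ParallelHeapScanDesc parallel_scan)
{
    uint64      nallocated;

    /* Lock-free claim of the next logical page position. */
    nallocated = pg_atomic_fetch_add_u64(&parallel_scan->phs_nallocated, 1);

    /* All pages have already been handed out. */
    if (nallocated >= parallel_scan->phs_nblocks)
        return InvalidBlockNumber;

    /* Wrap around relative to the syncscan start block. */
    return (BlockNumber) ((parallel_scan->phs_startblock + nallocated) %
                          parallel_scan->phs_nblocks);
}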

> I ended up writing the attached (which I'd not intended to post until
> some time closer to when the doors open for PG11). At the moment it's
> basically just a test patch to see how it affects things when we give
> workers a bit more to do before they come back to look for more work.
> In this case, I've just given them 10 pages to work on, instead of the
> 1 that's allocated in 9.6 and v10.

Right.

> A quick test on a pretty large table on a large machine shows:
>
> Unpatched:
>
> postgres=# select count(*) from a;
>    count
> ------------
>  1874000000
> (1 row)
>
> Time: 5211.485 ms (00:05.211)
>
> Patched:
>
> postgres=# select count(*) from a;
>    count
> ------------
>  1874000000
> (1 row)
>
> Time: 2523.983 ms (00:02.524)

Neat!

> I'd had thoughts that the 10 pages wouldn't be constant, but that the
> batch size would depend on the size of the relation to be scanned. My
> rough idea was to aim for about 1 million batches, something like
> batch_pages = Max(parallel_scan->phs_nblocks / 1000000, 1); so that we
> only take more than 1 page at a time if there's a decent amount to
> process. We don't want to make the batches too big, as we might end up
> having to wait on slow workers at the end of a scan.

I wonder how much the atomic-ops approach alone can help; it doesn't
have the issue that the work might be unevenly distributed between
pages.

> Anyway, I don't want to hijack this thread with discussions on this.
> I just wanted to note that I plan to work on it, in order to avoid
> any duplicate development or analysis. I'll probably start a new
> thread for this sometime nearer PG11's dev cycle.

Cool. I think it might make sense to post about this soon, just to give
it some more visibility and reduce the potential for duplication.

- andres
