Re: Parallel Hash take II

From: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru>
Subject: Re: Parallel Hash take II
Date: 2017-10-26 11:24:20
Message-ID: CAGPqQf3GyCnDbDFRs4Le4e=dt4drNbh=SDXTkZsMe0nDXAAg2A@mail.gmail.com
Lists: pgsql-hackers

While rebasing the parallel B-tree index build patch on top of the v22
patch set, I found a couple of cosmetic issues:

1) BufFileSetEstimate has been removed, but its declaration is still in
buffile.h:

+extern size_t BufFileSetEstimate(int stripes);

2) BufFileSetCreate has been renamed to BufFileSetInit, but the old name
is still used in this comment:

* Attach to a set of named BufFiles that was created with BufFileSetCreate.

Thanks,

On Wed, Oct 25, 2017 at 11:33 AM, Thomas Munro <
thomas(dot)munro(at)enterprisedb(dot)com> wrote:

> On Tue, Oct 24, 2017 at 10:10 PM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> > Here is an updated patch set that does that ^.
>
> It's a bit hard to understand what's going on with the v21 patch set I
> posted yesterday because EXPLAIN ANALYZE doesn't tell you anything
> interesting. Also, if you apply the multiplex_gather patch[1] I
> posted recently and set multiplex_gather to off then it doesn't tell
> you anything at all, because the leader has no hash table (I suppose
> that could happen with unpatched master given sufficiently bad
> timing). Here's a new version with an extra patch that adds some
> basic information about load balancing to EXPLAIN ANALYZE, inspired by
> what commit bf11e7ee did for Sort.
>
> Example output:
>
> enable_parallel_hash = on, multiplex_gather = on:
>
> -> Parallel Hash (actual rows=1000000 loops=3)
> Buckets: 131072 Batches: 16
> Leader: Shared Memory Usage: 3552kB Hashed: 396120 Batches Probed: 7
> Worker 0: Shared Memory Usage: 3552kB Hashed: 276640 Batches Probed: 6
> Worker 1: Shared Memory Usage: 3552kB Hashed: 327240 Batches Probed: 6
> -> Parallel Seq Scan on simple s (actual rows=333333 loops=3)
>
> -> Parallel Hash (actual rows=10000000 loops=8)
> Buckets: 131072 Batches: 256
> Leader: Shared Memory Usage: 2688kB Hashed: 1347720 Batches Probed: 36
> Worker 0: Shared Memory Usage: 2688kB Hashed: 1131360 Batches Probed: 33
> Worker 1: Shared Memory Usage: 2688kB Hashed: 1123560 Batches Probed: 38
> Worker 2: Shared Memory Usage: 2688kB Hashed: 1231920 Batches Probed: 38
> Worker 3: Shared Memory Usage: 2688kB Hashed: 1272720 Batches Probed: 34
> Worker 4: Shared Memory Usage: 2688kB Hashed: 1234800 Batches Probed: 33
> Worker 5: Shared Memory Usage: 2688kB Hashed: 1294680 Batches Probed: 37
> Worker 6: Shared Memory Usage: 2688kB Hashed: 1363240 Batches Probed: 35
> -> Parallel Seq Scan on big s2 (actual rows=1250000 loops=8)
>
> enable_parallel_hash = on, multiplex_gather = off (ie no leader
> participation):
>
> -> Parallel Hash (actual rows=1000000 loops=2)
> Buckets: 131072 Batches: 16
> Worker 0: Shared Memory Usage: 3520kB Hashed: 475920 Batches Probed: 9
> Worker 1: Shared Memory Usage: 3520kB Hashed: 524080 Batches Probed: 8
> -> Parallel Seq Scan on simple s (actual rows=500000 loops=2)
>
> enable_parallel_hash = off, multiplex_gather = on:
>
> -> Hash (actual rows=1000000 loops=3)
> Buckets: 131072 Batches: 16
> Leader: Memory Usage: 3227kB
> Worker 0: Memory Usage: 3227kB
> Worker 1: Memory Usage: 3227kB
> -> Seq Scan on simple s (actual rows=1000000 loops=3)
>
> enable_parallel_hash = off, multiplex_gather = off:
>
> -> Hash (actual rows=1000000 loops=2)
> Buckets: 131072 Batches: 16
> Worker 0: Memory Usage: 3227kB
> Worker 1: Memory Usage: 3227kB
> -> Seq Scan on simple s (actual rows=1000000 loops=2)
>
> parallelism disabled (traditional single-line output, unchanged):
>
> -> Hash (actual rows=1000000 loops=1)
> Buckets: 131072 Batches: 16 Memory Usage: 3227kB
> -> Seq Scan on simple s (actual rows=1000000 loops=1)
>
> (It actually says "Tuples Hashed", not "Hashed" but I edited the above
> to fit on a standard punchcard.) Thoughts?
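
For anyone who wants to sanity-check the load-balancing numbers in the
examples above, here is a rough Python sketch. It is not part of any patch.
It assumes each participant's entry sits on a single line in the abbreviated
"Hashed: N Batches Probed: M" form used in this mail (the real output says
"Tuples Hashed"), and the summarize() helper and its regular expression are
made up for illustration.

```python
import re

# Participant lines from the first example (3 participants, 16 batches),
# with each entry on one line. Abbreviated labels as used in this mail.
SAMPLE = """\
Leader: Shared Memory Usage: 3552kB Hashed: 396120 Batches Probed: 7
Worker 0: Shared Memory Usage: 3552kB Hashed: 276640 Batches Probed: 6
Worker 1: Shared Memory Usage: 3552kB Hashed: 327240 Batches Probed: 6
"""

# Hypothetical pattern: participant name, tuples hashed, batches probed.
LINE_RE = re.compile(
    r"^(?P<who>Leader|Worker \d+):.*?"
    r"Hashed: (?P<hashed>\d+)\s+Batches Probed: (?P<probed>\d+)",
    re.MULTILINE,
)


def summarize(text):
    """Return per-participant counts, the total hashed, and each share."""
    hashed = {}
    probed = {}
    for m in LINE_RE.finditer(text):
        hashed[m["who"]] = int(m["hashed"])
        probed[m["who"]] = int(m["probed"])
    total = sum(hashed.values())
    shares = {who: n / total for who, n in hashed.items()}
    return hashed, probed, total, shares


hashed, probed, total, shares = summarize(SAMPLE)
print("total tuples hashed:", total)
print("total batches probed:", sum(probed.values()))
for who, share in shares.items():
    print(f"{who}: {share:.1%} of tuples hashed")
```

For the first example the per-participant "Hashed" counts add up to 1000000,
matching the row count reported on the Parallel Hash node, while the "Batches
Probed" counts add up to 19 against 16 batches, so presumably some batches
were probed by more than one participant.
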
>
> [1] https://www.postgresql.org/message-id/CAEepm%3D2U%2B%2BLp3bNTv2Bv_kkr5NE2pOyHhxU%3DG0YTa4ZhSYhHiw%40mail.gmail.com
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>

--
Rushabh Lathia
