Re: Parallel Hash take II

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>
Cc: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru>
Subject: Re: Parallel Hash take II
Date: 2017-10-25 06:03:33
Message-ID: CAEepm=0th8Le2SDCv32zN7tMyCJYR9oGYJ52fXNYJz1hrpGW+Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 24, 2017 at 10:10 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> Here is an updated patch set that does that ^.

It's a bit hard to understand what's going on with the v21 patch set I
posted yesterday, because EXPLAIN ANALYZE doesn't tell you anything
interesting about the hash table. Also, if you apply the
multiplex_gather patch[1] I posted recently and set multiplex_gather
to off, then it doesn't tell you anything at all, because the leader
has no hash table (I suppose that could also happen with unpatched
master given sufficiently bad timing). Here's a new version (v22)
with an extra patch that adds some basic information about load
balancing to EXPLAIN ANALYZE, inspired by what commit bf11e7ee did for
Sort.

Example output:

enable_parallel_hash = on, multiplex_gather = on:

-> Parallel Hash (actual rows=1000000 loops=3)
Buckets: 131072 Batches: 16
Leader: Shared Memory Usage: 3552kB Hashed: 396120 Batches Probed: 7
Worker 0: Shared Memory Usage: 3552kB Hashed: 276640 Batches Probed: 6
Worker 1: Shared Memory Usage: 3552kB Hashed: 327240 Batches Probed: 6
-> Parallel Seq Scan on simple s (actual rows=333333 loops=3)

-> Parallel Hash (actual rows=10000000 loops=8)
Buckets: 131072 Batches: 256
Leader: Shared Memory Usage: 2688kB Hashed: 1347720 Batches Probed: 36
Worker 0: Shared Memory Usage: 2688kB Hashed: 1131360 Batches Probed: 33
Worker 1: Shared Memory Usage: 2688kB Hashed: 1123560 Batches Probed: 38
Worker 2: Shared Memory Usage: 2688kB Hashed: 1231920 Batches Probed: 38
Worker 3: Shared Memory Usage: 2688kB Hashed: 1272720 Batches Probed: 34
Worker 4: Shared Memory Usage: 2688kB Hashed: 1234800 Batches Probed: 33
Worker 5: Shared Memory Usage: 2688kB Hashed: 1294680 Batches Probed: 37
Worker 6: Shared Memory Usage: 2688kB Hashed: 1363240 Batches Probed: 35
-> Parallel Seq Scan on big s2 (actual rows=1250000 loops=8)

enable_parallel_hash = on, multiplex_gather = off (i.e. no leader participation):

-> Parallel Hash (actual rows=1000000 loops=2)
Buckets: 131072 Batches: 16
Worker 0: Shared Memory Usage: 3520kB Hashed: 475920 Batches Probed: 9
Worker 1: Shared Memory Usage: 3520kB Hashed: 524080 Batches Probed: 8
-> Parallel Seq Scan on simple s (actual rows=500000 loops=2)

enable_parallel_hash = off, multiplex_gather = on:

-> Hash (actual rows=1000000 loops=3)
Buckets: 131072 Batches: 16
Leader: Memory Usage: 3227kB
Worker 0: Memory Usage: 3227kB
Worker 1: Memory Usage: 3227kB
-> Seq Scan on simple s (actual rows=1000000 loops=3)

enable_parallel_hash = off, multiplex_gather = off:

-> Hash (actual rows=1000000 loops=2)
Buckets: 131072 Batches: 16
Worker 0: Memory Usage: 3227kB
Worker 1: Memory Usage: 3227kB
-> Seq Scan on simple s (actual rows=1000000 loops=2)

parallelism disabled (traditional single-line output, unchanged):

-> Hash (actual rows=1000000 loops=1)
Buckets: 131072 Batches: 16 Memory Usage: 3227kB
-> Seq Scan on simple s (actual rows=1000000 loops=1)

(It actually says "Tuples Hashed", not "Hashed", but I edited the
above to fit on a standard punchcard.) Thoughts?
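
For anyone who wants to eyeball the load balance from output like the
above, here is a rough sketch in Python. It is not part of the patch
set: the function names (parse_participants, summarize) and the
"max_over_mean" figure are made up for illustration, and the regular
expression assumes the per-participant line format shown in this mail
(accepting either "Hashed" or "Tuples Hashed"). The sample text is the
first Parallel Hash example above.

```python
import re

# Assumed line format, taken from the sample output in this mail, e.g.
#   Worker 0: Shared Memory Usage: 3552kB Hashed: 276640 Batches Probed: 6
LINE_RE = re.compile(
    r"^\s*(?P<who>Leader|Worker \d+):\s+"
    r"Shared Memory Usage:\s+(?P<mem_kb>\d+)kB\s+"
    r"(?:Tuples )?Hashed:\s+(?P<hashed>\d+)\s+"
    r"Batches Probed:\s+(?P<probed>\d+)\s*$"
)

SAMPLE = """\
-> Parallel Hash (actual rows=1000000 loops=3)
Buckets: 131072 Batches: 16
Leader: Shared Memory Usage: 3552kB Hashed: 396120 Batches Probed: 7
Worker 0: Shared Memory Usage: 3552kB Hashed: 276640 Batches Probed: 6
Worker 1: Shared Memory Usage: 3552kB Hashed: 327240 Batches Probed: 6
-> Parallel Seq Scan on simple s (actual rows=333333 loops=3)
"""


def parse_participants(text):
    """Return one dict per Leader/Worker line; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            rows.append({
                "who": m.group("who"),
                "mem_kb": int(m.group("mem_kb")),
                "hashed": int(m.group("hashed")),
                "probed": int(m.group("probed")),
            })
    return rows


def summarize(rows):
    """Total tuples hashed, each participant's share, and a skew figure."""
    total = sum(r["hashed"] for r in rows)
    mean = total / len(rows)
    return {
        "total_hashed": total,
        "shares": {r["who"]: r["hashed"] / total for r in rows},
        # 1.0 would mean every participant hashed the same number of tuples.
        "max_over_mean": max(r["hashed"] for r in rows) / mean,
    }


if __name__ == "__main__":
    summary = summarize(parse_participants(SAMPLE))
    print("total hashed:", summary["total_hashed"])
    for who, share in summary["shares"].items():
        print(f"{who}: {share:.1%}")
    print(f"max/mean: {summary['max_over_mean']:.3f}")
```

In the sample numbers above, the per-participant Hashed counts add up
to the row count reported on the Parallel Hash line (1000000 for
"simple", 10000000 for "big"), which makes a handy sanity check.
Lines that a mail client has wrapped would need to be rejoined before
the pattern matches them.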

[1] https://www.postgresql.org/message-id/CAEepm%3D2U%2B%2BLp3bNTv2Bv_kkr5NE2pOyHhxU%3DG0YTa4ZhSYhHiw%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

Attachment: parallel-hash-v22.patchset.tgz (application/x-gzip, 64.8 KB)
