Parallel Hash take II

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Oleg Golovanov <rentech(at)mail(dot)ru>
Subject: Parallel Hash take II
Date: 2017-07-26 08:12:56
Message-ID: CAEepm=37HKyJ4U6XOLi=JgfSHM3o6B-GaeO-6hkOmneTDkH+Uw@mail.gmail.com
Lists: pgsql-hackers

Hi hackers,

Here is a new version of my parallel-aware hash join patchset. I've
dropped 'shared' from the feature name and EXPLAIN output, since that's
now implied by the word "Parallel". (It only made sense in earlier
versions that had both Shared Hash and Parallel Shared Hash, but a
Shared Hash with just one participant building it didn't turn out to be
very useful, so I dropped it a few versions ago.) I figured for this new
round I should create a new thread, but took the liberty of copying
the CC list from the previous one[1].

The main changes are:

1. Implemented the skew optimisation for parallel-aware mode. The
general approach is the same as the regular hash table: insert with a
CAS loop. The details of memory budget management are different
though. It grants chunks of budget to participants as needed even
though allocation is still per-tuple, and it has to deal with
concurrent bucket removal. I removed one level of indirection from
the skew hash table: in this version hashtable->skewBucket is an array
of HashSkewBucket instead of pointers to HashSkewBuckets allocated
separately. That makes the hash table array twice as big but avoids
one pointer hop when probing an active bucket; that refactoring was
not strictly necessary but made the changes to support parallel build
simpler.

2. Simplified costing. There is now just one control knob,
"parallel_synchronization_cost", which I charge each time the
participants will have to wait for each other at a barrier. It should
be set high enough to dissuade the planner from using Parallel Hash for
tiny hash tables that would be faster in a parallel-oblivious hash join.
Earlier ideas about modelling the cost of shared memory access didn't
work out.
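The insertion scheme in point 1 can be pictured with a small C11 sketch. It is not code from the patch. The types SketchTuple, SketchSkewBucket, SharedBudget and LocalBudget, the helpers skew_insert() and charge_tuple(), and the 1 KB grant size are all hypothetical. The sketch shows three things:

* a skew bucket stored inline in the array, so probing needs no extra pointer hop;
* a tuple pushed onto the bucket's list with a compare-and-swap loop;
* a shared memory budget handed to each participant in chunks, while tuples are still charged one at a time.

It uses raw pointers for brevity. In PostgreSQL, memory allocated through DSA is addressed with relative dsa_pointer values. The concurrent bucket removal mentioned above is not modelled.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified tuple; not PostgreSQL's HashJoinTuple. */
typedef struct SketchTuple
{
    struct SketchTuple *next;
    uint32_t    hashvalue;
} SketchTuple;

/* Skew bucket stored inline in the array, so probing an active bucket
 * reads the bucket directly instead of following a pointer to it. */
typedef struct SketchSkewBucket
{
    uint32_t    hashvalue;
    _Atomic(SketchTuple *) tuples;  /* head of this bucket's tuple list */
} SketchSkewBucket;

/* Push a tuple onto a bucket's list with a compare-and-swap loop.
 * On failure, atomic_compare_exchange_weak reloads 'head' with the
 * current list head, so we relink the tuple and retry. */
static void
skew_insert(SketchSkewBucket *bucket, SketchTuple *tuple)
{
    SketchTuple *head = atomic_load(&bucket->tuples);

    do
    {
        tuple->next = head;
    } while (!atomic_compare_exchange_weak(&bucket->tuples, &head, tuple));
}

#define SKETCH_BUDGET_CHUNK 1024    /* hypothetical grant size in bytes */

/* Budget shared by all participants, in bytes. */
typedef struct SharedBudget
{
    _Atomic size_t remaining;
} SharedBudget;

/* Budget already granted to one participant but not yet used. */
typedef struct LocalBudget
{
    size_t      available;
} LocalBudget;

/* Charge one tuple of 'size' bytes.  When the local grant runs short,
 * take another chunk (or whatever is left) from the shared budget with
 * a CAS loop.  Returns 1 on success, 0 once the shared budget is gone. */
static int
charge_tuple(SharedBudget *shared, LocalBudget *local, size_t size)
{
    while (local->available < size)
    {
        size_t      cur = atomic_load(&shared->remaining);
        size_t      grant;

        do
        {
            if (cur == 0)
                return 0;
            grant = cur < SKETCH_BUDGET_CHUNK ? cur : SKETCH_BUDGET_CHUNK;
        } while (!atomic_compare_exchange_weak(&shared->remaining, &cur,
                                               cur - grant));
        local->available += grant;
    }
    local->available -= size;
    return 1;
}
```

In this sketch, participants only ever push onto a bucket's list, so a plain CAS loop is sufficient. Removing buckets while other participants are still inserting would need more care than shown here.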

Status: I think there are probably some thinkos in the new skew
stuff. I think I need some new ideas about how to refactor things so
that there isn't quite so much "if-shared-then-this-else-that". I
think I should build some kind of test mode to control barriers so
that I can test the permutations of participant arrival phases
exhaustively. I need to propose an empirically derived default for
the GUC. There are several other details I would like to tidy up and
improve. That said, I wanted to post what I have as a checkpoint now
that I have the major remaining piece (skew optimisation) more-or-less
working and the costing at a place that I think makes sense.

I attach some queries to exercise various interesting cases. I would
like to get something like these into fast-running regression test
format.

Note that this patch requires the shared record typmod patch[2] in
theory, since shared hash table tuples might reference blessed record
types, but there is no API dependency, so you can use this patch set
without applying that one. If anyone knows how to actually provoke a
parallel hash join that puts RECORD types into the hash table, I'd be
very interested to hear about it, but certainly for TPC and similar
testing, that other patch set is not necessary.

Of the TPC-H queries, I find that Q3, Q5, Q7, Q8, Q9, Q10, Q12, Q14,
Q16, Q18, Q20 and Q21 make use of Parallel Hash nodes (I tested with
neqjoinsel-fix-v3.patch[3] also applied, which avoids some but not all
craziness in Q21). For examples that also include a
parallel-oblivious Hash, see Q8 and Q10: in those queries you can see
the planner deciding that it's not worth paying
parallel_synchronization_cost = 10 to load the 25-row "nation" table.
I'll report on performance separately.
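The toy C comparison below illustrates why a tiny inner table such as "nation" can end up under a parallel-oblivious Hash. The formulas are an assumption for illustration only and are not the patch's cost model:

* a parallel-oblivious build is charged the full build cost, because each participant builds its own copy;
* a parallel-aware build is charged its share of the build plus parallel_synchronization_cost for every barrier wait.

The struct, the helper functions, the per-row cost and the number of barrier waits are all hypothetical. The per-row cost of 0.01 is chosen only because it matches PostgreSQL's default cpu_tuple_cost.

```c
#include <stdbool.h>

/* Hypothetical planner inputs for a toy comparison (not the patch's model). */
typedef struct BuildEstimate
{
    double      inner_rows;     /* rows loaded into the hash table */
    double      per_row_cost;   /* assumed cost to hash and insert one row */
    int         participants;   /* leader plus workers; must be > 0 */
    int         barrier_waits;  /* assumed number of barrier waits in the build */
} BuildEstimate;

/* Parallel-oblivious: every participant builds its own private copy,
 * so each one pays for the whole inner side. */
static double
oblivious_build_cost(const BuildEstimate *e)
{
    return e->inner_rows * e->per_row_cost;
}

/* Parallel-aware: the build work is divided among participants,
 * but every barrier wait costs parallel_synchronization_cost. */
static double
aware_build_cost(const BuildEstimate *e, double parallel_synchronization_cost)
{
    return e->inner_rows * e->per_row_cost / e->participants +
        e->barrier_waits * parallel_synchronization_cost;
}

/* True if the toy model would pick Parallel Hash over a regular Hash. */
static bool
prefer_parallel_hash(const BuildEstimate *e, double parallel_synchronization_cost)
{
    return aware_build_cost(e, parallel_synchronization_cost) <
        oblivious_build_cost(e);
}
```

Take 25 rows, a per-row cost of 0.01, 4 participants and 2 barrier waits. The parallel-aware estimate is then 0.0625 + 2 * 10 = 20.0625, against 0.25 for the oblivious build, so the oblivious plan wins. With a million rows, the shared build wins instead (2,520 against 10,000).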

[1] https://www.postgresql.org/message-id/flat/CAEepm=2W=cOkiZxcg6qiFQP-dHUe09aqTrEMM7yJDrHMhDv_RA@mail.gmail.com
[2] https://www.postgresql.org/message-id/CAEepm=0ZtQ-SpsgCyzzYpsXS6e=kZWqk3g5Ygn3MDV7A8dabUA@mail.gmail.com
[3] https://www.postgresql.org/message-id/CAEepm%3D3%3DNHHko3oOzpik%2BggLy17AO%2Bpx3rGYrg3x_x05%2BBr9-A%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
parallel-hash-v16.patchset.tgz application/x-gzip 68.7 KB
hj-test-queries.sql application/octet-stream 4.7 KB
hj-skew.sql application/octet-stream 1.0 KB
hj-skew-unmatched.sql application/octet-stream 987 bytes
hj-skew-overflow.sql application/octet-stream 593 bytes
