Re: Avoiding hash join batch explosions with extreme skew and weird stats

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date: 2019-12-30 03:34:02
Message-ID: CAAKRu_YsWm7gc_b2nBGWFPE6wuhdOLfc1LBZ786DUzaCPUDXCA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

So, I finally have a prototype to share of parallel hashloop fallback.

See the commit message for a full description of the functionality of the
patch.

This patch does contain refactoring of nodeHashjoin.

I have split the Parallel HashJoin and Serial HashJoin state machines
up, as they were diverging in my patch to a point that made for a
really cluttered ExecHashJoinImpl() (ExecHashJoinImpl() is now gone).

The reason I didn't do this refactoring in one patch and then put the
adaptive hashjoin code on top of it is that I might like to make
Parallel HashJoin and Serial HashJoin different nodes.

I think that has been discussed elsewhere and was looking to
understand the rationale for keeping them in the same node.

The patch is a rough prototype. Below are some of the high-level
pieces of work that I plan to do next. (there are many TODOs in the
code as well).

Some of the major outstanding work:

- correctness:
- haven't tried it with anti-joins and don't think it works
- number of batches is not deterministic from run-to-run

- performance:
- join_hash.sql is *much* slower.
While there are loads of performance fixes needed in the patch,
the basic criteria for "falling back" is likely the culprit here.
- There are many bottlenecks (there are several places where a
barrier could be moved to somewhere less hot, an atomic used
instead of a lock, or a method of coordination could be used to
allow workers to do backend-local accounting and aggregate it)
- need to make sure it does not create outer match status files when
it shouldn't (inner joins, for example)

- testing:
- many unexercised cases
- add number of chunks to EXPLAIN (for users and for testing)

- refactoring:
- The match status bitmap should have its own API or, at least,
manipulation of it should be done in a centralized set of
functions
- Rename "chunk" (as in chunks of inner side) to something that is
not already used in the context of memory chunks and, more
importantly, SharedTuplestoreChunk
- Make references to "hashloop fallback" and "adaptive hashjoin"
more consistent
- Rename adaptiveHashjoin.h/.c files and change what is in the files
which are separate from nodeHashjoin.h/.c (depending on outcome of
"new node")
- The state machines are big and unwieldy now, so, there is probably
some larger restructuring that could be done
- Should probably use the ParallelHashJoinBatchAccessor to access
the ParallelHashJoinBatch everywhere (realized this recently)

--
Melanie Plageman

Attachment Content-Type Size
v3-0001-hashloop-fallback.patch application/octet-stream 147.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-12-30 05:10:39 Re: [HACKERS] Block level parallel vacuum
Previous Message Amit Kapila 2019-12-30 02:55:28 Re: [HACKERS] Block level parallel vacuum