Parallelize correlated subqueries that execute within each worker

From: James Coleman <jtc331(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Parallelize correlated subqueries that execute within each worker
Date: 2021-05-07 16:30:02
Message-ID: CAAaqYe_x1u3V4uPiv=dJ=k2EJ7txhdq6yexJDkYZ1x1pu0QwcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

In a bug report back in November [1] a subthread explored why parallel
query is excluded any time we have "Plan nodes which reference a
correlated SubPlan". Amit's understanding was that the reasoning had
to do with inability to easily pass (potentially variable length)
Param values between workers.

However a decent-sized subset of this kind of query doesn't actually
require that we communicate between workers. If the Subplan executes
per-tuple within the worker then there's no reason I can see why it
needs to be marked parallel unsafe. Amit concurred but noted that
identifying that subset of plans is the difficult part (as is usually
the case!)

At the time I'd started work on an approach to handle this case and
hoped to "post about it in a new thread later this week." That didn't
happen, but here we are now, and I finally have this patch cleaned up
enough to share.

The basic idea is that we need to track (both on nodes and relations)
not only whether that node or rel is parallel safe but also whether
it's parallel safe assuming params are rechecked in the using context.
That allows us to delay making a final decision until we have
sufficient context to conclude that a given usage of a Param is
actually parallel safe or unsafe.

The first patch in this series was previously posted in the thread
"Consider parallel for lateral subqueries with limit" [2] and is
required as a precursor for various test cases to work here.

The second patch implements the core of the series. It results in
parallel query being possible for subplans that execute entirely
within the context of a parallel worker for cases where that subplan
is in the target, a LATERAL JOIN, or the WHERE and ORDER BY clauses.

The final patch notes several places where we set e.g.
rel->consider_parallel but setting the corresponding new value
rel->consider_parallel_recheckng_params wasn't yet necessary. It shows
opportunity either for further improvement or concluding certain cases
can't benefit and should be left unchanged.

James

1: https://www.postgresql.org/message-id/CAAaqYe_vihKjc%2B8LuQa49EHW4%2BKfefb3wHqPYFnCuUqozo%2BLFg%40mail.gmail.com
2: https://www.postgresql.org/message-id/flat/CAAaqYe_HEkmLwf_1iEHxXwQOWiRyiFd%3DuOu6kwj3sWYdVd1-zA%40mail.gmail.com

Attachment Content-Type Size
v1-0003-Other-places-to-consider-for-completeness.patch application/octet-stream 5.1 KB
v1-0002-Parallel-query-support-for-basic-correlated-subqu.patch application/octet-stream 75.4 KB
v1-0001-Allow-parallel-LATERAL-subqueries-with-LIMIT-OFFS.patch application/octet-stream 3.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joel Jacobson 2021-05-07 16:43:19 [PATCH] Add configure cache checkpoints before fatal checks
Previous Message Tom Lane 2021-05-07 16:22:51 Draft back-branch release notes are up