parallel append vs. simple UNION ALL

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: parallel append vs. simple UNION ALL
Date: 2017-12-23 21:53:55
Message-ID: CA+Tgmoa6L9A1nNCk3aTDVZLZ4KkHDn1+tm7mFyFvP+uQPS7bAg@mail.gmail.com
Lists: pgsql-hackers

As I mentioned in the commit message for the Parallel Append commit
(ab72716778128fb63d54ac256adf7fe6820a1185), it's kind of sad that this
doesn't work with UNION ALL queries, which are an obvious candidate
for such parallelization. It turns out that it actually does work to
a limited degree: assuming that the UNION ALL query can be converted
to a simple appendrel, it can consider a parallel append of
non-partial paths only. The attached patch lets it consider a
parallel append of partial paths by doing the following things:

1. Teaching set_subquery_pathlist to create *partial* SubqueryScan
paths as well as non-partial ones.
2. Teaching grouping_planner to create partial paths for the final rel
if not at the outermost query level.
3. Modifying finalize_plan to allow the gather_param to be passed
across subquery boundaries.

#3 is the only part I'm really unsure about; the other stuff looks
pretty cut and dried.

I have a draft patch that handles the case where the union can't be
converted to a simple appendrel, too, but that's not quite baked
enough to post yet.
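For anyone who wants to poke at the flattening distinction mentioned above, it's easy to see with EXPLAIN which queries become a simple appendrel and which don't. A minimal sketch (t1/t2 are made-up table names; any two tables with a compatible column will do):

```sql
-- A plain UNION ALL over compatible targetlists is flattened into a
-- simple appendrel, so the planner sees a single Append (or Parallel
-- Append) over both branches:
EXPLAIN (COSTS OFF)
SELECT x FROM t1 UNION ALL SELECT x FROM t2;

-- UNION without ALL must deduplicate, so it is planned as a real set
-- operation (Append feeding a HashAggregate/Unique) rather than a
-- simple appendrel:
EXPLAIN (COSTS OFF)
SELECT x FROM t1 UNION SELECT x FROM t2;
```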

For those for whom the above may be too technical to follow, here's an example:

pgbench -i -s 40
explain (costs off)
select a.bid from pgbench_accounts a, pgbench_branches b
 where a.bid = b.bid and aid % 1000 = 0
union all
select a.bid from pgbench_accounts a where aid % 1000 = 0;

Unpatched:

 Append
   ->  Gather
         Workers Planned: 2
         ->  Hash Join
               Hash Cond: (a.bid = b.bid)
               ->  Parallel Seq Scan on pgbench_accounts a
                     Filter: ((aid % 1000) = 0)
               ->  Hash
                     ->  Seq Scan on pgbench_branches b
   ->  Gather
         Workers Planned: 2
         ->  Parallel Seq Scan on pgbench_accounts a_1
               Filter: ((aid % 1000) = 0)

Patched:

 Gather
   Workers Planned: 2
   ->  Parallel Append
         ->  Hash Join
               Hash Cond: (a.bid = b.bid)
               ->  Parallel Seq Scan on pgbench_accounts a
                     Filter: ((aid % 1000) = 0)
               ->  Hash
                     ->  Seq Scan on pgbench_branches b
         ->  Parallel Seq Scan on pgbench_accounts a_1
               Filter: ((aid % 1000) = 0)

In this particular case the change doesn't buy very much, but the
second plan is better because it avoids shutting down one set of
workers and starting a new set. That's more efficient, and it also
allows the two branches to be worked on in parallel rather than
serially. At a small enough scale factor, even without the patch, you
get this...

 Gather
   Workers Planned: 2
   ->  Parallel Append
         ->  Nested Loop
               Join Filter: (a.bid = b.bid)
               ->  Seq Scan on pgbench_branches b
               ->  Seq Scan on pgbench_accounts a
                     Filter: ((aid % 1000) = 0)
         ->  Seq Scan on pgbench_accounts a_1
               Filter: ((aid % 1000) = 0)

...but that's not good because now we have regular sequential scans
instead of partial sequential scans.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
subquery-smarts.patch application/octet-stream 2.8 KB
