Re: Parallel Append can break run-time partition pruning

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel Append can break run-time partition pruning
Date: 2020-04-23 04:36:48
Message-ID: CAApHDvqv9s_6__JnoGru9Hq0yE_40eOzPGeHXDatr29qECE3yg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 23 Apr 2020 at 14:37, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Apr 22, 2020 at 7:36 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > If there was some reason that a Parallel Append could come out more
> > expensive, then maybe we could just create a non-parallel Append using
> > the same subpath list and add_partial_path() it. I just don't quite
> > see how that would ever win though. I'm willing to be proven wrong
> > though.
>
> I think you're talking about the thing that this comment is trying to explain:
>
> /*
> * Consider a parallel-aware append using a mix of partial and non-partial
> * paths. (This only makes sense if there's at least one child which has
> * a non-partial path that is substantially cheaper than any partial path;
> * otherwise, we should use the append path added in the previous step.)
> */
>
> Like, suppose there are two relations A and B, and we're appending
> them. A has no indexes, so we can only choose between a Seq Scan and
> an Index Scan. B has a GIST index that is well-suited to the query,
> but GIST indexes don't support parallel scans. So we've got three
> choices:
>
> 1. Don't use parallelism at all. Then, we can do a normal Append with
> a Seq Scan on a and an Index Scan on b. ("If we found unparameterized
> paths for all children, build an unordered, unparameterized Append
> path for the rel.")
>
> 2. Use parallelism for both relations. Then, we can do a Gather over a
> Parallel Append (or a regular Append, if Parallel Append is disabled)
> with a Parallel Seq Scan on a and a Parallel Seq Scan on b. As
> compared with #1, this should win for a, but it might lose heavily for
> b, because switching from an index scan to a Seq Scan could be a big
> loser. ("Consider an append of unordered, unparameterized partial
> paths. Make it parallel-aware if possible.")
>
> 3. Use parallelism for a but not for b. The only way to do this is a
> Parallel Append, because there's no other way to mix partial and
> non-partial paths at present. This lets us get the benefit of a
> Parallel Seq Scan on a while still being able to do a non-parallel
> GIST Index Scan on b. This has a good chance of being better than #2,
> but it's fundamentally a costing decision, because if the table is
> small enough or the query isn't very selective, #2 will actually be
> faster, just on the raw power of more workers and less random I/O
> ("Consider a parallel-aware append using a mix of partial and
> non-partial paths.")

Thanks for those examples.

I ran this situation through the code but used a hash index instead of
GIST. The 3 settings which give us control over this plan are
enable_parallel_append, enable_indexscan, enable_bitmapscan.
enable_bitmapscan must be included since we can still get a parallel
bitmap scan with a hash index.

For completeness, I just tried with each of the 8 combinations of the
GUCs, but I'd detailed below which of your cases I'm testing as a
comment. There are 4 cases since #2 works with parallel and
non-parallel Append. Naturally, the aim is that the patched version
does not change the behaviour.

-- Test case
create table listp (a int, b int) partition by list(a);
create table listp1 partition of listp for values in(1);
create table listp2 partition of listp for values in(2);
insert into listp select x,y from generate_Series(1,2) x,
generate_Series(1,1000000) y;
create index on listp2 using hash(b);
vacuum analyze listp;

explain (costs off) select * from listp where b = 1;
SET enable_indexscan = off;
explain (costs off) select * from listp where b = 1;
SET enable_indexscan = on;
SET enable_bitmapscan = off;
explain (costs off) select * from listp where b = 1; -- case #3, Mixed
scan of parallel and non-parallel paths with a Parallel Append
SET enable_indexscan = off;
explain (costs off) select * from listp where b = 1; -- case #2 with
Parallel Append
SET enable_indexscan = on;
SET enable_bitmapscan = on;
SET enable_parallel_append = off;
explain (costs off) select * from listp where b = 1;
SET enable_indexscan = off;
explain (costs off) select * from listp where b = 1; -- case #2 with
non-Parallel Append
SET enable_indexscan = on;
SET enable_bitmapscan = off;
explain (costs off) select * from listp where b = 1; -- case #1, best
serial plan
SET enable_indexscan = off;
explain (costs off) select * from listp where b = 1;

The results, patched/unpatched, are the same.

David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2020-04-23 04:55:19 [PATCH] Fix install-tests target for vpath builds
Previous Message Ranier Vilela 2020-04-23 04:21:21 Re: [PATCH] Fix buffer not null terminated on (ecpg lib)