Re: speeding up planning with partitions

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Imai Yoshikazu <yoshikazu_i443(at)live(dot)jp>, "jesper(dot)pedersen(at)redhat(dot)com" <jesper(dot)pedersen(at)redhat(dot)com>, "Imai, Yoshikazu" <imai(dot)yoshikazu(at)jp(dot)fujitsu(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speeding up planning with partitions
Date: 2019-03-30 02:19:58
Message-ID: CA+HiwqGs04CN=dh-bi2F2FzB-_1yCpeUSbB3Znyh6PWxhwdZcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the new patches.

On Sat, Mar 30, 2019 at 9:17 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> writes:
> > On 2019/03/29 7:38, Tom Lane wrote:
> >> 2. I seriously dislike what's been done in joinrels.c, too. That
> >> really seems like a kluge (and I haven't had time to study it
> >> closely).
>
> > Those hunks account for the fact that pruned partitions, for which we no
> > longer create RangeTblEntry and RelOptInfo, may appear on the nullable
> > side of an outer join. We'll need a RelOptInfo holding a dummy path, so
> > that outer join paths can be created with one side of join being dummy
> > result path, which are built in the patch by build_dummy_partition_rel().
>
> Now that I've had a chance to look closer, there's no way I'm committing
> that change in joinrels.c. If it works at all, it's accidental, because
> it's breaking all sorts of data structure invariants. The business with
> an AppendRelInfo that maps from the parentrel to itself is particularly
> ugly; and I doubt that you can get away with assuming that
> root->append_rel_array[parent->relid] is available for use for that.
> (What if the parent is an intermediate partitioned table?)
>
> There's also the small problem of the GEQO crash. It's possible that
> that could be gotten around by switching into the long-term planner
> context in update_child_rel_info and build_dummy_partition_rel, but
> then you're creating a memory leak across GEQO cycles. It'd be much
> better to avoid touching base-relation data structures during join
> planning.
>
> What I propose we do about the GEQO problem is shown in 0001 attached
> (which would need to be back-patched into v11). This is based on the
> observation that, if we know an input relation is empty, we can often
> prove the join is empty and then skip building it at all. (In the
> existing partitionwise-join code, the same cases are detected by
> populate_joinrel_with_paths, but we do a fair amount of work before
> discovering that.) The cases where that's not true are where we
> have a pruned partition on the inside of a left join, or either side
> of a full join ... but frankly, what the existing code produces for
> those cases is not short of embarrassing:
>
> -> Hash Left Join
> Hash Cond: (pagg_tab1_p1.x = y)
> Filter: ((pagg_tab1_p1.x > 5) OR (y < 20))
> -> Seq Scan on pagg_tab1_p1
> Filter: (x < 20)
> -> Hash
> -> Result
> One-Time Filter: false
>
> That's just dumb. What we *ought* to be doing in such degenerate
> outer-join cases is just emitting the non-dummy side, ie
>
> -> Seq Scan on pagg_tab1_p1
> Filter: (x < 20) AND ((pagg_tab1_p1.x > 5) OR (y < 20))
>
> in this example. I would envision handling this by teaching the
> code to generate a path for the joinrel that's basically just a
> ProjectionPath atop a path for the non-dummy input rel, with the
> projection set up to emit nulls for the columns of the dummy side.
> (Note that this would be useful for outer joins against dummy rels
> in regular planning contexts, not only partitionwise joins.)
>
> Pending somebody doing the work for that, though, I do not
> have a problem with just being unable to generate partitionwise
> joins in such cases, so 0001 attached just changes the expected
> outputs for the affected regression test cases.

Fwiw, I agree that we should fix join planning so that we get the
ProjectionPath atop scan path of non-nullable relation instead of a
full-fledged join path with dummy path on the nullable side. It seems
to me that the "fix" would be mostly be localized to
try_partitionwise_join() at least insofar as detecting whether we
should generate a join or the other plan shape is concerned, right?

By the way, does it make sense to remove the tests whose output
changes altogether and reintroduce them when we fix join planning?
Especially, in partitionwise_aggregate.out, there are comments near
changed plans which are no longer true.

> 0002 attached is then the rest of the partition-planning patch;
> it doesn't need to mess with joinrels.c at all. I've addressed
> the other points discussed today in that, except for the business
> about whether we want your 0003 bitmap-of-live-partitions patch.
> I'm still inclined to think that that's not really worth it,
> especially in view of your performance results.

I think the performance results did prove that degradation due to
those loops over part_rels becomes significant for very large
partition counts. Is there a better solution than the bitmapset that
you have in mind?

> If people are OK with this approach to solving the GEQO problem,
> I think these are committable.

Thanks again. Really appreciate that you are putting so much of your
time into this.

Regards,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hironobu SUZUKI 2019-03-30 04:33:17 Re: pgbench - add pseudo-random permutation function
Previous Message Michael Paquier 2019-03-30 01:51:23 Re: clean up pg_checksums.sgml