Re: speeding up planning with partitions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Imai Yoshikazu <yoshikazu_i443(at)live(dot)jp>, "jesper(dot)pedersen(at)redhat(dot)com" <jesper(dot)pedersen(at)redhat(dot)com>, "Imai, Yoshikazu" <imai(dot)yoshikazu(at)jp(dot)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: speeding up planning with partitions
Date: 2019-03-30 00:17:32
Message-ID: 25035.1553905052@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> writes:
> On 2019/03/29 7:38, Tom Lane wrote:
>> 2. I seriously dislike what's been done in joinrels.c, too. That
>> really seems like a kluge (and I haven't had time to study it
>> closely).

> Those hunks account for the fact that pruned partitions, for which we no
> longer create RangeTblEntry and RelOptInfo, may appear on the nullable
> side of an outer join. We'll need a RelOptInfo holding a dummy path, so
> that outer join paths can be created with one side of join being dummy
> result path, which are built in the patch by build_dummy_partition_rel().

Now that I've had a chance to look closer, there's no way I'm committing
that change in joinrels.c. If it works at all, it's accidental, because
it's breaking all sorts of data structure invariants. The business with
an AppendRelInfo that maps from the parentrel to itself is particularly
ugly; and I doubt that you can get away with assuming that
root->append_rel_array[parent->relid] is available for use for that.
(What if the parent is an intermediate partitioned table?)

There's also the small problem of the GEQO crash. It's possible that
that could be gotten around by switching into the long-term planner
context in update_child_rel_info and build_dummy_partition_rel, but
then you're creating a memory leak across GEQO cycles. It'd be much
better to avoid touching base-relation data structures during join
planning.

What I propose we do about the GEQO problem is shown in 0001 attached
(which would need to be back-patched into v11). This is based on the
observation that, if we know an input relation is empty, we can often
prove the join is empty and then skip building it at all. (In the
existing partitionwise-join code, the same cases are detected by
populate_joinrel_with_paths, but we do a fair amount of work before
discovering that.) The cases where that's not true are where we
have a pruned partition on the inside of a left join, or either side
of a full join ... but frankly, what the existing code produces for
those cases is not short of embarrassing:

-> Hash Left Join
Hash Cond: (pagg_tab1_p1.x = y)
Filter: ((pagg_tab1_p1.x > 5) OR (y < 20))
-> Seq Scan on pagg_tab1_p1
Filter: (x < 20)
-> Hash
-> Result
One-Time Filter: false

That's just dumb. What we *ought* to be doing in such degenerate
outer-join cases is just emitting the non-dummy side, ie

-> Seq Scan on pagg_tab1_p1
Filter: (x < 20) AND ((pagg_tab1_p1.x > 5) OR (y < 20))

in this example. I would envision handling this by teaching the
code to generate a path for the joinrel that's basically just a
ProjectionPath atop a path for the non-dummy input rel, with the
projection set up to emit nulls for the columns of the dummy side.
(Note that this would be useful for outer joins against dummy rels
in regular planning contexts, not only partitionwise joins.)

Pending somebody doing the work for that, though, I do not
have a problem with just being unable to generate partitionwise
joins in such cases, so 0001 attached just changes the expected
outputs for the affected regression test cases.

0002 attached is then the rest of the partition-planning patch;
it doesn't need to mess with joinrels.c at all. I've addressed
the other points discussed today in that, except for the business
about whether we want your 0003 bitmap-of-live-partitions patch.
I'm still inclined to think that that's not really worth it,
especially in view of your performance results.

If people are OK with this approach to solving the GEQO problem,
I think these are committable.

regards, tom lane

Attachment Content-Type Size
0001-avoid-geqo-crash-with-partitionwise-join.patch text/x-diff 23.7 KB
0002-speed-up-planning-with-partitions-39.patch text/x-diff 100.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-03-30 00:40:01 Re: Unix socket dir, an idea
Previous Message Jerry Jelinek 2019-03-29 22:18:42 Re: patch to allow disable of WAL recycling