Re: Ordered Partitioned Table Scans

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Antonin Houska <ah(at)cybertec(dot)at>, Julien Rouhaud <rjuju123(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Ordered Partitioned Table Scans
Date: 2019-03-09 06:51:57
Message-ID: CAKJS1f_0cKpoPjdcNt5p_rj0QTtJEzYTdG8cLiP9WUYktTZhBQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, 9 Mar 2019 at 09:14, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I took a quick look through this and I'm not very happy with it.
> It seems to me that the premise ought to be "just use an Append
> if we can prove the output would be ordered anyway", but that's not
> what we actually have here: instead you're adding more infrastructure
> onto Append, which notably involves invasive changes to the API of
> create_append_path, which is the main reason why the patch keeps breaking.

Can you suggest how else we could teach higher paths that an Append is
ordered by some path keys without giving the append some pathkeys?
That's what pathkeys are for, so I struggle to imagine how else this
could work. If we don't do this, then how is a MergeJoin going to
know it does not need to sort before joining?

As for the "the patch keeps breaking"... those are just conflicts
with other changes that have been made in master. That seems fairly
normal to me.

> (It's broken again as of HEAD, though the cfbot doesn't seem to have
> noticed yet.)

I think it's not been updating itself for a few days.

> Likewise there's a bunch of added complication in
> cost_append, create_append_plan, etc. I think you should remove all that
> and restrict this optimization to the case where all the subpaths are
> natively ordered --- if we have to insert Sorts, it's hardly going to move
> the needle to worry about simplifying the parent MergeAppend to Append.

I think the patch would be less than half as useful if we do that.
Can you explain why you think that fast startup plans are less
important for partitioned tables?

I could perhaps understand an argument against this if the patch added
masses of complex code to achieve the goal, but in my opinion, the
code is fairly easy to understand and there's not very much extra code
added.

> There also seem to be bits that duplicate functionality of the
> drop-single-child-[Merge]Append patch (specifically I'm looking
> at get_singleton_append_subpath). Why do we need that?

hmm, that patch is separate functionality. The patch you're talking
about, as you know, just removes Append/MergeAppends that have a
single subpath. Over here we add smarts to allow conversion of
MergeAppends into Appends when the order of the partitions is defined
the same as the required order of the, would be, MergeAppend path.

get_singleton_append_subpath() allows sub-partitioned table's
MergeAppend or Append subpaths to be pulled into the top-level Appends
when they just contain a single subpath.

An example from the tests:

Append
-> Index Scan using mcrparted0_a_abs_c_idx on mcrparted0
-> Index Scan using mcrparted1_a_abs_c_idx on mcrparted1
-> Index Scan using mcrparted2_a_abs_c_idx on mcrparted2
-> Index Scan using mcrparted3_a_abs_c_idx on mcrparted3
-> Index Scan using mcrparted4_a_abs_c_idx on mcrparted4
-> Merge Append
Sort Key: mcrparted5a.a, (abs(mcrparted5a.b)), mcrparted5a.c
-> Index Scan using mcrparted5a_a_abs_c_idx on mcrparted5a
-> Index Scan using mcrparted5_def_a_abs_c_idx on mcrparted5_def

If the nested MergeAppend path just had a single node then
get_singleton_append_subpath() would have pulled the subpath into the
top-level Append. However, in this example, since there are multiple
MergeAppend subpath, the pull-up would be invalid since the top-level
Append can't guarantee the sort order of those MergeAppend subpaths.
In fact, the test directly after that one drops the mcrparted5_def
table which turns the plan into:

Append
-> Index Scan using mcrparted0_a_abs_c_idx on mcrparted0
-> Index Scan using mcrparted1_a_abs_c_idx on mcrparted1
-> Index Scan using mcrparted2_a_abs_c_idx on mcrparted2
-> Index Scan using mcrparted3_a_abs_c_idx on mcrparted3
-> Index Scan using mcrparted4_a_abs_c_idx on mcrparted4
-> Index Scan using mcrparted5a_a_abs_c_idx on mcrparted5a

> The logic in build_partition_pathkeys is noticeably stupider than
> build_index_pathkeys, in particular it's not bright about boolean columns.
> Maybe that's fine, but if so it deserves a comment explaining why we're
> not bothering.

Good point. That's required to allow cases like:

SELECT * FROM parttable WHERE boolcol = true ORDER BY boolcol, ordercol;

I've fixed that in the attached.

> Also, the comment for build_index_pathkeys specifies that
> callers should do truncate_useless_pathkeys, which they do; why is that
> not relevant here?

I've neglected to explain that in the comments. The problem with that
is that doing so would break cases where we use an Append when the
partition keys are a prefix of the query's pathkeys. Say we had a
range partition table on (a,b) and an index on (a, b, c):

SELECT * FROM range_ab ORDER BY a, b, c;

With the current patch, we can use an Append for that as no earlier
value of (a,b) can come in a later partition. If we
truncate_useless_pathkeys() then it'll strip out all pathkeys as the
partition pathkeys are not contained within the query pathkeys.

Maybe the patch should perform truncate_useless_pathkeys for the check
where we see if the query pathkeys are contained in the partition's
pathkeys. However, we can't do it for the reverse check.

I've added a comment to explain about the lack of
truncate_useless_pathkeys() in the attached.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
v10-0001-Allow-Append-to-be-used-in-place-of-MergeAppend-.patch application/octet-stream 63.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2019-03-09 06:53:17 Re: Is it too soon for a PG12 open items wiki page?
Previous Message Pavel Stehule 2019-03-09 06:22:05 proposal: type info support functions for functions that use "any" type