Re: Ordered Partitioned Table Scans

From: Antonin Houska <ah(at)cybertec(dot)at>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Ordered Partitioned Table Scans
Date: 2018-10-31 15:01:45
Message-ID: 29278.1540998105@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:

> On 31 October 2018 at 13:05, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> >>> On 28 October 2018 at 03:49, Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> >> I've registered as a reviewer. I still didn't have a deep look at
> >> the patch yet, but thanks a lot for working on it!
> >
> > Thanks for signing up to review. I need to send another revision of
> > the patch to add a missing call to truncate_useless_pathkeys(). Will
> > try to do that today.
>
> I've attached a patch that ...

I've picked this one when looking around what I could review.

* As for the logic, I found generate_mergeappend_paths() to be the most
interesting part:

Imagine table partitioned by "i", so "partition_pathkeys" is {i}.

partition 1:

i | j
--+--
0 | 0
1 | 1
0 | 1
1 | 0

partition 2:

i | j
--+--
3 | 0
2 | 0
2 | 1
3 | 1

Even if "pathkeys" is {i, j}, i.e. not contained in "partition_pathkeys", the
ordering of the subpaths should not change the way tuples are split into
partitions.

Obviously a problem is if "partition_pathkeys" and "pathkeys" lists start with
different items. To propose more generic rule, I used this example of
range-partitioned table, where "i" and "j" are the partitioning keys:

partition 1:

i | j | k
---+---+---
0 | 0 | 1
0 | 0 | 0

partition 2:

i | j | k
---+---+---
0 | 1 | 0
0 | 1 | 1

If the output "pathkey" is {i, k}, then the Append path makes rows of both
partitions interleave:

i | j | k
---+---+---
0 | 0 | 0
0 | 1 | 0
0 | 0 | 1
0 | 1 | 1

So in general I think the restriction is that no valid position of "pathkeys"
and "partition_pathkeys" may differ. Or in other words: the shorter of the 2
pathkey lists must be contained in the longer one. Does it make sense to you?

Another problem I see is that build_partition_pathkeys() continues even if it
fails to create a pathkey for some partitioning column. In the example above
it would mean that the table can have "partition_pathkeys" equal to {j}
instead of {i, j} on some EC-related conditions. However such a key does not
correspond to reality - this is easier to imagine if another partition is
considered.

partition 3:

i | j | k
---+---+---
1 | 0 | 1
1 | 0 | 0

So I think no "partition_pathkeys" should be generated in that case. On the
other hand, if the function returned the part of the list it could construct
so far, it'd be wrong because such incomplete pathkeys could pass the checks I
proposed above for reasons unrelated to the partitioning scheme.

The following comments are mostly on coding:

* Both qsort_partition_list_value_cmp() and qsort_partition_rbound_cmp() have
this sentence in the header comment:

Note: If changing this, see build_partition_pathkeys()

However I could not find other use besides that in
RelationBuildPartitionDesc().

* create_append_path():

/*
* Apply query-wide LIMIT if known and path is for sole base relation.
* (Handling this at this low level is a bit klugy.)
*/
if (root != NULL && pathkeys != NULL &&
bms_equal(rel->relids, root->all_baserels))
pathnode->limit_tuples = root->limit_tuples;
else
pathnode->limit_tuples = -1.0;

I think this optimization is not specific to AppendPath / MergeAppendPath,
so it could be moved elsewhere (as a separate patch of course). But
specifically for AppendPath, why do we have to test pathkeys? The pathkeys
of the AppendPath do not necessarily describe the order of the set to which
LIMIT is applied, so their existence should not be important here.

* If pathkeys is passed, shouldn't create_append_path() include the
cost_sort() of subpaths just like create_merge_append_path() does? And if
so, then create_append_path() and create_merge_append_path() might
eventually have some common code (at least for the subpath processing) to be
put into a separate function.

* Likewise, create_append_plan() / create_merge_append_plan() are going to be
more similar then before, so some refactoring could also make sense.

Although it's not too much code, I admit the patch is not trivial, so I'm
curious about your opinion.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2018-10-31 15:13:13 Re: WIP Patch: Add a function that returns binary JSONB as a bytea
Previous Message Andres Freund 2018-10-31 14:51:06 Re: Typo in xlog.c comment?