Quick Links

Re: Ordered Partitioned Table Scans

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	Julien Rouhaud <rjuju123(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Ordered Partitioned Table Scans
Date:	2018-11-01 09:05:38
Message-ID:	4591.1541063138@localhost
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> wrote:

> On 1 November 2018 at 04:01, Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > * As for the logic, I found generate_mergeappend_paths() to be the most
> > interesting part:
> >
> > Imagine table partitioned by "i", so "partition_pathkeys" is {i}.
> >
> > partition 1:
> >
> > i | j
> > --+--
> > 0 | 0
> > 1 | 1
> > 0 | 1
> > 1 | 0
> >
> > partition 2:
> >
> > i | j
> > --+--
> > 3 | 0
> > 2 | 0
> > 2 | 1
> > 3 | 1
> >
> > Even if "pathkeys" is {i, j}, i.e. not contained in "partition_pathkeys", the
> > ordering of the subpaths should not change the way tuples are split into
> > partitions.
> >
> > ...
>
> I understand what you're saying. I just don't understand what you
> think is wrong with the patch in this area.

I think these conditions are too restrictive:

/*
* Determine if these pathkeys match the partition order, or reverse
* partition order. It can't match both, so only go to the trouble of
* checking the reverse order when it's not in ascending partition
* order.
*/
partition_order = pathkeys_contained_in(pathkeys,
partition_pathkeys);
partition_order_desc = !partition_order &&
pathkeys_contained_in(pathkeys,
partition_pathkeys_desc);

In the example above I wanted to show that your new feature should work even
if "pathkeys" is not contained in "partition_pathkeys".

> > Another problem I see is that build_partition_pathkeys() continues even if it
> > fails to create a pathkey for some partitioning column. In the example above
> > it would mean that the table can have "partition_pathkeys" equal to {j}
> > instead of {i, j} on some EC-related conditions. However such a key does not
> > correspond to reality - this is easier to imagine if another partition is
> > considered.
> >
> > partition 3:
> >
> > i | j | k
> > ---+---+---
> > 1 | 0 | 1
> > 1 | 0 | 0
> >
> > So I think no "partition_pathkeys" should be generated in that case. On the
> > other hand, if the function returned the part of the list it could construct
> > so far, it'd be wrong because such incomplete pathkeys could pass the checks I
> > proposed above for reasons unrelated to the partitioning scheme.
>
> Oops. That's a mistake. We should return what we have so far if we
> can't make one of the pathkeys. Will fix.

I think no "partition_pathkeys" should be created in this case, but before we
can discuss this in detail there needs to be an agreement on the evaluation of
the relationship between "pathkeys" and "partition_pathkeys", see above.

> > The following comments are mostly on coding:
> >
> > * Both qsort_partition_list_value_cmp() and qsort_partition_rbound_cmp() have
> > this sentence in the header comment:
> >
> > Note: If changing this, see build_partition_pathkeys()
> >
> > However I could not find other use besides that in
> > RelationBuildPartitionDesc().
>
> While the new code does not call those directly, the new code does
> depend on the sort order of the partitions inside the PartitionDesc,
> which those functions are responsible for. Perhaps there's a better
> way to communicate that.

I pointed this out because I suspect that changes of these functions would
affect more features, not only the one you're trying to implement.

> > * If pathkeys is passed, shouldn't create_append_path() include the
> > cost_sort() of subpaths just like create_merge_append_path() does? And if
> > so, then create_append_path() and create_merge_append_path() might
> > eventually have some common code (at least for the subpath processing) to be
> > put into a separate function.
>
> It does. It's just done via the call to cost_append().

ok, I missed that.

> > * Likewise, create_append_plan() / create_merge_append_plan() are going to be
> > more similar then before, so some refactoring could also make sense.
> >
> > Although it's not too much code, I admit the patch is not trivial, so I'm
> > curious about your opinion.
>
> I think the costing code is sufficiently different to warant not
> sharing more. For example, the startup costing is completely
> different. Append can start on the startup cost of the first subpath,
> but MergeAppend takes the sum of the startup cost of all subpaths.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com

In response to

Re: Ordered Partitioned Table Scans at 2018-11-01 03:14:11 from David Rowley

Responses

Re: Ordered Partitioned Table Scans at 2018-11-04 21:46:57 from David Rowley

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Hubert Zhang	2018-11-01 09:32:09	Re: Is there way to detect uncommitted 'new table' in pg_class?
Previous Message	Amit Langote	2018-11-01 08:56:55	Re: Hooks to Modify Execution Flow and Query Planner