Re: Ordered Partitioned Table Scans

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Ordered Partitioned Table Scans
Date: 2019-03-22 15:12:07
Message-ID: 1778.1553267527@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> On Sat, 9 Mar 2019 at 10:52, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> This can be a huge win for queries of the form "ORDER BY partkey LIMIT
>>> x". Even if the first subpath(s) aren't natively ordered, not all of
>>> the sorts should actually be performed.

>> [ shrug... ] We've got no realistic chance of estimating such situations
>> properly, so I'd have no confidence in a plan choice based on such a
>> thing. Nor do I believe that this case is all that important.

> Wondering if you can provide more details on why you think there's no
> realistic chance of the planner costing these cases correctly?

The reason why I'm skeptical about LIMIT with a plan of the form
append-some-sorts-together is that there are going to be large
discontinuities in the cost-vs-number-of-rows-returned graph,
ie, every time you finish one child plan and start the next one,
there'll be a hiccup while the sort happens. This is something
that our plan cost approximation (startup cost followed by linear
output up to total cost) can't even represent. Sticking a
LIMIT on top of that is certainly not going to lead to any useful
estimate of the actual cost, meaning that if you get a correct
plan choice it'll just be by luck, and if you don't there'll be
nothing to be done about it.

If we don't incorporate that, then the situation that the planner
will have to model is a MergeAppend with possibly some sorts in
front of it, and it will correctly cost that as if all the sorts
happen before any output occurs, and so you can hope that reasonable
plan choices will ensue.

I believe that the cases where a LIMIT query actually ought to go
for a fast-start plan are where *all* the child rels have fast-start
(non-sort) paths --- which is exactly the cases we'd get if we only
allow "sorted" Appends when none of the inputs require a sort.
Imagining that we'll get good results in cases where some of them
need to be sorted is just wishful thinking.

> It would be unfortunate to reject this patch based on a sentence that
> starts with "[ shrug... ]", especially so when three people have stood
> up and disagreed with you.

I don't want to reject the patch altogether, I just want it to not
include a special hack to allow Append rather than MergeAppend in such
cases. I am aware that I'm probably going to be shouted down on this
point, but that will not change my opinion that the shouters are wrong.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2019-03-22 15:13:09 Re: Removing unneeded self joins
Previous Message Alexander Kuzmenkov 2019-03-22 14:39:40 Re: Removing unneeded self joins