Re: Top-N sorts in EXPLAIN, row count estimates, and parallelism

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Top-N sorts in EXPLAIN, row count estimates, and parallelism
Date: 2019-05-24 02:47:56
Message-ID: CAKJS1f_vmLFu-C1uq4+Pbp2VM3g4a4xdujPOtDt9QG6YJOzNYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 24 May 2019 at 10:44, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Thu, May 23, 2019 at 3:31 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Given the way that's implemented, I doubt that we can report it
> > reliably in EXPLAIN.
>
> Does it have to be totally reliable?
>
> cost_sort() costs sorts as top-N heapsorts. While we cannot make an
> iron-clad guarantee that it will work out that way from within
> tuplesort.c, that doesn't seem like it closes off the possibility of
> more informative EXPLAIN output. For example, can't we at report that
> the tuplesort will be "bounded" within EXPLAIN, indicating that we
> intend to attempt to sort using a top-N heap sort (i.e. we'll
> definitely do it that way if there is sufficient work_mem)?

I think this really needs more of a concrete proposal. Remember
LIMIT/OFFSET don't need to be constants, they could be a Param or some
return value from a subquery, so the bound might not be known until
after executor startup, to which EXPLAIN is not going to get to know
about that.

Perhaps something to be tagged onto the Sort path in grouping_planner
if preprocess_limit() managed to come up with a value. double does
not seem like the perfect choice for a bound to show in EXPLAIN and
int64 could wrap for very high LIMIT + OFFSET values. Showing an
approximate value in EXPLAIN seems like it might be a source of future
bug reports. Perhaps if we did it, we could just set it to -1
(unknown) if LIMIT + OFFSET happened to wrap an int64. Implementing
that would seem to require adding a new field for that in SortPath,
Sort and SortState, plus all the additional code for passing the value
over. We'd have to then hope nobody used the field for anything
important in the future.

After that, what would we do with it in EXPLAIN? Always show "Bound:
<n>", if it's not -1?

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2019-05-24 02:56:24 Re: Remove page-read callback from XLogReaderState.
Previous Message Sascha Kuhl 2019-05-24 02:31:20 Indexing - comparison of tree structures