Re: Top-N sorts verses parallelism

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Top-N sorts verses parallelism
Date: 2017-12-14 21:05:33
Message-ID: CAMkU=1x8ENPmAdXGVCD+KkxO8s927vJYhCp7TxSkUkiYLgJYbg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 12, 2017 at 10:46 PM, Thomas Munro <
thomas(dot)munro(at)enterprisedb(dot)com> wrote:

> Hi hackers,
>
> The start-up cost of bounded (top-N) sorts is sensitive at the small
> end of N, and the
> comparison_cost * tuples * LOG2(2.0 * output_tuples) curve doesn't
> seem to correspond to reality. Here's a contrived example that comes
> from a benchmark:
>
> create table foo as
> select generate_series(1, 1000000)::int a, (random() * 1000)::int b;
> analyze foo;
> select * from foo order by b limit N;
>

This looks like a costing bug. The estimated cost of sorting 416,667
estimated tuples in one parallel worker is almost identical to the
estimated cost of sorting 1,000,000 tuples when parallelism is turned off.
Adding some logging to the cost_sort function, it looks like the limit does
not get sent down for the parallel estimate:

NOTICE: JJ cost_sort tuples 1000000.000000, limit 61.000000, sort_mem 65536
NOTICE: JJ cost_sort tuples 416667.000000, limit -1.000000, sort_mem 65536

So the LIMIT gets passed down for the execution step, but not for the
planning step.

(On my system, LIMIT 61 is the point at which it switches to parallel)

In any case, if we "fixed" the top-n estimate to use the random-case rather
than the worst-case, that would make the LIMIT 133 look more like the LIMIT
1, so would be the opposite direction of what you want.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2017-12-14 21:15:53 Re: [HACKERS] Surjective functional indexes
Previous Message Andres Freund 2017-12-14 20:38:03 Re: pgsql: Provide overflow safe integer math inline functions.