Top-N sorts verses parallelism

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Top-N sorts verses parallelism
Date: 2017-12-13 06:46:19
Message-ID: CAEepm=1BWtC34vUroA0Uqjw02MaqdUrW+d6WD85_k8SLyPiKHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

The start-up cost of bounded (top-N) sorts is sensitive at the small
end of N, and the
comparison_cost * tuples * LOG2(2.0 * output_tuples) curve doesn't
seem to correspond to reality. Here's a contrived example that comes
from a benchmark:

create table foo as
select generate_series(1, 1000000)::int a, (random() * 1000)::int b;
analyze foo;
select * from foo order by b limit N;

Estimated start-up costs for different N:

limit 1 = 19425.00
limit 2 = 24425.00
limit 3 = 27349.81
limit 4 = 29425.00
limit 10 = 36034.64
limit 50 = 47644.28
limit 100 = 52644.28
limit 133 = 54701.41

But the actual time doesn't really change on this random development
system: it's around 300ms. I stopped at limit 133 because for this
particular query, at 134 a Gather Merge plan kicked in and I got
degree 3 parallel sorting and I got my answer just shy of three times
faster. The speed up definitely paid for the parallel_setup_cost and
only one tuple was sent from each worker to the leader because the
limit was pushed down.

I want to get my answer three times as fast when I say limit 1 too!
But I can't because we seem to think that limit 1 is dramatically
cheaper than limit 134, so parallelism isn't worth bothering with.

With wider tuples the same effect can be seen, though it gets much
more difficult to convince Gather Merge to be selected at all for
reasons I haven't looked into yet. Is Gather Merge for top-N sorts a
missed opportunity due to underestimation of top-N for small values of
N? Or is there some other way to think about this problem? Thoughts?

--
Thomas Munro
http://www.enterprisedb.com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ali Akbar 2017-12-13 06:59:56 Re: [HACKERS] pg_upgrade failed with error - ERROR: column "a" in child table must be marked NOT NULL
Previous Message Amit Kapila 2017-12-13 06:41:25 Re: [HACKERS] parallel.c oblivion of worker-startup failures