cost_sort() may need to be updated

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: cost_sort() may need to be updated
Date: 2016-09-10 02:15:46
Message-ID: CAM3SWZQLP6e=1si1NcQjYft7R-VYpprrf_i59tZOZX5m7VFK-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

For external sorts, cost_sort() assumes the following:

* The disk traffic is assumed to be 3/4ths sequential and 1/4th random
* accesses (XXX can't we refine that guess?)

...
/* Assume 3/4ths of accesses are sequential, 1/4th are not */
startup_cost += npageaccesses *
(seq_page_cost * 0.75 + random_page_cost * 0.25);

I think that we *can* refine this guess, and should, because random
I/O is really quite unlikely to be a large cost these days (I/O in
general often isn't a large cost, actually). More fundamentally, I
think it's a problem that cost_sort() thinks that external sorts are
far more expensive than internal sorts in general. There is good
reason to think that that does not reflect the reality. I think we can
expect external sorts to be *faster* than internal sorts with
increasing regularity in Postgres 10.

In one sense, things got worse here in 9.6 when replacement selection
was all but killed, since runs became on average half their size,
which cost_sort() was taught about at the time. But, cost_sort()
didn't care about cases where only one pass is predicted (the great
majority of external sorts), so that didn't really matter. cost_sort()
doesn't seem to care about the size of runs to be merged at all, which
seems limiting. It also seems limiting that cost_sort() doesn't
differentiate between internal and external sort *startup* costs at
all. External sorts can often start returning tuples sooner, due to
the final on-the-fly merge mechanism.

It's pretty common to see cases where merging 4 runs takes only
slightly less time than an internal sort, and yet are thought to cost
more than twice as much. I realize that costs are always a bit fudged,
but I am fairly suspicious of how big the differences between external
and internal sorts regularly are. I suggest that this be revisited
soon.

--
Peter Geoghegan

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vitaly Burovoy 2016-09-10 03:45:35 Re: identity columns
Previous Message Thomas Munro 2016-09-10 01:29:00 Re: kqueue