Re: Using quicksort for every external sort run

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Greg S <stark(at)mit(dot)edu>
Subject: Re: Using quicksort for every external sort run
Date: 2015-12-23 21:16:09
Message-ID: CA+TgmoYPKjZcOPdwyjTik0iCBDO5Bd7rNKn3tNzYxVfEgyKvGQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 23, 2015 at 3:31 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Wed, Dec 23, 2015 at 9:37 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> The point is, nobody can tell WHAT effects this is modeling.
>> Increasing the tuple size makes the crossover go up. Or down.
>
> There are multiple, competing considerations.

Please explain what they are and how they lead you to believe that the
cost factors you have chosen are good ones.

My point here is: even if I were to concede that your cost model
yields perfect answers in every case, the patch needs to give at least
some hint as to why. Right now, it really doesn't.

>>> Another factor is that the heap could be useful for other stuff in the
>>> future. As Simon Riggs pointed out, for deduplicating values as
>>> they're read in by tuplesort. (Okay, that's really the only other
>>> thing, but it's a good one).
>>
>> Not sure how that would work?
>
> Tuplesort would have license to discard tuples with matching existing
> values, because the caller gave it permission to. This is something
> that you can easily imagine occurring with ordered set aggregates, for
> example. It would work in a way not unlike how a top-N heapsort works
> today. This would work well when it can substantially lower the use of
> memory (the initial heapification when the threshold is crossed would
> probably measure the number of duplicates, and proceed only when that
> looked like a promising strategy).

It's not clear to me how having a heap helps with that.
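
For comparison, what makes the top-N case work is that a tuple can be
rejected after a single comparison against the heap root. Here is a
simplified sketch of that discard mechanism (illustrative only, made-up
names, not the actual tuplesort.c code):

/*
 * Simplified sketch of the top-N heapsort discard that the quoted text
 * analogizes to.  This is NOT tuplesort.c; the names and structure here
 * are made up for illustration.  We keep the N smallest ints in a
 * max-heap, so anything >= the heap root is discarded after one
 * comparison.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct BoundedHeap
{
    int    *vals;       /* max-heap: vals[0] is the largest value kept */
    int     count;      /* number of values currently held */
    int     bound;      /* N: the most values we will ever keep */
} BoundedHeap;

static void
swap_ints(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}

static void
sift_up(BoundedHeap *h, int i)
{
    while (i > 0 && h->vals[(i - 1) / 2] < h->vals[i])
    {
        swap_ints(&h->vals[(i - 1) / 2], &h->vals[i]);
        i = (i - 1) / 2;
    }
}

static void
sift_down(BoundedHeap *h, int i)
{
    for (;;)
    {
        int left = 2 * i + 1;
        int right = 2 * i + 2;
        int largest = i;

        if (left < h->count && h->vals[left] > h->vals[largest])
            largest = left;
        if (right < h->count && h->vals[right] > h->vals[largest])
            largest = right;
        if (largest == i)
            break;
        swap_ints(&h->vals[i], &h->vals[largest]);
        i = largest;
    }
}

/*
 * Offer one value.  Once the heap is full, a value >= the root cannot be
 * among the N smallest, so it is thrown away after a single comparison;
 * that cheap rejection is the whole point of the bounded heap.
 */
static void
bounded_heap_offer(BoundedHeap *h, int v)
{
    if (h->count < h->bound)
    {
        h->vals[h->count] = v;
        sift_up(h, h->count);
        h->count++;
    }
    else if (v < h->vals[0])
    {
        h->vals[0] = v;         /* replace the current maximum */
        sift_down(h, 0);
    }
    /* else: discard v */
}

int
main(void)
{
    int     input[] = {42, 7, 19, 7, 3, 88, 15, 3, 64, 1};
    BoundedHeap h = {malloc(5 * sizeof(int)), 0, 5};

    for (int i = 0; i < 10; i++)
        bounded_heap_offer(&h, input[i]);

    /* The 5 smallest inputs remain, in heap (not sorted) order. */
    for (int i = 0; i < h.count; i++)
        printf("%d ", h.vals[i]);
    printf("\n");

    free(h.vals);
    return 0;
}

Rejecting a value that can't be in the top N takes one comparison
against the root. Discarding a tuple because it duplicates some value
already in the heap doesn't obviously reduce to a root comparison like
that, which is why I don't see what the heap itself buys here.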

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
