Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Greg S <stark(at)mit(dot)edu>
Subject: Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)
Date: 2012-04-24 21:01:29
Message-ID: CA+Tgmoa8_u=HQjpWvfrBdqDRCidyZsF05RC24ogtvPxWHuhCWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 24, 2012 at 12:51 PM, Peter Geoghegan <peter(at)2ndquadrant(dot)com> wrote:
> On 24 April 2012 16:17, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> If they are in sorted order with an empty string
>> appended onto the end, it takes about 25 seconds.
>
> That's exactly what I'd have expected, but was surprised to have not
> found with my own test. Perhaps it was same kind of fluke (i.e. a
> re-creatable one - I'm quite confident that my benchmark was not
> methodologically flawed, at least in execution).

Oh, I read your results as showing something quite similar.

>> Based on that, I'm inclined to propose rejiggering things so that the
>> presorted-input check runs only at the top level, and not during any
>> recursive steps.  The theory is that it won't cost much because if the
>> data is randomly ordered we won't do many comparisons before falling
>> out, and that seems to be true.  But the only point of doing it in the
>> recursive steps is to win when the data is partially ordered, and we
>> actually seem to be losing big-time in that case, perhaps because when
>> the data is partially ordered, the presort check will frequently to
>> run through a significant part of the array - wasting cycles - but
>> fall out before it actually gets to the end.
>
> That makes sense to me, but obviously more data is needed here.

What more data do you think is needed? I've been suspicious of that
code since the first time I looked at it, and I'm now fairly well
convinced that it's full of suckitude. Honestly, I'm not sure I could
manage to contrive a case where that code wins if I set out to do so.

>> Of course, even if we did that, it might not be as good as your
>> timsort numbers, but that doesn't mean we shouldn't do it...
>
> The frustrating thing about my timsort numbers, as I mentioned in
> reply to Dimitri (He modified the subject a bit, so that might appear
> to be a different thread to you), is that they appear to be almost
> consistently winning when you consider the number of comparisons, but
> in fact lose when you measure the duration of execution or TPS or
> whatever. I expected a certain amount of this, but not enough to
> entirely derail the case for replacing quicksort with timsort when
> sorting a single key of text, which is obviously the really compelling
> case for optimisation here. This situation is only going to be made
> "worse" by the work you've done on SortSupport for text, ...

That is quite baffling. Have you profiled it at all?

> ...which,
> incidentally, I agree is worthwhile.

Cool, thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2012-04-24 23:16:01 Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)
Previous Message Robert Haas 2012-04-24 20:55:18 Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)