Quick Links

Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc:	PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Greg S <stark(at)mit(dot)edu>
Subject:	Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)
Date:	2012-04-24 21:01:29
Message-ID:	CA+Tgmoa8_u=HQjpWvfrBdqDRCidyZsF05RC24ogtvPxWHuhCWQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Apr 24, 2012 at 12:51 PM, Peter Geoghegan <peter(at)2ndquadrant(dot)com> wrote:
> On 24 April 2012 16:17, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> If they are in sorted order with an empty string
>> appended onto the end, it takes about 25 seconds.
>
> That's exactly what I'd have expected, but was surprised to have not
> found with my own test. Perhaps it was same kind of fluke (i.e. a
> re-creatable one - I'm quite confident that my benchmark was not
> methodologically flawed, at least in execution).

Oh, I read your results as showing something quite similar.

>> Based on that, I'm inclined to propose rejiggering things so that the
>> presorted-input check runs only at the top level, and not during any
>> recursive steps. The theory is that it won't cost much because if the
>> data is randomly ordered we won't do many comparisons before falling
>> out, and that seems to be true. But the only point of doing it in the
>> recursive steps is to win when the data is partially ordered, and we
>> actually seem to be losing big-time in that case, perhaps because when
>> the data is partially ordered, the presort check will frequently to
>> run through a significant part of the array - wasting cycles - but
>> fall out before it actually gets to the end.
>
> That makes sense to me, but obviously more data is needed here.

What more data do you think is needed? I've been suspicious of that
code since the first time I looked at it, and I'm now fairly well
convinced that it's full of suckitude. Honestly, I'm not sure I could
manage to contrive a case where that code wins if I set out to do so.

>> Of course, even if we did that, it might not be as good as your
>> timsort numbers, but that doesn't mean we shouldn't do it...
>
> The frustrating thing about my timsort numbers, as I mentioned in
> reply to Dimitri (He modified the subject a bit, so that might appear
> to be a different thread to you), is that they appear to be almost
> consistently winning when you consider the number of comparisons, but
> in fact lose when you measure the duration of execution or TPS or
> whatever. I expected a certain amount of this, but not enough to
> entirely derail the case for replacing quicksort with timsort when
> sorting a single key of text, which is obviously the really compelling
> case for optimisation here. This situation is only going to be made
> "worse" by the work you've done on SortSupport for text, ...

That is quite baffling. Have you profiled it at all?

> ...which,
> incidentally, I agree is worthwhile.

Cool, thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: Timsort performance, quicksort (was: Re: Memory usage during sorting) at 2012-04-24 16:51:00 from Peter Geoghegan

Responses

Re: Timsort performance, quicksort (was: Re: Memory usage during sorting) at 2012-04-24 23:16:01 from Peter Geoghegan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Geoghegan	2012-04-24 23:16:01	Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)
Previous Message	Robert Haas	2012-04-24 20:55:18	Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)