Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Greg S <stark(at)mit(dot)edu>
Subject: Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)
Date: 2012-04-24 16:51:00
Message-ID: CAEYLb_UaXsPcBRq4N=h+Ddq03UX438nPxvOr=6CAhmwQuVJWZQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 24 April 2012 16:17, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> If they are in sorted order with an empty string
> appended onto the end, it takes about 25 seconds.

That's exactly what I'd have expected, but was surprised to have not
found with my own test. Perhaps it was same kind of fluke (i.e. a
re-creatable one - I'm quite confident that my benchmark was not
methodologically flawed, at least in execution).

> Based on that, I'm inclined to propose rejiggering things so that the
> presorted-input check runs only at the top level, and not during any
> recursive steps.  The theory is that it won't cost much because if the
> data is randomly ordered we won't do many comparisons before falling
> out, and that seems to be true.  But the only point of doing it in the
> recursive steps is to win when the data is partially ordered, and we
> actually seem to be losing big-time in that case, perhaps because when
> the data is partially ordered, the presort check will frequently to
> run through a significant part of the array - wasting cycles - but
> fall out before it actually gets to the end.

That makes sense to me, but obviously more data is needed here.

> Of course, even if we did that, it might not be as good as your
> timsort numbers, but that doesn't mean we shouldn't do it...

The frustrating thing about my timsort numbers, as I mentioned in
reply to Dimitri (He modified the subject a bit, so that might appear
to be a different thread to you), is that they appear to be almost
consistently winning when you consider the number of comparisons, but
in fact lose when you measure the duration of execution or TPS or
whatever. I expected a certain amount of this, but not enough to
entirely derail the case for replacing quicksort with timsort when
sorting a single key of text, which is obviously the really compelling
case for optimisation here. This situation is only going to be made
"worse" by the work you've done on SortSupport for text, which,
incidentally, I agree is worthwhile.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2012-04-24 17:06:06 Welcome 2012 GSOC students
Previous Message Greg Stark 2012-04-24 16:30:27 Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)