Highly Efficient Custom Sorting

From: Eliot Gable <egable+pgsql-performance(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Highly Efficient Custom Sorting
Date: 2010-07-02 00:46:13
Message-ID: AANLkTikrA8M3cncIHkFqsJ78TzCnQdriE0GjvK8XjB9E@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I have a long stored procedure (over 3,000 lines). Originally, it would take
about 44ms to run the whole query. After lots and lots of tweaking, Postgres
now runs the entire thing and gathers my results in just 15.2ms, which is
very impressive given the hardware this is running on. Now, I used to return
the results unsorted to my C++ backend and then sort them there using my
custom sort order which provides prioritized, weighted random ordering with
4 different priority fields and 3 different weighting fields within 3 of
those 4 priority fields. Needless to say, the sorting is quite complex. I
wanted to cut down on the amount of data being delivered to my C++ backend,
so I am using the stored procedure to insert a summary of my results
directly into the database, which is far more efficient than dumping it all
to the C++ backend (including stuff that isn't really needed there) and then
dumping it all back to Postgres via INSERTS later in the execution path. The
problem is that I want the results sorted in this custom order before they
are stored in the database. (By sorted, I mean I want to include a field
that lists a numerical order for the set of results.) Thus, I used to dump
everything to the C++ program, perform the sorting, then INSERT back to
Postgres. This was obviously not all that efficient. Now, the sorting in C++
took <1ms to accomplish. When I re-wrote the sorting in pl/pgsql using a
couple of additional stored procedures, I discovered it is taking 15.2ms to
perform the sort of the records within Postgres. This almost cancels out all
of the prior optimizations I previously performed:

T:20100702001841+0903010 TID:0x43945940 INFO:NOTICE: Sorting group ...
<snip>
...
</snip>
T:20100702001841+0917683 TID:0x43945940 INFO:NOTICE: Sorting 1 rows in
priority 1... <-- Last sort item
T:20100702001841+0918292 TID:0x43945940 INFO:NOTICE:

918,292 - 903,010 = 15,282 us = 15.282 ms

So, the bottom line is, I need a faster way to do this sorting.

What options are at my disposal for improving the speed and efficiency of
this sorting? Which is the easiest to implement? What are the drawbacks of
each different method?

Thanks in advance for your insights.

--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our
children." ~David Brower

"I decided the words were too conservative for me. We're not borrowing from
our children, we're stealing from them--and it's not even considered to be a
crime." ~David Brower

"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not
live to eat.) ~Marcus Tullius Cicero

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Bruce Momjian 2010-07-02 03:00:00 Re: No hash join across partitioned tables?
Previous Message Michal Fapso 2010-07-01 22:34:53 big data - slow select (speech search)