Skip site navigation (1) Skip section navigation (2)

External Sort performance patch

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: External Sort performance patch
Date: 2006-01-26 00:33:27
Message-ID: 1138235607.3363.48.camel@localhost.localdomain (view raw or whole thread)
Lists: pgsql-patches
The enclosed patch substantially improves large sort performance, in the
general case

cvstip:	elapsed 5693 sec, CPU 196 sec
patch:	elapsed 4132 sec, CPU 90 sec

The patch implements dynamically increasing number of logical tapes when
sufficient memory is available to make that efficient. cvstip runs with
a static number of tapes (7) whereas the patch was able to allocate 104
tapes to the task. This has the effect of almost completely removing
intermediate merge runs and hence the increased performance. 

>From Jeffrey W. Baker's original idea
and followup comments

It is expected this will substantially improve performance for large

The guesstimated default setting of the OPTIMAL_MERGE_BUFFER_SIZE of
262144 means that the default setting of work_mem will still use only 7
tapes, though setting work_mem > 2MB will yield improvements.

Further testing and/or patch comments are requested.

All changes are isolated to src/backend/utils/sort/tuplesort.c
Patch applies cleanly and passes make check on cvstip (though this code
path is not tested in the regression tests anyway).

Test details:
Run the following sort on my laptop (512MB RAM)

postgres=# set work_mem=65536;
Time: 0.801 ms
postgres=# select * from d order by 1,2 limit 1;
 col1 |          col2
    1 | eeeeeeseeeeeeeeeeeeeeee
(1 row)

Time: 4133122.769 ms

postgres=# \d d
       Table "public.d"
 Column |  Type   | Modifiers
 col1   | integer |
 col2   | text    |

postgres=# select count(*) from d;
(1 row)

Time: 248283.128 ms
postgres=# select pg_relation_size('d');
(1 row)

Time: 98.629 ms

trace_sort was enabled for both runs and these are attached as files to
this mail. Test data was anti-sorted, but the ordering of data is not
relevant to the algorithm anyway, except when the data is already almost
perfectly sorted, in which case there is typically only one run anyway.

Best Regards, Simon Riggs

Attachment: cvstip.log
Description: text/x-log (9.6 KB)
Attachment: patch.log
Description: text/x-log (9.1 KB)
Attachment: dynamictapes.patch
Description: text/x-patch (29.7 KB)


pgsql-patches by date

Next:From: Tom LaneDate: 2006-01-26 02:37:06
Subject: Re: [HACKERS] CIDR/INET improvements
Previous:From: Tom LaneDate: 2006-01-25 20:46:08
Subject: Re: Libpq COPY optimization patch

Privacy Policy | About PostgreSQL
Copyright © 1996-2016 The PostgreSQL Global Development Group