WIP: further sorting speedup

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-patches(at)postgreSQL(dot)org
Subject: WIP: further sorting speedup
Date: 2006-02-20 02:40:46
Message-ID: 15464.1140403246@sss.pgh.pa.us
Lists: pgsql-patches

After applying Simon's recent sort patch, I was doing some profiling and
noticed that sorting spends an unreasonably large fraction of its time
extracting datums from tuples (heap_getattr or index_getattr).  The
attached patch does something about this by pulling out the leading sort
column of a tuple when it is received by the sort code or re-read from a
"tape".  This increases the space needed by 8 or 12 bytes (depending on
sizeof(Datum)) per in-memory tuple, but doesn't cost anything as far as
the on-disk representation goes.  The effort needed to extract the datum
at this point is well repaid because the tuple will normally undergo
multiple comparisons while it remains in memory.  In some quick tests
the patch seemed to make for a significant speedup, on the order of 30%,
despite increasing the number of runs emitted because of the smaller
available memory.
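
To illustrate the idea in the paragraph above, here is a minimal sketch in C, not taken from the attached patch. The names Tuple, SortEntry, get_attr, make_entry and sort_with_cached_keys are hypothetical stand-ins, and Datum is simplified to a long. get_attr plays the role of heap_getattr/index_getattr and counts its calls, so one can see that each tuple is extracted once on entry rather than once per comparison.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's real types. */
typedef long Datum;

typedef struct Tuple {
    int   ncols;
    Datum cols[4];
} Tuple;

/* Counts calls to get_attr() so the saving can be observed. */
static long extract_calls = 0;

/* Stand-in for heap_getattr(): assumed to be the expensive step. */
static Datum get_attr(const Tuple *tup, int col)
{
    extract_calls++;
    return tup->cols[col];
}

/* In-memory sort entry: the tuple plus its pre-extracted leading key. */
typedef struct SortEntry {
    const Tuple *tuple;
    Datum        key1;   /* leading sort column, extracted once */
} SortEntry;

/* Called when a tuple is received by the sort (or re-read from a tape). */
static void make_entry(SortEntry *e, const Tuple *tup)
{
    e->tuple = tup;
    e->key1 = get_attr(tup, 0);
}

/* Comparator that touches only the cached key, never the tuple itself. */
static int cmp_entries(const void *a, const void *b)
{
    const SortEntry *x = a;
    const SortEntry *y = b;

    if (x->key1 < y->key1)
        return -1;
    if (x->key1 > y->key1)
        return 1;
    return 0;
}

/* Sorts n tuples by leading column; returns the number of get_attr() calls. */
static long sort_with_cached_keys(const Tuple *tuples, SortEntry *entries,
                                  size_t n)
{
    extract_calls = 0;
    for (size_t i = 0; i < n; i++)
        make_entry(&entries[i], &tuples[i]);
    qsort(entries, n, sizeof(SortEntry), cmp_entries);
    return extract_calls;
}
```

In this sketch the extraction count is exactly n regardless of how many comparisons qsort performs; the price is the extra key1 field carried by every in-memory entry.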

The choice to pull out just the leading column, rather than all columns,
is driven by concerns of (a) code complexity and (b) memory space.
Having the extra columns pre-extracted wouldn't buy anything anyway
in the common case where the leading key determines the result of
a comparison.
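
A rough sketch of what such a comparison might look like, again with hypothetical names (Tuple, SortEntry, compare_entries) and a simplified long Datum rather than the real sort-operator and NULL handling: the cached leading key is compared first, and the remaining columns are fetched from the tuple only when the leading keys tie.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's real types. */
typedef long Datum;

typedef struct Tuple {
    int   ncols;
    Datum cols[4];
} Tuple;

typedef struct SortEntry {
    const Tuple *tuple;
    Datum        key1;   /* pre-extracted leading sort column */
} SortEntry;

/* Counts how many datums had to be fetched from tuples to break ties. */
static long tiebreak_extractions = 0;

static int compare_entries(const SortEntry *a, const SortEntry *b)
{
    /* Common case: the leading key decides, and no tuple access is needed. */
    if (a->key1 != b->key1)
        return (a->key1 < b->key1) ? -1 : 1;

    /* Tie on the leading key: fall back to extracting the later columns. */
    for (int col = 1; col < a->tuple->ncols && col < b->tuple->ncols; col++)
    {
        Datum x = a->tuple->cols[col];
        Datum y = b->tuple->cols[col];

        tiebreak_extractions += 2;
        if (x != y)
            return (x < y) ? -1 : 1;
    }
    return 0;
}
```

With many duplicates in the first column, the fallback loop runs for most comparisons, and the cached key saves correspondingly less.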

This is still WIP because it leaks memory intra-query (I need to fix it
to clean up palloc'd space better).  I thought I'd post it now in case
anyone wants to try some measurements for their own favorite test cases.
In particular it would be interesting to see what happens for a
multi-column sort with lots of duplicated keys in the first column,
which is the case where the least advantage would be gained.

Comments?

			regards, tom lane


Attachment: sort-leading-keys.patch
Description: application/octet-stream (61.0 KB)
