Re: LIMIT/SORT optimization

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "Gregory Stark" <gsstark(at)mit(dot)edu>, "pgsql-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: LIMIT/SORT optimization
Date: 2007-04-07 22:23:06
Message-ID: 1175984587.3623.594.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

On Sat, 2007-04-07 at 14:11 -0400, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > I did some performance testing of the patch, and the results were good.
> > I did this:
>
> > test=> CREATE TABLE test (x INTEGER);
> > test=> INSERT INTO test SELECT * FROM generate_series(1, 1000000);
> > test=> SET log_min_duration_statement = 0;
> > test=> SELECT * FROM test ORDER BY x LIMIT 3;
>
> LIMIT 3 seems an awfully favorable case; if the patch can only manage a
> factor of 4 speedup there, what happens at limit 10, 20, 100? Also,
> you've tested only one sort size and only one (unspecified) value of
> work_mem, and the usefulness of the patch would surely vary depending on
> that. In particular, what happens with a LIMIT large enough to overflow
> work_mem?

Yeh, this is really designed to improve the case where we retrieve a
"screenfull" of data. i.e. 25, 50 or 100 records. Or worst case 10
screenfulls.

The code deliberately doesn't use an insertion sort for that reason,
since that is beyond the cut-off where that works best. So it should be
optimised for medium numbers of rows when no index is present.

The use case is important because we want to be able to populate data
for screens in a reasonably bounded time, not one that gets suddenly
worse should the number of possible matches exceed work_mem. [Think how
well Google reacts to varying numbers of candidate matches] Whatever
happens with LIMIT > work_mem doesn't fit the use case, so as long as it
is no slower than what we have now, that should be fine.

> Lastly, I suspect that sorting presorted input might be particularly
> favorable for this patch. Please try it with random data for comparison.

Agreed.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

In response to

Browse pgsql-patches by date

  From Date Subject
Next Message Bruce Momjian 2007-04-08 00:24:26 Re: simply custom variables protection
Previous Message Bruce Momjian 2007-04-07 22:19:07 Re: Heap page diagnostic/test functions (v2)