Re: [SQL] OFFSET impact on Performance???

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [SQL] OFFSET impact on Performance???
Date: 2005-01-27 05:10:24
Message-ID: m38y6fqwxr.fsf@knuth.knuth.cbbrowne.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

In an attempt to throw the authorities off his trail, merlin(dot)moncure(at)rcsonline(dot)com ("Merlin Moncure") transmitted:
> Alex wrote:
>> How do you create a temporary view that has only a small subset of the
>> data from the DB init? (Links to docs are fine - I can read ;). My
>> query isn't all that complex, and my number of records might be from
>> 10 to 2k depending on how I implement it.
>
> Well, you can't. My point was that the traditional query/view
> approach is often more appropriate for these cases.

Actually, you can if you assume you can "temporarily materialize" that
view.

You take the initial query and materialize it into a temporary table
which can then be used to browse "detail."

Thus, suppose you've got a case where the selection criteria draw in
8000 objects/transactions, of which you only want to fit 20/page.

It's ugly and slow to process the 15th page, and you essentially
reprocess the whole set from scratch each time:

select [details] from [big table] where [criteria]
order by [something]
offset 280 limit 20;

Instead, you might start out by doing:

select [key fields] into temp table my_query
from [big table] where [criteria];

create index my_query_idx on my_query(interesting fields);

With 8000 records, the number of pages in the table will correspond
roughly to the number of bytes per record which is probably pretty
small.

Then, you use a join on my_query to pull the bits you want:

select [big table.details] from [big table],
[select * from my_query order by [something] offset 280 limit 20]
where [join criteria between my_query and big table]
order by [something];

For this to be fast is predicated on my_query being compact, but that
should surely be so.

The big table may be 20 million records; for the result set to be even
vaguely browsable means that my_query ought to be relatively small so
you can pull subsets reasonably efficiently.

This actually has a merit over looking at a dynamic, possibly-changing
big table that you won't unexpectedly see the result set changing
size.

This strikes me as a pretty slick way to handle "data warehouse-style"
browsing...
--
output = ("cbbrowne" "@" "gmail.com")
http://www.ntlug.org/~cbbrowne/oses.html
The first cup of coffee recapitulates phylogeny.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Dustin Sallings 2005-01-27 08:02:29 Re: SQL Performance Guidelines
Previous Message Alex Turner 2005-01-27 04:42:21 Re: [SQL] OFFSET impact on Performance???