Re: limit clause produces wrong query plan

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Andrus <kobruleht2(at)hot(dot)ee>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: limit clause produces wrong query plan
Date: 2008-11-24 23:10:08
Message-ID: BDFBB77C9E07BE4A984DAAE981D19F961ACA1F1765@EXVMBX018-1.exch018.msoutlookonline.net
Lists: pgsql-performance

I believe his earlier (original) complaint was that the query was slower with the LIMIT than without it. As in, fetching the whole result set was faster than applying the limit. Wherever that is the case, it is broken.

Certainly a complex join makes this more difficult, but one would agree that a query with a LIMIT should never be slower than the same query without one, right?

Any LIMIT combined with an ORDER BY at the final stage can't be completed without fetching and sorting every row. If there is an index on the sorted column in each table in the join, it is possible in principle to short-circuit the plan after enough result rows are found, but that would have to iterate on each arm of the join until there were enough matches, and Postgres does not have any such iterative query strategy that I'm aware of.
For a single large, indexed table with no joins it certainly makes sense to terminate the index scan early, as in the sketch below.
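A minimal sketch of that single-table case (the index name and the 10-row limit are illustrative assumptions; the table and column names are taken from the query quoted further down):

    CREATE INDEX bigtable_intprimarykey_idx ON bigtable (intprimarykey);

    -- With the index available, the planner can walk it in sorted order
    -- and stop after ten rows (a Limit over an Index Scan), instead of
    -- reading and sorting the whole table first.
    SELECT * FROM bigtable
    ORDER BY intprimarykey
    LIMIT 10;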

Often, one is better off with explicit subselects for the join branches, with an explicit LIMIT on each of those (oversized to compensate for the likelihood of a match). You have to be willing to get fewer rows than the LIMIT amount if there are not enough matches, which is usually bad for 'paging' style queries, but for 'report on top X' queries the premature truncation is probably OK and is enough to make the query fast. If you have very large data sets on the join arms, sorting only 10K of 300K rows, matching those against 10K rows in another arm, and finding 500 items can be a huge gain versus doing the entire thing. A sketch of that rewrite follows.
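Something like this (all table and column names here are invented for illustration; only the limits reflect the numbers above):

    SELECT o.order_date, c.name
    FROM (SELECT *                      -- oversized branch limit: sort only
          FROM orders                   -- 10K rows, not the whole table
          ORDER BY order_date DESC
          LIMIT 10000) AS o
    JOIN (SELECT *                      -- second arm, also truncated
          FROM customers
          ORDER BY signup_date DESC
          LIMIT 10000) AS c
      ON c.id = o.customer_id
    ORDER BY o.order_date DESC
    LIMIT 500;                          -- final, much smaller limit

If fewer than 500 of the 10,000 branch rows survive the join, the result is silently truncated, which is why this fits top-X reports better than paging.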

-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Scott Marlowe
Sent: Monday, November 24, 2008 11:23 AM
To: Andrus
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] limit clause produces wrong query plan

On Mon, Nov 24, 2008 at 10:26 AM, Andrus <kobruleht2(at)hot(dot)ee> wrote:
>> it was very fast. To be honest I do not know what is happening?!
>
> This is really weird.
> It seems that PostgreSql OFFSET / LIMIT are not optimized and thus typical
> paging queries

And how exactly should it be optimized? If a query is even moderately
interesting, with a few joins and a where clause, postgresql HAS to
create the rows that come before your offset in order to assure that
it's giving you the right rows.

> SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET pageno*100 LIMIT 100

This will get progressively slower as pageno goes up, because the server has to generate and then throw away the first pageno*100 rows before it can return anything.

> SELECT ... FROM bigtable ORDER BY intprimarykey OFFSET 0 LIMIT 100

That should be plenty fast.

> cannot be used in PostgreSql at all for big tables.

It can't be used efficiently in any real database either, once the query has any real complexity.

> Do you have any idea how to fix this ?

A standard workaround is to use an id field that is sequential, or nearly so,
and then use BETWEEN on that field.

select * from bigtable where idfield between x and x+99;  -- 100 ids per page
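A closely related keyset variant (a sketch; :last_seen_key is a placeholder
for the highest key returned on the previous page):

    SELECT *
    FROM bigtable
    WHERE intprimarykey > :last_seen_key
    ORDER BY intprimarykey
    LIMIT 100;

This stays fast at any page depth, because an index on intprimarykey lets the
scan start at the right key instead of counting past an OFFSET, and unlike
BETWEEN it tolerates gaps in the id sequence.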

--
Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
