Re: query with offset stops using index scan

From: "Stanislav Raskin" <sr(at)brainswell(dot)de>
To: "'Scott Marlowe'" <scott(dot)marlowe(at)gmail(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: query with offset stops using index scan
Date: 2008-08-31 16:04:52
Message-ID: E1KZpQG-0002AE-00@teena.zerebecki.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> If there's a chance to upgrade to 8.3 please do so.

I am aware of the benefits with 8.3, but such an upgrade would require quite
some changes in our application, including introduction of explicit casting
mechanisms. We are going to do so sooner or later, but right now we need to
focus on other stuff.

> What's happening here is that the query
> planner is switching plans because it thinks the sequential scan and
> sort are cheaper. and at some point it will likely be right.

Thank you very much for pointing out the issue.
I am still a bit puzzled, because there are only about 2000 data sets in
this table. It's not like we're handling millions of rows.

Clustering on the id did the trick. Now the planner always chooses to use
the index.

Unfortunately, I cannot use multi-column indexes here, because the
expressions in the WHERE statement can vary quite strongly, depending on
user input.

Cursors are not an option, because it is a web application, meaning that I
have to use a new connection for basically every HTTP request.

The "where id between x and x+y" is indeed much, much faster, but it
presumes the knowledge of x and y, which is not the case, because serial ids
are not necessarily continuous (i.e. if some data sets were deleted).

-----Ursprüngliche Nachricht-----
Von: Scott Marlowe [mailto:scott(dot)marlowe(at)gmail(dot)com]
Gesendet: Sonntag, 31. August 2008 17:26
An: Stanislav Raskin
Cc: pgsql-general(at)postgresql(dot)org
Betreff: Re: [GENERAL] query with offset stops using index scan

On Sun, Aug 31, 2008 at 7:14 AM, Stanislav Raskin <sr(at)brainswell(dot)de> wrote:
> Hello everybody,
>
> Now, if I increase OFFSET slowly, it works all the same way, until OFFSET
> reaches the value of 750. Then, the planner refuses to use an index scan
and
> does a plain seq scan+sort, which makes the query about 10-20 times
slower:
>
> I use 8.1.4, and I did a vacuum full analyze before running the queries.

If there's a chance to upgrade to 8.3 please do so. While 8.1 was a
solid reliable workhorse of a database, there's been a lot of work
done in general for better performance and more features. It likely
won't fix this one problem, but it's often smarter about corner cases
in query plans than 8.1 so it's worth looking into.

Now back to your problem. What's happening here is that the query
planner is switching plans because it thinks the sequential scan and
sort are cheaper. and at some point it will likely be right. That's
because a random page cost is much higher than a sequential page cost.
So at some point, say when you're grabbing 2% to 25% of a table, it
will switch to sequential scans.

Now, if the data is all cached, then it's still quicker to do the
index scan further along than to use a seq scan and a sort. Unless
your table is clustered to the index you're sorting on, a Seq scan
will almost always win if you need the whole table.

However, you may be in a position where a multi-column index and
clustering on id will allow you to run this offset higher. It's still
a poor performer for large chunks of large tables.

first cluster on the primary key id, then create a three column index
for (active, valid_until, locked) Note that the order should be from
the most choosey to least choosey column, generally. So assuming only
a tiny percentage of records meet valid_until, make it the first
column, and so forth. A query like:

select active, count(active) from table group by active;

will give you an idea there.

In the long run if you want good performance on larger data sets (i.e.
higher offset numbers) you'll likely need to switch to either cursors,
or using "where id between x and x+y" or lookup tables, or something
like that.

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Lew 2008-08-31 16:06:53 Re: ERROR: relation . . . does not exist
Previous Message Scott Marlowe 2008-08-31 15:25:40 Re: query with offset stops using index scan