Re: Large tables, ORDER BY and sequence/index scans

From: Filip Rembiałkowski <plk(dot)zuber(at)gmail(dot)com>
To: Milan Zamazal <pdm(at)brailcom(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Large tables, ORDER BY and sequence/index scans
Date: 2010-01-05 12:02:17
Message-ID: 92869e661001050402w2dedeefcw4d90275a2dacb220@mail.gmail.com
Lists: pgsql-general

2010/1/5 Milan Zamazal <pdm(at)brailcom(dot)org>

> My problem is that retrieving sorted data from large tables is sometimes
> very slow in PostgreSQL (8.4.1, FWIW).
>
> I typically retrieve the data using cursors, to display it in the UI:
>
> BEGIN;
> DECLARE ... SELECT ... ORDER BY ...;
> FETCH ...;
> ...
>
> On a newly created table of about 10 million rows, the FETCH command
> takes about one minute by default, with an additional delay during the
> eventual COMMIT. This is because PostgreSQL uses a sequential scan on
> the table even when there is an index on the ORDER BY column. When I
> force PostgreSQL to perform an index scan (e.g. by setting
> enable_seqscan or enable_sort to off), the FETCH response is immediate.
>
> The PostgreSQL manual explains the motivation for sequential scans of
> large tables, and I can understand it. Nevertheless, this behavior
> leads to unacceptably poor performance in my particular case: it is
> important to get the first resulting rows quickly, to display them to
> the user without delay.
>
> My questions are:
>
> - What is your experience with using ORDER BY + indexes on large tables?
>

Without a WHERE clause (and no LIMIT visible to the planner), Postgres will
almost always choose a sequential scan plus a sort: reading all of a large
table in index order costs far more random I/O than scanning the heap
sequentially and sorting.
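
You can see this with EXPLAIN; the table and index names below are
illustrative, assuming an index on idxcol:

EXPLAIN SELECT * FROM bigtable ORDER BY idxcol;
-- typically: Sort on top of a Seq Scan on bigtable

EXPLAIN SELECT * FROM bigtable ORDER BY idxcol LIMIT 100;
-- typically: Limit on top of an Index Scan using the idxcol index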

> - Is there a way to convince PostgreSQL to use index scans automatically
> in cases where it is much more efficient? I tried using ANALYZE,
> VACUUM and SET STATISTICS, but without success.
>

By using cursors you take some responsibility away from the planner:
it has no idea that you want the first 100 rows quickly, so it
optimizes the retrieval of the whole result set instead.

> - Is it a good idea to set enable_seqscan or enable_sort to "off"
> globally in my case? Or to set them to "off" just before working with
> large tables? My databases contain short and long tables, often
> connected through REFERENCES or joined into views, and many of the shorter
> tables serve as codebooks. Can setting one of the parameters to off
> have clearly negative impacts?
>

IMHO, no, no and yes: those settings affect every plan in the session,
so turning them off will also pessimize queries against your short
codebook tables and the views that join them.

>
> - Is there a recommended way to keep indexes in good shape so that the
> performance of initial rows retrievals remains good? The large tables
> are typically append-only tables with a SERIAL primary key.
>
>
Use partitioning.
If that's not possible, REINDEX periodically to avoid a sub-optimal
B-tree layout. But that's just a half-solution.
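
For example (the index name here is hypothetical), keeping in mind that
plain REINDEX locks out writes to the table while it runs:

REINDEX INDEX bigtable_idxcol_idx;
-- or rebuild all indexes on the table at once:
REINDEX TABLE bigtable;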

> Thanks for any tips.
>

tips:

1. Get rid of cursors, unless you have a strong need for them (e.g.
seeking back and forth, or updating through the cursor).

2. Switch to "chunked" processing, like this:

SELECT * FROM bigtable ORDER BY idxcol LIMIT 1000;
(process the records)
SELECT * FROM bigtable WHERE idxcol > [last idxcol from previous fetch]
ORDER BY idxcol LIMIT 1000;
... and so on.
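
Note that this pattern assumes idxcol is unique (your SERIAL primary key
qualifies). For a non-unique sort column, add the primary key as a
tie-breaker; a row-wise comparison keeps it index-friendly, assuming a
two-column index on (idxcol, id), where "id" stands in for your key:

SELECT * FROM bigtable
WHERE (idxcol, id) > ([last idxcol], [last id])
ORDER BY idxcol, id LIMIT 1000;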

Regards,
Filip

--
Filip Rembiałkowski
JID,mailto:filip(dot)rembialkowski(at)gmail(dot)com
http://filip.rembialkowski.net/
