Re: What is the right way to deal with a table with rows that are not in a random order?

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Douglas Alan <darkwater42(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: What is the right way to deal with a table with rows that are not in a random order?
Date: 2009-05-28 00:54:27
Message-ID: 1243472067.11796.2.camel@monkey-cat.sm.truviso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, 2009-05-27 at 19:53 -0400, Douglas Alan wrote:
> We have a very large table (150 million rows) where the rows are not
> in a random order. Some common queries will have millions of results,
> and this of course is slow. For an interactive interface to the
> database, we want to put a limit on all queries so that queries will
> return quickly even if there are millions of results.
>
> The problem we are seeing at the moment is that the statistics
> histograms are actually often hurting us. Postgres sees that for some
> queries there are so many results that it decides to do a sequential
> scan, rather than using the index. Unfortunately, sometimes all of
> these millions of results are at the end of the table, rather than
> being randomly distributed in the table, so a sequential scan is the
> worst possible approach.

If you're putting a LIMIT on it, why does it return millions of results?

> To fix this, we could use an "order by" clause to force Postgres to
> use the index, but this isn't so easy as we are using Postgres through
> an ORM (i.e, Django in this case), and it's not so easy to try to
> jury-rig things this way on a per-query basis.
>

ORDER BY won't always choose an index scan, for instance if the
correlation is low and the expected results are many.

Can you pick out an interesting query and give some specifics, like:
* the query
* the EXPLAIN ANALYZE output (or EXPLAIN without ANALYZE if it takes too
long to even run once)
* EXPLAIN ANALYZE output if you force the index scan
* the statistics for the relevant columns, such as histogram and
correlation

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Douglas Alan 2009-05-28 02:43:20 Re: What is the right way to deal with a table with rows that are not in a random order?
Previous Message Tom Lane 2009-05-28 00:23:27 Re: Bloated Table