We have a very large table (150 million rows) whose rows are not in a
random order. Some common queries will have millions of results, which of
course makes them slow. For an interactive interface to the database, we
want to put a LIMIT on all queries so that they return quickly even when
there are millions of matching rows.
The problem we are seeing at the moment is that the statistics histograms
are actually often hurting us. Postgres sees that for some queries there
are so many results that it decides to do a sequential scan, rather than
using the index. Unfortunately, sometimes all of these millions of results
are at the end of the table, rather than being randomly distributed in the
table, so a sequential scan is the worst possible approach.
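For illustration only (the table and column names here are placeholders, not our actual schema), this is the kind of plan choice we are seeing, which can be inspected with EXPLAIN:

```sql
-- Hypothetical example: the planner estimates that many rows match,
-- so it picks a sequential scan even though a LIMIT is present.
-- If the matching rows happen to sit at the physical end of the table,
-- the scan reads almost the whole table before returning anything.
EXPLAIN
SELECT *
FROM big_table          -- placeholder name
WHERE category = 42     -- indexed column, placeholder name
LIMIT 100;
```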
To fix this, we could use an "order by" clause to force Postgres to use the
index, but this isn't easy: we are using Postgres through an ORM (Django,
in this case), and jury-rigging the generated SQL this way on a
per-query basis is awkward.
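As a sketch of the workaround (again with placeholder names), the query we would want the ORM to emit adds an ORDER BY on an indexed column, so the planner can walk the index in order and stop after the LIMIT is satisfied:

```sql
-- With ORDER BY on an indexed column plus a LIMIT, the planner can
-- read the index in order and stop after 100 rows, instead of
-- sequentially scanning the whole table.
SELECT *
FROM big_table              -- placeholder name
WHERE category = 42
ORDER BY created_at         -- placeholder for some indexed column
LIMIT 100;
```

In Django this would roughly correspond to appending .order_by('created_at') before slicing the queryset, but doing that selectively per query is the part that is hard to retrofit.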
Alternatively, we could randomize the physical ordering of the table rows,
but that seems like an awfully big hammer, and it would also prevent us from
clustering the table on a different index, one distinct from the index in
question but possibly correlated with it.
Is there a way of telling Postgres not to assume that the table rows are in
a random order, so that it won't make the mistake of doing a sequential
scan on a huge table when there is a small LIMIT?
pgsql-general by date