Re: What is the right way to deal with a table with rows that are not in a random order?

From: Douglas Alan <darkwater42(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: What is the right way to deal with a table with rows that are not in a random order?
Date: 2009-05-28 19:45:22
Message-ID: ce6334d00905281245l288c203p7d5a9d3b91718596@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, May 28, 2009 at 10:41 AM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
wrote:

> Note that in the OPs case I'd probably try testing things like turning
> off seqscan, or lowering random_page_cost. I'd also look at
> clustering on the index for the field you're selecting on.

I'm worried that turning off seqscan would distort other queries.
(Remember, I don't have control of the individual queries. The user of the
application can specify all sorts of queries, and there's an ORM in the
middle.)

In order to force Postgres to do an index scan for this query, I had to set
random_page_cost to 0. Even 0.1 was not small enough. Alternatively, I
could set seq_page_cost to 39. (38 was not big enough.) Again, I'm worried
that by using such a big hammer, I would distort Postgres's query planning
for other queries.

|>ouglas

P.S. Here is the actual data that some people have been clamoring for:

Slow query:

psql> select * from maindb_astobject
where survey_id = 2
limit 20 offset 1000;

Time: 18073.691 ms

Here's the explanation:

QUERY PLAN

-------------------------------------------------------------------------------------
Limit (cost=47.99..48.95 rows=20 width=153)
-> Seq Scan on maindb_astobject (cost=0.00..3538556.10 rows=73736478
width=153)
Filter: (survey_id = 2)

The same query made fast by removing Postgres's ability to know a priori
what particular value is being searched for:

psql> select * from maindb_astobject join maindb_enumentity
on maindb_astobject.survey_id = maindb_enumentity.id
where entityname = 'MACHO'
limit 20 offet 1000;

Time: 1.638 ms

Here's the explanation for the above query showing that it is using the
index:

QUERY PLAN
--------------------------------------------------------------------------
Limit (cost=164.97..168.27 rows=20 width=215)
-> Nested Loop (cost=0.00..1233523.72 rows=7477081 width=215)
-> Seq Scan on maindb_enumentity (cost=0.00..1.12 rows=1 width=62)
Filter: ((entityname)::text = 'MACHO'::text)
-> Index Scan using maindb_astobject_survey_id on maindb_astobject
(cost=0.00..1046595.57 rows=14954162 width=153)
Index Cond: (maindb_astobject.survey_id = maindb_enumentity.id)

P.P.S. Many other queries were very slow due to Postgres wanting to use hash
joins and merge joins rather than nested index scans. The nested index
scans were, in fact, much faster. I just turned off hash joins and merge
joins: problem solved. It might be nice at some point to figure out what
is going with Postgres trying to use these fancier joins that turn out to be
much slower for us, but that's a worry for another day, since I have a
perfectly good work-around at the moment.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Scott Marlowe 2009-05-28 19:47:04 Re: What is the right way to deal with a table with rows that are not in a random order?
Previous Message Osvaldo Kussama 2009-05-28 19:42:42 Re: How to pass parameters into a sql script ?