Re: What is the right way to deal with a table with rows that are not in a random order?

From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Douglas Alan <darkwater42(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: What is the right way to deal with a table with rows that are not in a random order?
Date: 2009-05-28 19:53:53
Message-ID: dcc563d10905281253v15452738sdedca6422951ae66@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Thu, May 28, 2009 at 1:45 PM, Douglas Alan <darkwater42(at)gmail(dot)com> wrote:
> On Thu, May 28, 2009 at 10:41 AM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
> wrote:
>
>> Note that in the OPs case I'd probably try testing things like turning
>> off seqscan, or lowering random_page_cost.  I'd also look at
>> clustering on the index for the field you're selecting on.
>
> I'm worried that turning off seqscan would distort other queries.
>  (Remember, I don't have control of the individual queries.  The user of the
> application can specify all sorts of queries, and there's an ORM in the
> middle.)

You are aware you can turn off seq scans for just the current connection, right?

set enable_seqscan=off;

> In order to force Postgres to do an index scan for this query, I had to set
> random_page_cost to 0.  Even 0.1 was not small enough.  Alternatively, I
> could set seq_page_cost to 39.  (38 was not big enough.)  Again, I'm worried
> that by using such a big hammer, I would distort Postgres's query planning
> for other queries.

No one's saying to do it all the time. They're saying to do it and
then run explain analyze on your query, then post the results of both
let us have a look.

> P.S. Here is the actual data that some people have been clamoring for:

No, it's not. It's just explain output. Unless the actual query
running takes hours, it's much more useful to have explain analyze
output. With the analyze part, it's only showing what the planner
expects, not what actually happens.

> Slow query:
>
> psql> select * from maindb_astobject
>      where survey_id = 2
>      limit 20 offset 1000;

You'll notice that in your previous posts you never mentioned offset,
which greatly affects the plan chosen.

Is survey_id indexed?

> Time: 18073.691 ms
>
> Here's the explanation:
>
>                                     QUERY PLAN
>
> -------------------------------------------------------------------------------------
> Limit  (cost=47.99..48.95 rows=20 width=153)
>   ->  Seq Scan on maindb_astobject  (cost=0.00..3538556.10 rows=73736478
> width=153)
>         Filter: (survey_id = 2)
>
> The same query made fast by removing Postgres's ability to know a priori
> what particular value is being searched for:
>
> psql> select * from maindb_astobject join maindb_enumentity
>      on maindb_astobject.survey_id = maindb_enumentity.id
>      where entityname = 'MACHO'
>      limit 20 offet 1000;

That's not the same query.

> Time: 1.638 ms
>
> Here's the explanation for the above query showing that it is using the
> index:

So, can we see explain analyze?

> QUERY PLAN
>   --------------------------------------------------------------------------
>    Limit  (cost=164.97..168.27 rows=20 width=215)
>      ->  Nested Loop  (cost=0.00..1233523.72 rows=7477081 width=215)
>    ->  Seq Scan on maindb_enumentity  (cost=0.00..1.12 rows=1 width=62)
>  Filter: ((entityname)::text = 'MACHO'::text)
>    ->  Index Scan using maindb_astobject_survey_id on maindb_astobject
>  (cost=0.00..1046595.57 rows=14954162 width=153)
>  Index Cond: (maindb_astobject.survey_id = maindb_enumentity.id)
>
>
> P.P.S. Many other queries were very slow due to Postgres wanting to use hash
> joins and merge joins rather than nested index scans.

Then it's quite possible you have a problem with misestimation of
values in your db. Since we have no explain ANALYZE output, we cannot
judge if this is the case.

> The nested index
> scans were, in fact, much faster.  I just turned off hash joins and merge
> joins:  problem solved.

One problem solved, another one created is more likely.

> It might be nice at some point to figure out what
> is going with Postgres trying to use these fancier joins that turn out to be
> much slower for us, but that's a worry for another day, since I have a
> perfectly good work-around at the moment.

Yes, let us see explain analyze output and maybe we can help. You
know, the thing that was asked for at the beginning.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Daniel Verite 2009-05-28 19:56:31 Re: ubuntu 9.04 and auto-start
Previous Message Scott Marlowe 2009-05-28 19:47:04 Re: What is the right way to deal with a table with rows that are not in a random order?