Re: What is the right way to deal with a table with rows that are not in a random order?

From: Douglas Alan <darkwater42(at)gmail(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject: Re: What is the right way to deal with a table with rows that are not in a random order?
Date: 2009-05-29 20:12:46
Message-ID: ce6334d00905291312t7e6f2c79kc9bd40ab65538580@mail.gmail.com
Lists: pgsql-general

Scott Marlowe wrote:

> Douglas Alan wrote:

>> Okay -- no problem:
>>
>> set enable_seqscan = on;
>> explain analyze select * from maindb_astobject
>> where survey_id = 2
>> limit 1000;
>>
>> "Limit (cost=0.00..48.03 rows=1000 width=78) (actual
>> time=84837.835..265938.258 rows=1000 loops=1)"
>> " -> Seq Scan on maindb_astobject (cost=0.00..3538556.10
>> rows=73675167 width=78) (actual time=84837.825..265932.121 rows=1000
>> loops=1)"
>> " Filter: (survey_id = 2)"
>> "Total runtime: 265942.416 ms"
>>
>> set enable_seqscan = off;
>> explain analyze select * from maindb_astobject
>> where survey_id = 2
>> limit 1000;
>>
>> "Limit (cost=0.00..67.37 rows=1000 width=78) (actual
>> time=172.248..225.219 rows=1000 loops=1)"
>> " -> Index Scan using maindb_astobject_survey_id on
>> maindb_astobject (cost=0.00..4963500.87 rows=73675167 width=78)
>> (actual time=172.240..221.078 rows=1000 loops=1)"
>> " Index Cond: (survey_id = 2)"
>> "Total runtime: 227.412 ms"

> What was the random_page_cost during these two queries?

4

> Assuming seq_page_cost is 1,

Yes, it is.

> and random_page_cost was 4 or something, lowering it should
> force the move to an index scan.

I just tried changing random_page_cost to 1, but the query still
does a seq scan.
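
Specifically, what I tried in the same session was roughly:

set random_page_cost = 1;
explain analyze select * from maindb_astobject
where survey_id = 2
limit 1000;

and the plan was the same seq scan as above.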

> If you were already at 1.0 or so, then yeah, the cost
> estimation is off. Since index scans cost CPU (at least I
> think they do), you might try lowering your cpu_* costs to see
> if that helps.

How would lowering random_page_cost and all the cpu costs differ
from just increasing seq_page_cost?

I have to raise seq_page_cost from 1 to 34 to force an index
scan. I can't imagine that changing this value so radically is a
good idea.
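
For the record, the setting that finally flips the plan is just:

set seq_page_cost = 34;

and at that value, I'd expect the costing of every other query on
the system to be badly skewed.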

Alternatively, if I set random_page_cost to 1, and
cpu_tuple_cost, cpu_index_tuple_cost, and cpu_operator_cost all
to 0, this still doesn't cause Postgres to do an index scan for
this query.
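
In case the exact incantation matters, this is essentially what
I ran:

set random_page_cost = 1;
set cpu_tuple_cost = 0;
set cpu_index_tuple_cost = 0;
set cpu_operator_cost = 0;
explain analyze select * from maindb_astobject
where survey_id = 2
limit 1000;

and the seq scan plan came back unchanged.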

>> P.S. Here are the stats on the column. It appears that my recollection
>> of 25% of the table matching was a bit off. It's actually 98.5%! That
>> might explain more why Postgres wants to do a sequential scan. The
>> problem is that this still means it has to scan a million rows
>> sequentially before it finds a single matching row, as the matching
>> rows are at the end of the table:

> Yeah, that's a really skewed distribution. Partitioning may work out,
> especially if you often select on that one field.
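
Partitioning might work, yes. Just to be sure I follow you, I
assume you mean inheritance-style partitioning, something along
these lines (the child-table name here is made up):

create table maindb_astobject_survey2 (
    check (survey_id = 2)
) inherits (maindb_astobject);

With constraint_exclusion turned on, the planner should then be
able to skip any sibling partitions whose CHECK constraints rule
them out.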

Is there a way for me to alter the statistics table? I tried
changing the values in pg_stats, but that table is just a view,
so Postgres won't let me do it. pg_statistic, on the other hand,
is rather opaque.
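
For instance, an update along these lines (with a made-up
n_distinct value) is rejected outright, since pg_stats is a view:

update pg_stats
set n_distinct = 100
where tablename = 'maindb_astobject'
and attname = 'survey_id';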

Alternatively, can I delete the statistics for the column? It's
the statistics that are hurting me here.
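
The only direct route I can see would be to delete the row from
pg_statistic itself. I'm guessing that would look something like
the following, though I haven't been brave enough to run it (it
needs superuser privileges, and I don't know whether it's safe):

delete from pg_statistic
where starelid = 'maindb_astobject'::regclass
and staattnum = (select attnum from pg_attribute
                 where attrelid = 'maindb_astobject'::regclass
                 and attname = 'survey_id');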

To delete the statistics, I tried setting statistics for the
column to 0 and analyzing the column, but that just left the
current statistics in place. So I tried setting statistics to 1,
but that's one value too many to eliminate this problem!
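
That is, roughly:

alter table maindb_astobject alter column survey_id set statistics 0;
analyze maindb_astobject (survey_id);
-- the old stats for the column are left as-is

alter table maindb_astobject alter column survey_id set statistics 1;
analyze maindb_astobject (survey_id);
-- now there's a single most-common value, which is still enough
-- to mislead the planner here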

|>ouglas
