Re: I: About "Our CLUSTER implementation is pessimal" patch

From: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
To: Leonardo Francalanci <m_lists(at)yahoo(dot)it>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: I: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-10-02 00:36:36
Message-ID: AANLkTin1gwabNHp7ViNNMoKM+Utr53WTJH7g=2HvYTWL@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 1, 2010 at 4:33 AM, Leonardo Francalanci <m_lists(at)yahoo(dot)it> wrote:
>> I ran a few more performance tests on this patch. Here's what  I got
>> for the tests Leonardo posted originally:
>>    * 2M  rows:  22 seconds for seq. scan, 24 seconds for index scan
>>    * 5M  rows:  139 seconds for seq. scan, 97 seconds for index scan
>>    *  10M rows: 256 seconds seq. scan, 611 seconds for index scan
>
> I don't have time right now to run more tests, I'll try to make some by
> next week.
>
> Would it mean that doing:
>
> create table p as select * from atable order by akey
>
> (where akey is random distributed)
> with 5M rows is faster with enable_seqscan=0 and
> enable_indexscan=1??? That would be weird, especially on a
> laptop hard drive! (assuming there's a reasonable amount of
> memory set in work_mem/maintenance_work_mem)

Hrm, this is interesting. I set up a test table with 5M rows like so:

CREATE TABLE atable (
akey int
);
INSERT INTO atable (akey)
SELECT (RANDOM() * 100000)::int FROM generate_series(1,5000000);
CREATE INDEX akey_idx ON atable(akey);
ANALYZE atable;

And then I tested table creation times. First, using a normal:

BEGIN;
SET enable_seqscan = on;
SET enable_indexscan = on;
EXPLAIN ANALYZE CREATE TABLE idxscanned AS SELECT * FROM atable
ORDER BY akey;
ROLLBACK;

and I get:
Index Scan using akey_idx on atable
(cost=0.00..218347.89 rows=5000000 width=4)
(actual time=0.058..23612.020 rows=5000000 loops=1)
Total runtime: 33029.884 ms

Then, I tried forcing a sequential scan by changing "SET
enable_indexscan = off;", and it's significantly faster, as I would
expect:

Sort (cost=696823.42..709323.42 rows=5000000 width=4)
(actual time=8664.699..13533.131 rows=5000000 loops=1)
Sort Key: akey
Sort Method: external merge Disk: 68304kB
-> Seq Scan on atable (cost=0.00..72124.00 rows=5000000 width=4)
(actual time=0.012..838.092 rows=5000000 loops=1)
Total runtime: 21015.501 ms

I've ran both of these several times, and get 30-32 seconds for the
index scan and 20-21 seconds for the seq. scan each time.

My seq_page_cost and random_page_cost were left at the defaults for
these tests. Oddly, I tried turning seq_page_cost down to 0.01 and
EXPLAIN ANALYZE told me that an index scan was still being chosen. Is
there maybe some other setting I'm forgetting?

Josh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David E. Wheeler 2010-10-02 02:57:50 Re: array_agg() NULL Handling
Previous Message Jaime Casanova 2010-10-01 22:52:50 Re: recovery.conf location