Re: I: About "Our CLUSTER implementation is pessimal" patch

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
Cc: Leonardo Francalanci <m_lists(at)yahoo(dot)it>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: I: About "Our CLUSTER implementation is pessimal" patch
Date: 2010-10-04 12:42:11
Message-ID: D3F8007A-9FB5-451A-B209-348E138172EF@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Oct 1, 2010, at 8:36 PM, Josh Kupershmidt <schmiddy(at)gmail(dot)com> wrote:
> On Fri, Oct 1, 2010 at 4:33 AM, Leonardo Francalanci <m_lists(at)yahoo(dot)it> wrote:
>>> I ran a few more performance tests on this patch. Here's what I got
>>> for the tests Leonardo posted originally:
>>> * 2M rows: 22 seconds for seq. scan, 24 seconds for index scan
>>> * 5M rows: 139 seconds for seq. scan, 97 seconds for index scan
>>> * 10M rows: 256 seconds seq. scan, 611 seconds for index scan
>>
>> I don't have time right now to run more tests, I'll try to make some by
>> next week.
>>
>> Would it mean that doing:
>>
>> create table p as select * from atable order by akey
>>
>> (where akey is random distributed)
>> with 5M rows is faster with enable_seqscan=0 and
>> enable_indexscan=1??? That would be weird, especially on a
>> laptop hard drive! (assuming there's a reasonable amount of
>> memory set in work_mem/maintenance_work_mem)
>
> Hrm, this is interesting. I set up a test table with 5M rows like so:
>
> CREATE TABLE atable (
> akey int
> );
> INSERT INTO atable (akey)
> SELECT (RANDOM() * 100000)::int FROM generate_series(1,5000000);
> CREATE INDEX akey_idx ON atable(akey);
> ANALYZE atable;
>
> And then I tested table creation times. First, using a normal:
>
> BEGIN;
> SET enable_seqscan = on;
> SET enable_indexscan = on;
> EXPLAIN ANALYZE CREATE TABLE idxscanned AS SELECT * FROM atable
> ORDER BY akey;
> ROLLBACK;
>
> and I get:
> Index Scan using akey_idx on atable
> (cost=0.00..218347.89 rows=5000000 width=4)
> (actual time=0.058..23612.020 rows=5000000 loops=1)
> Total runtime: 33029.884 ms
>
> Then, I tried forcing a sequential scan by changing "SET
> enable_indexscan = off;", and it's significantly faster, as I would
> expect:
>
> Sort (cost=696823.42..709323.42 rows=5000000 width=4)
> (actual time=8664.699..13533.131 rows=5000000 loops=1)
> Sort Key: akey
> Sort Method: external merge Disk: 68304kB
> -> Seq Scan on atable (cost=0.00..72124.00 rows=5000000 width=4)
> (actual time=0.012..838.092 rows=5000000 loops=1)
> Total runtime: 21015.501 ms
>
> I've ran both of these several times, and get 30-32 seconds for the
> index scan and 20-21 seconds for the seq. scan each time.
>
> My seq_page_cost and random_page_cost were left at the defaults for
> these tests. Oddly, I tried turning seq_page_cost down to 0.01 and
> EXPLAIN ANALYZE told me that an index scan was still being chosen. Is
> there maybe some other setting I'm forgetting?

Did you also adjust random_page_cost?

...Robert

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2010-10-04 12:52:10 Re: streaming replication question
Previous Message Hans-Jürgen Schönig 2010-10-04 12:37:13 Re: ugly locking corner cases ...