Quick Links

Re: I: About "Our CLUSTER implementation is pessimal" patch

From:	Josh Kupershmidt <schmiddy(at)gmail(dot)com>
To:	Leonardo Francalanci <m_lists(at)yahoo(dot)it>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: I: About "Our CLUSTER implementation is pessimal" patch
Date:	2010-10-02 00:36:36
Message-ID:	AANLkTin1gwabNHp7ViNNMoKM+Utr53WTJH7g=2HvYTWL@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Oct 1, 2010 at 4:33 AM, Leonardo Francalanci <m_lists(at)yahoo(dot)it> wrote:
>> I ran a few more performance tests on this patch. Here's what I got
>> for the tests Leonardo posted originally:
>> * 2M rows: 22 seconds for seq. scan, 24 seconds for index scan
>> * 5M rows: 139 seconds for seq. scan, 97 seconds for index scan
>> * 10M rows: 256 seconds seq. scan, 611 seconds for index scan
>
> I don't have time right now to run more tests, I'll try to make some by
> next week.
>
> Would it mean that doing:
>
> create table p as select * from atable order by akey
>
> (where akey is random distributed)
> with 5M rows is faster with enable_seqscan=0 and
> enable_indexscan=1??? That would be weird, especially on a
> laptop hard drive! (assuming there's a reasonable amount of
> memory set in work_mem/maintenance_work_mem)

Hrm, this is interesting. I set up a test table with 5M rows like so:

CREATE TABLE atable (
akey int
);
INSERT INTO atable (akey)
SELECT (RANDOM() * 100000)::int FROM generate_series(1,5000000);
CREATE INDEX akey_idx ON atable(akey);
ANALYZE atable;

And then I tested table creation times. First, using a normal:

BEGIN;
SET enable_seqscan = on;
SET enable_indexscan = on;
EXPLAIN ANALYZE CREATE TABLE idxscanned AS SELECT * FROM atable
ORDER BY akey;
ROLLBACK;

and I get:
Index Scan using akey_idx on atable
(cost=0.00..218347.89 rows=5000000 width=4)
(actual time=0.058..23612.020 rows=5000000 loops=1)
Total runtime: 33029.884 ms

Then, I tried forcing a sequential scan by changing "SET
enable_indexscan = off;", and it's significantly faster, as I would
expect:

Sort (cost=696823.42..709323.42 rows=5000000 width=4)
(actual time=8664.699..13533.131 rows=5000000 loops=1)
Sort Key: akey
Sort Method: external merge Disk: 68304kB
-> Seq Scan on atable (cost=0.00..72124.00 rows=5000000 width=4)
(actual time=0.012..838.092 rows=5000000 loops=1)
Total runtime: 21015.501 ms

I've ran both of these several times, and get 30-32 seconds for the
index scan and 20-21 seconds for the seq. scan each time.

My seq_page_cost and random_page_cost were left at the defaults for
these tests. Oddly, I tried turning seq_page_cost down to 0.01 and
EXPLAIN ANALYZE told me that an index scan was still being chosen. Is
there maybe some other setting I'm forgetting?

Josh

In response to

Re: I: About "Our CLUSTER implementation is pessimal" patch at 2010-10-01 08:33:18 from Leonardo Francalanci

Responses

Re: I: About "Our CLUSTER implementation is pessimal" patch at 2010-10-04 03:45:02 from Itagaki Takahiro
Re: I: About "Our CLUSTER implementation is pessimal" patch at 2010-10-04 12:42:11 from Robert Haas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	David E. Wheeler	2010-10-02 02:57:50	Re: array_agg() NULL Handling
Previous Message	Jaime Casanova	2010-10-01 22:52:50	Re: recovery.conf location