Re: SeqScan costs

From: Decibel! <decibel(at)decibel(dot)org>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SeqScan costs
Date: 2008-08-13 16:49:10
Message-ID: EB312F7D-78CC-4034-9871-AC0A419238D5@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Aug 12, 2008, at 4:52 PM, Andrew Gierth wrote:
>>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
>>> Proposal: Make the first block of a seq scan cost
>>> random_page_cost, then after that every additional block costs
>>> seq_page_cost.
>
> ?Tom> This is only going to matter for a table of 1 block (or at least
> ?Tom> very few blocks), and for such a table it's highly likely that
> ?Tom> it's in RAM anyway.? So I'm unconvinced that the proposed change
> ?Tom> represents a better model of reality.
>
> Simple example which demonstrates a 10x speed improvement for index
> scan over seqscan for a 1-block table (on 8.3.3):
>
> create table oneblock (id integer primary key, value text not null);?
> insert into oneblock select i, 'row ' || i from generate_series
> (1,200) i;
>
> test=> select pg_relation_size('oneblock');
> ?pg_relation_size?
> ------------------
> ?? ? ? ? ? ? 8192
>
> analyze oneblock;
>
> set enable_seqscan=true;
>
> select (select value from oneblock where id = i)
> ? from generate_series(1,200) i, generate_series(1,5000) j;
> Time: 25596.709 ms? (that's 25.6 us per row)
>
> set enable_seqscan=false;
>
> select (select value from oneblock where id = i)
> ? from generate_series(1,200) i, generate_series(1,5000) j;
> Time: 2415.691 ms ? (that's 2.4 us per row)

Roughly what I get on my MBP (I'm about a factor of 2 slower). This
makes me think it's an issue of having to slog through an entire page
one row at a time vs just using the index. To test this I tested
selecting i=200 (remember we start filling data at the back of the
page, so 200 would actually be the front, and I'm assuming the first
value that would be hit) vs i=1. With seqscans, I saw about a 10%
difference. With index scans the difference was moot, but also note
that now index scans are in-between seqscans in performance.

decibel(at)platter(dot)local=# explain analyze select (select value from
oneblock where id = 200)
from generate_series(1,200) i, generate_series(1,500000) j;
QUERY
PLAN
------------------------------------------------------------------------
-----------------------------------------------------------------
Nested Loop (cost=17.00..20029.50 rows=1000000 width=0) (actual
time=270.867..65821.373 rows=100000000 loops=1)
InitPlan
-> Seq Scan on oneblock (cost=0.00..3.50 rows=1 width=7)
(actual time=0.052..0.053 rows=1 loops=1)
Filter: (id = 200)
-> Function Scan on generate_series i (cost=0.00..12.50
rows=1000 width=0) (actual time=0.062..0.351 rows=200 loops=1)
-> Materialize (cost=13.50..23.50 rows=1000 width=0) (actual
time=1.368..164.634 rows=500000 loops=200)
-> Function Scan on generate_series j (cost=0.00..12.50
rows=1000 width=0) (actual time=270.743..459.335 rows=500000 loops=1)
Total runtime: 79055.822 ms
(8 rows)

decibel(at)platter(dot)local=# explain analyze select (select value from
oneblock where id = 1)
from generate_series(1,200) i, generate_series(1,500000) j;
QUERY
PLAN
------------------------------------------------------------------------
-----------------------------------------------------------------
Nested Loop (cost=17.00..20029.50 rows=1000000 width=0) (actual
time=261.941..72937.226 rows=100000000 loops=1)
InitPlan
-> Seq Scan on oneblock (cost=0.00..3.50 rows=1 width=7)
(actual time=0.025..0.056 rows=1 loops=1)
Filter: (id = 1)
-> Function Scan on generate_series i (cost=0.00..12.50
rows=1000 width=0) (actual time=0.060..0.346 rows=200 loops=1)
-> Materialize (cost=13.50..23.50 rows=1000 width=0) (actual
time=1.375..182.474 rows=500000 loops=200)
-> Function Scan on generate_series j (cost=0.00..12.50
rows=1000 width=0) (actual time=261.815..448.652 rows=500000 loops=1)
Total runtime: 87702.315 ms
(8 rows)

decibel(at)platter(dot)local=# set enable_seqscan =off;
SET
decibel(at)platter(dot)local=# explain analyze select (select value from
oneblock where id = 200)
from generate_series(1,200) i, generate_series(1,500000) j;
QUERY
PLAN
------------------------------------------------------------------------
-----------------------------------------------------------------
Nested Loop (cost=21.77..20034.27 rows=1000000 width=0) (actual
time=262.219..69851.786 rows=100000000 loops=1)
InitPlan
-> Index Scan using oneblock_pkey on oneblock
(cost=0.00..8.27 rows=1 width=7) (actual time=0.024..0.026 rows=1
loops=1)
Index Cond: (id = 200)
-> Function Scan on generate_series i (cost=0.00..12.50
rows=1000 width=0) (actual time=0.062..0.355 rows=200 loops=1)
-> Materialize (cost=13.50..23.50 rows=1000 width=0) (actual
time=1.325..174.314 rows=500000 loops=200)
-> Function Scan on generate_series j (cost=0.00..12.50
rows=1000 width=0) (actual time=262.119..449.383 rows=500000 loops=1)
Total runtime: 83024.952 ms
(8 rows)

decibel(at)platter(dot)local=# explain analyze select (select value from
oneblock where id = 1)
from generate_series(1,200) i, generate_series(1,500000) j;
QUERY
PLAN
------------------------------------------------------------------------
-----------------------------------------------------------------
Nested Loop (cost=21.77..20034.27 rows=1000000 width=0) (actual
time=262.175..68943.985 rows=100000000 loops=1)
InitPlan
-> Index Scan using oneblock_pkey on oneblock
(cost=0.00..8.27 rows=1 width=7) (actual time=0.023..0.025 rows=1
loops=1)
Index Cond: (id = 1)
-> Function Scan on generate_series i (cost=0.00..12.50
rows=1000 width=0) (actual time=0.062..0.339 rows=200 loops=1)
-> Materialize (cost=13.50..23.50 rows=1000 width=0) (actual
time=1.325..176.056 rows=500000 loops=200)
-> Function Scan on generate_series j (cost=0.00..12.50
rows=1000 width=0) (actual time=262.079..454.692 rows=500000 loops=1)
Total runtime: 82598.556 ms
(8 rows)

decibel(at)platter(dot)local=#
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2008-08-13 16:55:19 Re: Transaction-controlled robustness for replication
Previous Message Simon Riggs 2008-08-13 16:46:18 Re: Transaction-controlled robustness for replication