Re: Improve Seq scan performance

From: "Vladimir Sitnikov" <sitnikov(dot)vladimir(at)gmail(dot)com>
To: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>
Cc: Lutischán Ferenc <lutischanf(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Improve Seq scan performance
Date: 2008-11-10 07:37:00
Message-ID: 1d709ecc0811092337q74fee9e7vc418eba9c798f3a4@mail.gmail.com
Lists: pgsql-performance

> Lutischán Ferenc wrote:
>
>> It is possible to make an index on the table, and make a seq index scan on
>> this values?
>
> My understanding is that this isn't possible in PostgreSQL, because indexes
> do not contain information about tuple visibility. Data read from the index
> might refer to tuples that've been deleted as far as your transaction is
> concerned, or to tuples that were created after your snapshot was taken.
My understanding is that even though indices do not contain information on
tuple visibility, an index could still be used to filter out records that are
known not to match. Since a btree index stores the exact values, PostgreSQL
could scan through the index and skip the entries that do not contain '%aaa%'.
That would dramatically improve cases where the criterion has good
selectivity, since an index has a much more compact structure than the table.

As far as I understand, it is discouraged to implement or suggest patches
during a commitfest; however, I would love to see the following case treated
as a "performance bug" and added to the TODO list:

create table seq_test
as select cast(i as text) i, repeat('*', 500) padding from
generate_series(1,10000) as s(i);

create index i_ix on seq_test(i);

vacuum analyze verbose seq_test;
-- index "i_ix" now contains 10000 row versions in *30 *pages
-- "seq_test": found 0 removable, 10000 nonremovable row versions in *667 *
pages
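
As a quick sanity check on the "much more compact structure" claim above, the
relative sizes can be read directly (a sketch; pg_relation_size is assumed to
be available, as it is from 8.1 on):

select pg_relation_size('seq_test') as table_bytes,
       pg_relation_size('i_ix') as index_bytes;
-- should report roughly 667 vs 30 pages' worth of bytes (8 kB per page),
-- matching the vacuum output above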

explain analyze select * from seq_test where i like '%123%';
-- Seq Scan reads 667 pages (as expected)
Seq Scan on seq_test (cost=0.00..792.00 rows=356 width=508) (actual
time=0.129..9.071 rows=20 loops=1 read_shared=667(667) read_local=0(0)
flush=0 local_flush=0 file_read=0 file_write=0)
  Filter: (i ~~ '%123%'::text)
Total runtime: 9.188 ms
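
The read_shared counters above come from an instrumented build; on a stock
server a rough cross-check is possible via the statistics views (a sketch;
assumes the stats collector is enabled and the server is otherwise idle,
since pg_stat_reset clears the counters for the whole database):

select pg_stat_reset();
select count(*) from seq_test where i like '%123%';
-- the collector updates with a small delay; wait a moment before reading
select heap_blks_read + heap_blks_hit as heap_blocks,
       idx_blks_read + idx_blks_hit as index_blocks
from pg_statio_user_tables where relname = 'seq_test';
-- heap_blocks should come out around 667 for the seq scan above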

set enable_seqscan=off;
-- Index Scan reads 2529 pages, presumably because the filter is applied only
-- after each heap tuple has been fetched, so all 10000 index entries trigger
-- heap fetches. I would expect 30 (index size) + 20 (number of matching
-- entries) = 50 pages maximum, which is 10 times better than the seq scan.
Index Scan using i_ix on seq_test (cost=0.00..1643.74 rows=356 width=508)
(actual time=0.334..16.746 rows=20 loops=1 read_shared=2529(2529)
read_local=0(0) flush=0 local_flush=0 file_read=0 file_write=0)
  Filter: (i ~~ '%123%'::text)
Total runtime: 16.863 ms
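
For contrast, a left-anchored pattern already demonstrates "filtering via the
index": the condition becomes an Index Cond instead of a plain Filter, and
only the matching heap pages are fetched (a sketch; text_pattern_ops is
assumed because the database may not use the C locale, and the rewritten
operators in the plan vary by version):

create index i_ix_pattern on seq_test(i text_pattern_ops);
explain analyze select * from seq_test where i like '123%';
-- Index Scan using i_ix_pattern on seq_test (...)
--   Index Cond: ((i ~>=~ '123'::text) AND (i ~<~ '124'::text))
--   Filter: (i ~~ '123%'::text)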

Hopefully, there will eventually be a clear distinction between filtering via
the index and filtering via table access.

Regards,
Vladimir Sitnikov
