Re: Index usage for BYTEA column in OR/IN clause

From: David Garamond <lists(at)zara(dot)6(dot)isreserved(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruno Wolff III <bruno(at)wolff(dot)to>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Index usage for BYTEA column in OR/IN clause
Date: 2004-03-28 18:45:18
Message-ID: 40671D3E.3040806@zara.6.isreserved.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Tom Lane wrote:
> David Garamond <lists(at)zara(dot)6(dot)isreserved(dot)com> writes:
>
>>The table contain +- 1 mil records, all of the actual version of the
>>queries below return < 10 rows, so an index should be used. Using an
>>index scan, the query ran < 100ms. Using seq scan, 2-3 secs. So there is
>>no good reason why a seq scan should be used, especially in a case of
>>b='foo' or b='bar'.
>
> [shrug...] We can't possibly diagnose a bad-plan-choice problem with
> the amount of information you've provided. See
> http://techdocs.postgresql.org/guides/SlowQueryPostingGuidelines

Okay, my bad. How about this:

1. script to create the test table (a 1mil-record table; each record
contains 1-40 random bytes):

#!/usr/bin/ruby -rpostgres
ROWS = 1_000_000
conn = PGconn.connect("/tmp", 5432, "", "", "...", "...", "...")
conn.exec("CREATE TABLE t (b BYTEA NOT NULL)")
conn.exec("CREATE INDEX i_t_b ON t(b)")
(1..ROWS).each { |i|
b = (1..rand(40)+1).collect{"\\\\"+rand(256).to_s(8).rjust(3,"0")}
if i % 1000 == 1; conn.exec("BEGIN"); end
conn.exec("INSERT INTO t VALUES ('#{b}')")
if i % 1000 == 0; conn.exec("COMMIT"); puts "#{i}/1000000..."; end
}

2. output of explain analyze:

=> explain analyze select * from t where b='aa';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Index Scan using i_t_v on t (cost=0.00..17.07 rows=6 width=32)
(actual time=0.062..0.062 rows=0 loops=1)
Index Cond: (b = 'aa'::bytea)
Total runtime: 0.166 ms
(3 rows)

Time: 19.372 ms

=> explain analyze select * from t where b='aa' or b='ab';
QUERY PLAN
----------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..25.00 rows=10 width=32) (actual
time=6857.151..6857.151 rows=0 loops=1)
Filter: ((b = 'aa'::bytea) OR (b = 'ab'::bytea))
Total runtime: 6857.345 ms
(3 rows)

Time: 6864.526 ms

=> explain analyze select * from t where b like 'aa%';
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Index Scan using i_t_v on t (cost=0.00..17.07 rows=6 width=32)
(actual time=0.682..15.763 rows=2 loops=1)
Index Cond: ((b >= 'aa'::bytea) AND (b < 'ab'::bytea))
Filter: (b ~~ 'aa%'::bytea)
Total runtime: 15.935 ms
(4 rows)

Time: 29.432 ms

=> explain analyze select * from t where b like 'aa%' or b like 'ab%';
QUERY PLAN
---------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..25.00 rows=10 width=32) (actual
time=636.738..7239.460 rows=3 loops=1)
Filter: ((b ~~ 'aa%'::bytea) OR (b ~~ 'ab%'::bytea))
Total runtime: 7239.758 ms
(3 rows)

Time: 7251.326 ms

=> explain analyze select * from t where b in ('aa','ab');
QUERY PLAN
----------------------------------------------------------------------------------------------------
Seq Scan on t (cost=0.00..25.00 rows=10 width=32) (actual
time=7055.390..7055.390 rows=0 loops=1)
Filter: ((b = 'aa'::bytea) OR (b = 'ab'::bytea))
Total runtime: 7055.574 ms
(3 rows)

Time: 7063.942 ms

--
dave

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Sean Chittenden 2004-03-28 19:59:25 Re: Resolution for "ERROR: cannot handle whole-row reference" ?
Previous Message Tom Lane 2004-03-28 16:00:57 Re: Resolution for "ERROR: cannot handle whole-row reference" ?