intarray planning/exeuction problem with multiple operators

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: oleg(at)sai(dot)msu(dot)su, teodor(at)sigaev(dot)ru, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: intarray planning/exeuction problem with multiple operators
Date: 2015-07-13 06:12:41
Message-ID: CAMkU=1yx952u=tc8o1Ed1LZgLceopJ3MR5n0OAP84tPWn_jNtg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've found an interesting performance problem in the intarray extension
module. It doesn't seem to be version dependent, I've verified it in 9.4.4
and 9.6devel.

If I do a query that involves both an = op and a @@ op ANDed together, it
gives a high cost estimate, which is justified by the slow runtime. If I
omit the @@ it gives a low estimate, also justified. If I trick it into
thinking it cannot use the index to satisfy the @@ op, then it gives a low
estimate and low runtime, applying the @@ in the filter step and only the
fast = in the bitmap index scan.

Now it could use the index for the fast = operation and apply the @@ in the
recheck, rather than the filter. But I guess it doesn't think of that,
despite knowing that applying the @@ in index operation is slow.

So it seems to be missing a trick someplace, but I don't if it reasonable
to expect it to find that trick, or how easy/hard it would be to implement.

The proposed selectivity estimate improvement patch for @@ does not fix
this (and evaluating that patch was how I found this issue.)

Set up:

create table foobar as
select (
select
array_agg(floor(sqrt(random()*10000000+g.y/1000000+f.x/10000000))::int)
from generate_series(1,10) g(y)
) foo
from generate_series(1,10000000) f(x);
create index on foobar using gin(foo gin__int_ops);
analyze;

You can knock an order of magnitude off from the table size and should
still be able to see the problem.

example:

explain(analyze, buffers) select * from foobar where foo =
'{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}' and foo @@
'!1'::query_int;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Seq Scan on foobar (cost=0.00..263637.00 rows=1 width=61) (actual
time=9717.957..9717.957 rows=0 loops=1)
Filter: ((foo @@ '!1'::query_int) AND (foo =
'{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[]))
Rows Removed by Filter: 10000000
Buffers: shared hit=64 read=113573
I/O Timings: read=361.402
Planning time: 0.101 ms
Execution time: 9717.991 ms
(7 rows)

explain(analyze, buffers) select * from foobar where foo =
'{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}' ;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on foobar (cost=112.01..116.02 rows=1 width=61) (actual
time=0.027..0.027 rows=0 loops=1)
Recheck Cond: (foo =
'{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[])
Buffers: shared hit=21
-> Bitmap Index Scan on foobar_foo_idx (cost=0.00..112.01 rows=1
width=0) (actual time=0.023..0.023 rows=0 loops=1)
Index Cond: (foo =
'{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[])
Buffers: shared hit=21
Planning time: 0.097 ms
Execution time: 0.061 ms

If I trick it into thinking the @@ operator cannot be used in th eindex,
then it gets really fast again:

explain(analyze, buffers) select * from foobar where foo =
'{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}' and foo||'{}'
@@ '!1'::query_int;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on foobar (cost=112.01..116.03 rows=1 width=61) (actual
time=0.082..0.082 rows=0 loops=1)
Recheck Cond: (foo =
'{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[])
Filter: ((foo || '{}'::integer[]) @@ '!1'::query_int)
Buffers: shared hit=21
-> Bitmap Index Scan on foobar_foo_idx (cost=0.00..112.01 rows=1
width=0) (actual time=0.080..0.080 rows=0 loops=1)
Index Cond: (foo =
'{22046,26347,6816,21401,18802,31318,30628,8027,22217,20984}'::integer[])
Buffers: shared hit=21
Planning time: 0.139 ms
Execution time: 0.129 ms

Cheers,

Jeff

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Haribabu Kommi 2015-07-13 07:16:37 Re: optimizing vacuum truncation scans
Previous Message Michael Paquier 2015-07-13 04:25:44 Re: A little RLS oversight?