Enhanced containment selectivity function

From: Matteo Beccati <php(at)beccati(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Enhanced containment selectivity function
Date: 2005-08-04 14:36:54
Message-ID: 42F22806.20503@beccati.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Hi,

I've recently had problems with slow queries caused by the selectivity
of the <@ ltree operator, as you may see in my post here:

http://archives.postgresql.org/pgsql-performance/2005-07/msg00473.php

Someone on IRC (AndrewSN if I'm not wrong) pointed out that the
restriction selectivity function for <@ is contsel, which returns a
constant value of 0.001. So I started digging in the source code trying
to understand how the default behaviour could be enhanced, and ended up
writing a little patch which adds an alternative containment selectivity
function (called "contstatsel") which is able to deliver better results.

This first version is based on the eqsel function and uses only
histogram values to calculate the selectivity and uses the 0.001
constant as a fallback.

This also made me think: is there a reason why geometric selectivity
functions return constant values rather than checking statistics for a
better result?

Attached you will find a patch suitable for current CVS HEAD. My C
skills are a bit rusty and my knowledge of pg internals are very poor,
so I'm sure it could be improved and modified to better fit the pg
coding standards.

Here are the results on a slow query:

test=# EXPLAIN ANALYZE SELECT * FROM gw_users JOIN gw_batches USING
(u_id) WHERE tree <@ '1041' AND t_stamp > '2005-07-01';

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..553.02 rows=8 width=364) (actual
time=2.423..19787.259 rows=6785 loops=1)
-> Index Scan using gw_users_gisttree_key on gw_users
(cost=0.00..21.63 rows=5 width=156) (actual time=0.882..107.434
rows=4696 loops=1)
Index Cond: (tree <@ '1041'::ltree)
-> Index Scan using gw_batches_t_stamp_u_id_key on gw_batches
(cost=0.00..106.09 rows=15 width=212) (actual time=3.898..4.171 rows=1
loops=4696)
Index Cond: ((gw_batches.t_stamp > '2005-07-01
00:00:00+02'::timestamp with time zone) AND ("outer".u_id =
gw_batches.u_id))
Total runtime: 19805.447 ms
(6 rows)

test=# EXPLAIN ANALYZE SELECT * FROM gw_users JOIN gw_batches USING
(u_id) WHERE tree <<@ '1041' AND t_stamp > '2005-07-01';

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=245.26..1151.80 rows=7671 width=364) (actual
time=69.562..176.966 rows=6785 loops=1)
Hash Cond: ("outer".u_id = "inner".u_id)
-> Bitmap Heap Scan on gw_batches (cost=57.74..764.39 rows=8212
width=212) (actual time=8.330..39.542 rows=7819 loops=1)
Recheck Cond: (t_stamp > '2005-07-01 00:00:00+02'::timestamp
with time zone)
-> Bitmap Index Scan on gw_batches_t_stamp_u_id_key
(cost=0.00..57.74 rows=8212 width=0) (actual time=8.120..8.120 rows=7819
loops=1)
Index Cond: (t_stamp > '2005-07-01
00:00:00+02'::timestamp with time zone)
-> Hash (cost=175.79..175.79 rows=4692 width=156) (actual
time=61.046..61.046 rows=4696 loops=1)
-> Seq Scan on gw_users (cost=0.00..175.79 rows=4692
width=156) (actual time=0.083..34.200 rows=4696 loops=1)
Filter: (tree <<@ '1041'::ltree)
Total runtime: 194.621 ms
(10 rows)

The second query uses a custom <<@ operator I added to test the
alternative selectivity function:

CREATE FUNCTION contstatsel(internal, oid, internal, integer) RETURNS
double precision AS 'contstatsel' LANGUAGE internal;
CREATE OPERATOR <<@ (
LEFTARG = ltree,
LEFTARG = ltree,
PROCEDURE = ltree_risparent,
COMMUTATOR = '@>',
RESTRICT = contstatsel,
JOIN = contjoinsel
);

Of course any comments/feedback are welcome.

Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/

Attachment Content-Type Size
contstatsel.diff text/plain 6.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Woodward 2005-08-04 14:43:49 Re: Solving the OID-collision problem
Previous Message Mark Woodward 2005-08-04 14:26:38 pg_dump -- data and schema only?

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2005-08-04 14:46:24 Re: prevent encoding conversion recursive error
Previous Message Qingqing Zhou 2005-08-04 08:53:51 prevent encoding conversion recursive error