Re: Enhanced containment selectivity function

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Enhanced containment selectivity function
Date: 2005-08-04 15:49:55
Message-ID: 16193.1123170595@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Matteo Beccati <php(at)beccati(dot)com> writes:
> Someone on IRC (AndrewSN if I'm not wrong) pointed out that the
> restriction selectivity function for <@ is contsel, which returns a
> constant value of 0.001. So I started digging in the source code trying
> to understand how the default behaviour could be enhanced, and ended up
> writing a little patch which adds an alternative containment selectivity
> function (called "contstatsel") which is able to deliver better results.

After looking at this a little, it doesn't seem like it has much to do
with the ordinary 2-D notion of containment. In most of the core
geometric types, the "histogram" ordering is based on area, and so
testing the histogram samples against the query doesn't seem like it's
able to give very meaningful containment results --- the items shown
in the histogram could have any locations whatever.

The approach might be sensible for ltree's isparent operator --- I don't
have a very good feeling for the behavior of that operator, but it looks
like it has at least some relationship to the ordering induced by the
ltree < operator.

So my thought is that (assuming Oleg and Teodor agree this is sensible
for ltree) we should put the selectivity function into contrib/ltree,
not directly into the core. It might be best to call it something like
"parentsel", too, to avoid giving the impression that it has something
to do with 2-D containment.

Also, you should think about using the most-common-values list as well
as the histogram. I would guess that many ltree applications would have
enough duplicate entries that the MCV list represents a significant
fraction of the total population. Keep in mind when thinking about this
that the histogram describes the population of data *exclusive of the
MCV entries*.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Woodward 2005-08-04 15:52:35 Re: pg_dump -- data and schema only?
Previous Message Tino Wildenhain 2005-08-04 15:38:47 Re: pg_dump -- data and schema only?

Browse pgsql-patches by date

  From Date Subject
Next Message Matteo Beccati 2005-08-04 17:09:30 Re: Enhanced containment selectivity function
Previous Message Tom Lane 2005-08-04 15:12:58 Re: Enhanced containment selectivity function