Skip site navigation (1) Skip section navigation (2)

Re: join selectivity

From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <strk(at)refractions(dot)net>, <pgsql-hackers(at)postgresql(dot)org>,<postgis-devel(at)postgis(dot)refractions(dot)net>
Subject: Re: join selectivity
Date: 2004-12-16 18:51:14
Message-ID: 9EB50F1A91413F4FA63019487FCD251DAD23@WEBBASEDDC.webbasedltd.local (view raw or flat)
Thread:
Lists: pgsql-hackers
Hi Tom, 

> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us] 
> Sent: 16 December 2004 17:56
> To: Mark Cave-Ayland
> Cc: strk(at)refractions(dot)net; pgsql-hackers(at)postgresql(dot)org; 
> postgis-devel(at)postgis(dot)refractions(dot)net
> Subject: Re: [HACKERS] join selectivity
> 
> 
> "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> > OK I think I've misunderstood something more fundamental 
> than that; I 
> > understood from what you said that the RESTRICT clause is used to 
> > evaluate the cost of table1.geom && table2.geom against 
> table2.geom && 
> > table1.geom (i.e. it is used to help decide which one should be seq 
> > scanned and which should be index scanned in a nested loop 
> node). So 
> > is the trick here for a commutative operator to simply 
> return the same 
> > value for both cases, as other factors such as index size costs are 
> > considered elsewhere?
> 
> If the operator is commutative then the result should be too. 
>  Really you should not be thinking about costs at all when 
> coding a selectivity
> estimator: its charter is to estimate how many rows will 
> match the condition, not to estimate costs per se.
> 
> Note however that these aren't really the "same case", as 
> you'd be referencing two different columns with presumably 
> different statistics.

Well at the moment PostGIS has a RESTRICT function that takes an expression
of the form <column> <op> <constant> where column is a column consisting of
geometries and constant is a bounding box. This is based upon histogram
statistics and works well.

The surprise came when writing the JOIN function and finding that the
RESTRICT clause was being called. Now I understand that this is part of the
nested loop and not the JOIN so that helps. But in the case of <column> <op>
<unknown constant>, if we're estimating the number of rows to return then
that becomes harder - I'm thinking pick a rectangle half the area of the
statistical rectangle for the column and return the number of rows within
that area.

> You should probably read the existing selectivity estimators 
> in utils/adt/selfuncs.c.  There's a fair amount of 
> infrastructure code in that file that you could borrow.  
> (It's not currently exported because it tends to change from 
> version to version, but maybe we could think about making 
> some of the routines global.)

OK will try and find some inspiration within.


Many thanks,

Mark.

------------------------
WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT 

T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk



In response to

Responses

pgsql-hackers by date

Next:From: Tom LaneDate: 2004-12-16 18:56:29
Subject: Re: join selectivity
Previous:From: Andrew DunstanDate: 2004-12-16 18:42:49
Subject: Re: integer datetimes

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group