Re: Generalizing range-constraint detection in clauselist_selectivity

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Generalizing range-constraint detection in clauselist_selectivity
Date: 2012-09-29 00:13:44
Message-ID: 18424.1348877624@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> I'm thinking that this is overly restrictive, and we could usefully
>> suppose that "var >= anything" and "var <= anything" should be treated
>> as a range constraint pair if the vars match and there are no volatile
>> functions in the expressions. We are only trying to get a selectivity
>> estimate here, so rigorous correctness is not required. However, I'm
>> a little worried that I might be overlooking cases where this would be
>> unduly optimistic. Does anyone see a situation where such a pair of
>> clauses *shouldn't* be thought to be a range constraint on the var?
>> For instance, should we still restrict the "var" side to be an
>> expression in columns of only one relation?

> Hmmm. I don't see why we have to restrict them, at least in theory.
> If more than one relation is involved in an expression for "var", then
> doesn't the join between the other relations have to be evaluated prior
> to evaluating the join conditions on the range relation? i.e. it seems
> to me that for relations a,b,c:

> where
> ( a.1 + b.1 ) <= c.1 and ( a.2 + b.2 ) >= c.1

> ... that we're already forced to join a and b before we can meaningfully
> evaluate the join condition on c, no? If not, then we do have to
> restrict, but it seems to me that we are.

Well, one point that I'm not too sure about the implications of is that
in practice, clauselist_selectivity is not called on random collections
of clauses, but only clauses that are all going to be evaluated at the
"same place", ie a particular scan or join. So that's already going to
limit the combinations of clauses that it can be pointed at. An example
of why this might be an issue is

a.x >= b.y AND a.x <= constant

If we change things as I'm thinking, these two clauses would be seen as
a range pair, but only when they appear in the same clause list. And
most of the time they wouldn't --- a.x <= constant would drop down to
the restriction clause list for "a", but the first clause would be kept
in the a+b join clause list. This means the size of the a+b join
relation would be estimated without recognizing the range relationship.
But then, if we considered a parameterized indexscan on a.x, it would
have both clauses in its indexqual list, so we'd use the range
interpretation in costing that indexscan, which would likely give that
particular plan an "unfair" advantage. Maybe that's just fine, or maybe
it isn't. I'm not sure.

We could probably eliminate that inconsistency by insisting that two
clauses can only be matched for this purpose when they reference the
same set of rels overall, but that doesn't feel right --- it certainly
seems like the example above ought to be thought of as a range
restriction if possible.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2012-09-29 00:14:42 Re: embedded list v3
Previous Message Tom Lane 2012-09-28 23:54:37 Re: embedded list v2