Re: [PATCH] Equivalence Class Filters

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Equivalence Class Filters
Date: 2015-12-05 17:07:41
Message-ID: 30810.1449335261@sss.pgh.pa.us
Lists: pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> As of today these Equivalence Classes only incorporate expressions which
> use the equality operator, but what if the above query had been written as:

> select * from t1 inner join t2 on t1.id = t2.id where t1.id <= 10;

> Should we not be able to assume that t2.id is also <= 10?

This sort of thing has been suggested multiple times before, and I've
rejected it every time on the grounds that it would likely add far more
planner cycles than it'd be worth; e.g., time would be spent on searches for
matching subexpressions whether or not anything was learned (and often
nothing would be learned). While I've only read your description of the
patch, not the patch itself, the search methodology you propose seems
pretty brute-force and unlikely to solve that issue. It's particularly
important to avoid O(N^2) behaviors when there are N expressions ...
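
To make the cost concern concrete, here is a minimal Python sketch. It is
not the planner's code and is not taken from the patch; the data model
(lists of column names for equivalence classes, tuples for filters) and both
function names are assumptions made purely for illustration. It contrasts a
brute-force scan, which does work even when nothing is learned, with a
lookup through a prebuilt index.

```python
# Hypothetical model: an equivalence class is a list of column names known
# to be equal; a filter is a (column, operator, constant) tuple.

def derive_naive(classes, filters):
    """Brute force: compare each filter column with every class member."""
    derived, probes = [], 0
    for col, op, const in filters:
        for ec in classes:
            for member in ec:
                probes += 1  # count every comparison, useful or not
                if member == col:
                    derived += [(m, op, const) for m in ec if m != col]
                    break
    return derived, probes


def derive_indexed(classes, filters):
    """Build a member -> class index once, then do one lookup per filter."""
    owner = {member: ec for ec in classes for member in ec}
    derived = []
    for col, op, const in filters:
        ec = owner.get(col, ())
        derived += [(m, op, const) for m in ec if m != col]
    return derived


classes = [["t1.id", "t2.id"], ["t1.a", "t2.b"]]
useful = [("t1.id", "<=", 10)]   # column belongs to an equivalence class
useless = [("t3.x", "<", 5)]     # column belongs to no equivalence class

print(derive_naive(classes, useful))
print(derive_naive(classes, useless))
print(derive_indexed(classes, useful))
```

In this toy model the second call inspects every member of every class and
derives nothing, which is the "time spent whether or not anything was
learned" case. The indexed variant avoids the repeated scans, though it says
nothing about matching general subexpressions rather than plain column names.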

Another issue that would need consideration is how to avoid skewing
planner selectivity estimates with derived clauses that are fully
redundant with other clauses. The existing EC machinery is mostly
able to dodge that problem by generating just a minimal set of equality
clauses from an EC, but I don't see how we'd make that work here.
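
A small worked example of the skew, as a sketch only: the formula below is
the simplified textbook equijoin estimate (filtered row counts multiplied
together, divided by the number of distinct join keys), not PostgreSQL's
actual estimator, and the table sizes are invented for illustration.

```python
def join_rows(filtered1, filtered2, ndistinct):
    """Simplified equijoin estimate that treats all clauses as independent."""
    return filtered1 * filtered2 / ndistinct


# Assumed data: t1 and t2 each hold ids 1..1000, so 1000 distinct keys and
# exactly 10 rows satisfying id <= 10 in each table. The true join result
# for the example query is therefore 10 rows.
without_derived = join_rows(10, 1000, 1000)  # only t1.id <= 10 is applied
with_derived = join_rows(10, 10, 1000)       # derived t2.id <= 10 also applied

print(without_derived, with_derived)
```

Under these assumptions the estimate without the derived clause is 10 rows,
which matches the true result, while counting the redundant derived clause
as an independent restriction pushes the estimate down to 0.1 rows.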

I'm also wondering why you want to limit it to comparisons to constants;
that seems rather arbitrary.

Lastly, in most cases knowing that t2.id <= 10 is just not worth all
that much; it's certainly far less useful than an equality condition.
It's not difficult to imagine that this would often be a net drag on
runtime performance (never mind planner performance) by doing nothing
except creating additional filter conditions the executor has to check.
Certainly it would be valuable to know this if it let us exclude some
partition of a table, but that's only going to happen in a small minority
of queries.
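
The partition case can be sketched as follows. This is an illustration
under assumed names and bounds (the partition names, their ranges, and the
prune function are all hypothetical), not how the planner implements
exclusion.

```python
def prune(partitions, upper_bound):
    """Keep only partitions that could contain a value <= upper_bound.

    Each partition is (name, low, high), covering low <= id < high.
    """
    return [name for name, low, high in partitions if low <= upper_bound]


# Hypothetical range partitions of t2 on id.
t2_partitions = [
    ("t2_p1", 1, 100),
    ("t2_p2", 100, 200),
    ("t2_p3", 200, 300),
]

# With the derived clause t2.id <= 10, only the first partition is scanned.
print(prune(t2_partitions, 10))
```

Without the derived clause there is no restriction on t2.id to test against
the partition bounds, so all three partitions would have to be scanned.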

I'm not necessarily opposed to doing anything in this area, but so far
I've not seen how to do it in a way that is attractive when planner
complexity, performance, and end results are all considered.

regards, tom lane
