Re: Implement predicate propagation for non-equivalence clauses

From: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
To: Richard Guo <riguo(at)pivotal(dot)io>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Implement predicate propagation for non-equivalence clauses
Date: 2018-11-22 16:15:42
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

Hi Richard,

I took a look at the v2, here are some comments:

* This feature needs tests, especially for the cases where opfamilies or
data types or collations don't match, and other non-obvious cases where
it shouldn't work.

* Deducing an inequality to a constant is not always helpful. If we know
that a = b and a = const1 and b < const2, we will deduce b = const1 from
the EC, and adding b < const2 doesn't improve selectivity and only makes
the cost estimate worse. One situation where it does make sense is when
we can detect contradictions in these clauses, e.g. if we know that
const1 > const2 and therefore know that the above selection clause is
always false. Looking at the regression test changes, I see that v2
doesn't do that. I think the handling of deduced inequalities shoud be
modeled on the flow of generate_base_implied_equalities ->
process_implied_equality -> distribute_qual_to_rels. This would allow us
to correctly handle a deduced const1 < const2 clause and turn it into a
gating Result qual.

* There are functions named generate_implied_quals, gen_implied_quals
and gen_implied_qual. The names are confusingly similar, we could use
something like generate_implied_quals_for_clause and
generate_one_implied_qual for the latter two.

@@ gen_implied_quals
    else if (clause && IsA(clause, ScalarArrayOpExpr))
* When can the clause be NULL?

@@ gen_implied_quals
    item1 = canonicalize_ec_expression(item1,
                                       exprType((Node *) item1),
    item2 = canonicalize_ec_expression(item2,
                                       exprType((Node *) item2),
* Why do we do this? If the collation or type of the original expression
isn't right and we have to add RelabelType, the resulting expression
won't match the original clause and we won't be able to substitute it
with other equavalence members. So we can just check that the type and
collation match.

* In gen_implied_quals, I'd rename item1 => left, item2 => right, em1 =>
orig_em, em2 => other_em, and same for list cells and types. As it is
now, em1 can actually match both item1 and item2, and em2 is not related
to either of them, so it took me some effort to understand what's going on.

* In gen_implied_qual, why do we search the clause for a matching
subexpression? Reading the thread, I thought that we can only do the
substitution for OpExprs of the same opfamilies as the generating EC.
This code looks like it can do the substitution an at arbitrary depth,
so we might change an argument of some unsuitable function, and the
result will not be correct. What we should probably do is that after we
matched one side of OpExpr to one EC member, we just replace it with
another suitable member and add the resulting clause.

@@ gen_implied_qual
     * If the clause has a mergejoinable operator, set the EquivalenceClass
     * links. Otherwise, a mergejoinable operator with NULL
     * will cause update_mergeclause_eclasses fails at assertion.
    if (new_rinfo->mergeopfamilies)
        initialize_mergeclause_eclasses(root, new_rinfo);
* It's not an equality clause, so it is not going to be mergejoinable
nor hashjoinable and we can skip these checks.

Alexander Kuzmenkov
Postgres Professional:
The Russian Postgres Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Eckhardt 2018-11-22 16:29:50 Re: pg_upgrade supported versions policy
Previous Message Justin Pryzby 2018-11-22 16:09:58 Re: dsa_allocate() faliure