Partial-index predicate proofs got dumber in 9.2

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Partial-index predicate proofs got dumber in 9.2
Date: 2012-11-15 21:23:35
Message-ID: 22779.1353014615@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I looked into the optimization regression reported here:
http://archives.postgresql.org/pgsql-performance/2012-11/msg00140.php

It's easy to reproduce the problem in the regression database:

create index ti on tenk1 (fivethous) where fivethous is not null;
explain select * from int4_tbl, tenk1 where f1 = fivethous;

9.1 and earlier will produce a nestloop-with-inner-indexscan, but 9.2
fails to (and ends up with a much-more-expensive hash join) because it
doesn't think the partial index "ti" has been proven usable for the query.
However, because the "=" operator is strict, no row in which fivethous is
null could be relevant to the query result, so we should be able to use
the partial index.

The older code succeeds at this because when it is considering whether
it can use an index for an inner indexscan, it will apply all the join
clauses found by find_clauses_for_join() to the task of proving the index
predicate true. (It's notable that this includes all join clauses
mentioning the target relation, not only those that can be used with the
index's columns; so we'll be able to prove things true even if the index
predicate involves table columns that aren't in the index proper.)

In 9.2, which has been rewritten to generate inner indexscans "bottom up"
as parameterized paths, we simply skip generating parameterized paths for
any index that's not marked predOK, and the predOK test is made using
only the restriction clauses for that relation, not join clauses. Ooops.

We could try to replicate the way the previous code did it, but it'd
probably add significant expense. What I'm thinking at the moment is that
check_partial_indexes() has been missing a bet all along, because there is
no reason for it to confine its attention to restriction clauses. We
should just have it include join clauses for the rel in the restriction
list passed to predicate_implied_by(). That is, a join clause can be
assumed true for the purposes of deciding whether an index is usable
*whether or not we ever actually use that join clause with the index*.

This claim is shaky in the presence of outer joins, though. Consider

select ... from t1 left join t2 on (t1.a = t2.b)

We can use a partial index on t2 that requires t2.b IS NOT NULL, since no
row where b is null will affect the result. A more interesting case is

select ... from t1 left join t2 on (t1.a = t2.b)
where t2.c > 0;

This is actually going to get simplified to a plain inner join, but even
if it did not, we needn't fetch t2 rows where c is null or negative.
That might result in generating some null-extended join rows that
shouldn't be there, if particular values of t1.a no longer have matches,
but the outer WHERE clause would eliminate such bogus rows anyway.

However, that last conclusion depends on the outer WHERE clause being
strict, which doesn't work for instance with

select ... from t1 left join t2 on (t1.a = t2.b)
where t2.c is null;

We could not use an index having the predicate "c is null" to scan t2,
else we might eliminate some rows that should produce matches to t1,
allowing bogus null-extended rows to be produced (which would survive
the outer WHERE).

So I'm thinking that the correct heuristic for check_partial_indexes()
is to use restriction clauses plus any join clauses that satisfy
join_clause_is_movable_to(). That should prevent unsafe use of
upper-level join clauses when there are outer joins below them.

Thoughts?

regards, tom lane

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-11-15 21:28:40 Re: pg_ctl reload -o "...."
Previous Message Peter Geoghegan 2012-11-15 21:13:22 Re: tuplesort memory usage: grow_memtuples