Fixing matching of boolean index columns to sort ordering

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Fixing matching of boolean index columns to sort ordering
Date: 2016-12-13 05:08:04
Message-ID: 1788.1481605684@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The attached patch addresses the complaint raised in
https://www.postgresql.org/message-id/CAHt_Luuao4gd6De61GryK=2ff-MTgHzjqffdjz02uSdVqYmKKQ@mail.gmail.com

namely, that if you have an index on, say, integer columns i and j,
then the planner will figure out that it can use an indexscan with
no additional sort for a query like
select * from tab where i = 42 order by j;
but the same sort of thing does not work when the first column is
a boolean. You would think that this query is entirely isomorphic
to the one above:
select * from tab where b = true order by j;
but it isn't, because in expression preprocessing we simplify that to
select * from tab where b order by j;
Now, there's no equality condition so no EquivalenceClass is created
containing "b", and it's the presence of the EquivalenceClass that
drives the code that recognizes that the first index column can be
ignored while deciding what sort order the index produces.

The patch fixes that through the expedient of matching boolean index
columns to the restriction clauses for "tab", and when it finds a
match, acting as though we'd found a match to an EquivalenceClass
containing a constant. This is pretty ugly, but no more so than
several other existing special cases for boolean index columns.

Those special cases would largely go away if we were to canonicalize
"WHERE b" into "WHERE b = true" rather than the reverse, so you might
reasonably ask why we don't do that. I've asked myself that every time
I had to add another one of these special cases :-(, but the answer is
the same every time: where would you stop? Every WHERE clause is a
boolean expression, so there's no principled reason why such a rule
wouldn't result in canonicalizing e.g. "i = 42" into "(i = 42) = true",
wreaking havoc on every other optimization we have. Restricting it
to only apply to simple boolean columns is no answer because (a) indexes
can be on boolean expressions not just simple columns, and (b) part
of the point of the canonicalization is to make logically-equivalent
expressions look alike, so declining to apply it in some cases would
ruin that.

So, for better or worse, our approach is to simplify out "= true"
and then do whatever pushups we have to do at lower levels to keep
useful cases working nicely. This is another such pushup.

I'll add this to the next commitfest --- it could use some review
to see if I missed anything.

regards, tom lane

Attachment Content-Type Size
improve-boolean-index-matching.patch text/x-diff 7.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-12-13 05:44:07 Re: Password identifiers, protocol aging and SCRAM protocol
Previous Message Fujii Masao 2016-12-13 03:49:12 Re: Re: [sqlsmith] FailedAssertion("!(XLogCtl->Insert.exclusiveBackup)", File: "xlog.c", Line: 10200)