Re: Fixing matching of boolean index columns to sort ordering

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing matching of boolean index columns to sort ordering
Date: 2017-01-13 07:40:13
Message-ID: CAB7nPqROquR64Ao3CT7Qcac71bizqy6ut0b_C+NTWm28rEE-pA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 14, 2016 at 12:18 AM, David G. Johnston
<david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
> On Mon, Dec 12, 2016 at 10:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>
>> Every WHERE clause is a
>>
>> boolean expression, so there's no principled reason why such a rule
>> wouldn't result in canonicalizing e.g. "i = 42" into "(i = 42) = true",
>> wreaking havoc on every other optimization we have. Restricting it
>> to only apply to simple boolean columns is no answer because (a) indexes
>> can be on boolean expressions not just simple columns, and (b) part
>> of the point of the canonicalization is to make logically-equivalent
>> expressions look alike, so declining to apply it in some cases would
>> ruin that.
>
> Given our reliance on operators in indexing a canonicalization rule that
> says all boolean yielding expressions must contain an operator would yield
> the desired result. "i = 42" already has an operator so no further
> normalization is necessary to make it conform to such a rule.
>
> The indexing concern may still come into play here, I'm not familiar enough
> with indexes on column lists versus indexes on expressions to know off the
> top of my head. The definition of "looking alike" seems like it would be
> met since all such expression would look alike in having an operator.
>
> That said, not adding "= true" is more visually appealing - though given all
> of the other things we do in ruleutils this seems like a minor addition.

I have spent a couple of hours eye-balling this patch. There are not
that many users with indexes including a boolean column in its
definition... Still as this patch pushes down an index scan and avoids
an order by relying on the index scan to get things in the right order
it looks definitely right to make things better if we can. So that
seems worth it to me, even if that's adding a new extra
boolean-related optimization.

And actually, contrary to what is mentioned upthread, the planner is
not able to avoid a sort phase if other data types are used, say:
=# create table foo (a int, b int);
CREATE TABLE
=# create index on foo (a, b);
CREATE INDEX
=# explain (costs off) select * from foo where a = 1 order by b limit 10;
QUERY PLAN
----------------------------------------------------
Limit
-> Sort
Sort Key: b
-> Bitmap Heap Scan on foo
Recheck Cond: (a = 1)
-> Bitmap Index Scan on foo_a_b_idx
Index Cond: (a = 1)
(7 rows)
In this case, the sorting on column b should not be necessary as it
could be possible to rely on the index scan to get the elements
already sorted. Maybe I am missing something?
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2017-01-13 07:59:26 Re: Protect syscache from bloating with negative cache entries
Previous Message Etsuro Fujita 2017-01-13 07:00:19 Re: Push down more full joins in postgres_fdw