Re: POC, WIP: OR-clause support for indexes

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
Cc: "Finnerty, Jim" <jfinnert(at)amazon(dot)com>, Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org, teodor(at)sigaev(dot)ru, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2023-08-02 18:58:11
Message-ID: CAH2-WzmG-+vvN6YEh4J_j3-Q1Yn=HRrjDrz2HDhttPcLVqN0rw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 2, 2023 at 8:58 AM Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru> wrote:
> No, I haven't thought about it yet. I studied the example and it would
> really be nice to add optimization here. I didn't notice any problems
> with its implementation. I also have an obvious example with the "or"
> operator, for example
> , select * from multi_test, where (a, b ) = ( 1, 1 ) or (a, b ) = ( 2, 1
> ) ...;
>
> Although I think such a case will be used less often.

Right. As I said, I don't particularly care about the row constructor
syntax -- it's not essential.

In my experience patches like this one that ultimately don't succeed
usually *don't* have specific problems that cannot be fixed. The real
problem tends to be ambiguity about the general high level design. So
more than anything else, ambiguity is the thing that you need to
minimize to be successful here. This is the #1 practical problem, by
far. This may be the only thing about your patch that I feel 100% sure
of.

In my experience it can actually be easier to expand the scope of a
project, and to come up with a more general solution:

https://en.wikipedia.org/wiki/Inventor%27s_paradox

I'm not trying to make your work more difficult by expanding its
scope. I'm actually trying to make your work *easier* by expanding its
scope. I don't claim to know what the specific scope of your patch
should be at all. Just that it might be possible to get a much clearer
picture of what the ideal scope really is by *trying* to generalize it
further -- that understanding is what we lack right now. Even if this
exercise fails in some way, it won't really have been a failure. The
reasons why it fails will still be interesting and practically
relevant.

> It seems to me the most difficult thing is to notice problematic cases
> where the transformations are incorrect, but I think it can be implemented.

Right. To be clear, I am sure that it won't be practical to come up
with a 100% theoretically pure approach. If for no other reason than
this: normalizing to CNF in all cases will run into problems with very
complex predicates. It might even be computationally intractable
(could just be very slow). So there is a clear need to keep
theoretical purity in balance with practical considerations. There is
a need for some kind of negotiation between those two things. Probably
some set of heuristics will ultimately be required to keep costs and
benefits in balance.

> I agree with your position, but I still don't understand how to consider
> transformations to generalized cases without relying on special cases.

Me neither. I wish I could say something a bit less vague here.

I don't expect you to determine what set of heuristics will ultimately
be required to determine when and how to perform CNF conversions, in
the general case. But having at least some vague idea seems like it
might increase confidence in your design.

> As I understand it, you assume that it is possible to apply
> transformations at the index creation stage, but there I came across the
> selectivity overestimation problem.
>
> I still haven't found a solution for this problem.

Do you think that this problem is just an accidental side-effect? It
isn't necessarily the responsibility of your patch to fix such things.
If it's even possible for selectivity estimates to change, then it's
already certain that sometimes they'll be worse than before -- if only
because of chance interactions. The optimizer is often right for the
wrong reasons, and wrong for the right reasons -- we cannot really
expect any patch to completely avoid problems like that.

> To be honest, I think that in your examples I understand better what you
> mean by normalization to the conjunctive norm, because I only had a
> theoretical idea from the logic course.
>
> Hence, yes, normalization/security checks - now I understand why they
> are necessary.

As I explained to Jim, I am trying to put things in this area on a
more rigorous footing. For example, I have said that the way that the
nbtree code executes SAOP quals is equivalent to DNF. That is
basically true, but it's also my own slightly optimistic
interpretation of history and of the design. That's a good start, but
it's not enough on its own.

My interpretation might still be wrong in some subtle way, that I have
yet to discover. That's really what I'm concerned about with your
patch, too. I'm currently trying to solve a problem that I don't yet
fully understand, so for me "getting a properly working flow of
information" seems like a good practical exercise. I'm trying to
generalize the design of my own patch as far as I can, to see what
breaks, and why it breaks. My intuition is that this will help me with
my own patch by forcing me to gain a truly rigorous understanding of
the problem.

My suggestion about generalizing your approach to cover RowCompareExpr
cases is what I would do, if I were you, and this was my patch. That's
almost exactly what I'm doing with my own patch already, in fact.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2023-08-02 19:47:51 Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
Previous Message Bharath Rupireddy 2023-08-02 18:57:00 Re: Synchronizing slots from primary to standby