Re: POC, WIP: OR-clause support for indexes

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
Cc: Marcos Pegoraro <marcos(at)f10(dot)com(dot)br>, Alena Rybakina <a(dot)rybakina(at)postgrespro(dot)ru>, Andrey Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org, teodor(at)sigaev(dot)ru
Subject: Re: POC, WIP: OR-clause support for indexes
Date: 2023-06-26 03:18:37
Message-ID: CAH2-WzmD5u5kCZG0qMtVySz8VB1_drOiX=j0buDufK-EJc3YkQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jun 25, 2023 at 6:48 PM Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru> wrote:
> I finished writing the code patch for transformation "Or" expressions to "Any" expressions.

This seems interesting to me. I'm currently working on improving
nbtree's "native execution of ScalarArrayOpExpr quals" (see commit
9e8da0f7 for background information). That is relevant to what you're
trying to do here.

Right now nbtree's handling of ScalarArrayOpExpr is rather
inefficient. The executor does pass the index scan an array of
constants, so the whole structure already allows the nbtree code to
execute the ScalarArrayOpExpr in whatever way would be most efficient.
There is only one problem: it doesn't really try to do so. It more or
less just breaks down the large ScalarArrayOpExpr into "mini" queries
-- one per constant. Internally, query execution isn't significantly
different to executing many of these "mini" queries independently. We
just sort and deduplicate the arrays. We don't intelligently decide
which pages dynamically. This is related to skip scan.

Attached is an example query that shows the problem. Right now the
query needs to access a buffer containing an index page a total of 24
times. It's actually accessing the same 2 pages 12 times. My draft
patch only requires 2 buffer accesses -- because it "coalesces the
array constants together" dynamically at run time. That is a little
extreme, but it's certainly possible.

BTW, this project is related to skip scan. It's part of the same
family of techniques -- MDAM techniques. (I suppose that that's
already true for ScalarArrayOpExpr execution by nbtree, but without
dynamic behavior it's not nearly as valuable as it could be.)

If executing ScalarArrayOpExprs was less inefficient in these cases
then the planner could be a lot more aggressive about using them.
Seems like these executor improvements might go well together with
what you're doing in the planner. Note that I have to "set
random_page_cost=0.1" to get the planner to use all of the quals from
the query as index quals. It thinks (correctly) that the query plan is
very inefficient. That happens to match reality right now, but the
underlying reality could change significantly. Something to think
about.

--
Peter Geoghegan

Attachment Content-Type Size
saop_patch_test.sql application/octet-stream 6.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kirk Wolak 2023-06-26 03:26:38 Re: RFC: Adding \history [options] [filename] to psql (Snippets and Shared Queries)
Previous Message David Rowley 2023-06-26 02:38:43 Re: Speeding Up Bitmapset