Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Alena Rybakina <lena(dot)ribackina(at)yandex(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, benoit <benoit(at)hopsandfork(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan
Date: 2023-07-31 17:04:09
Message-ID: CAH2-Wzkw8pATO6p13pFHzD5jDBGhMhH6dS3O7vGpFTEN1d_MTQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 31, 2023 at 12:24 PM Alena Rybakina
<lena(dot)ribackina(at)yandex(dot)ru> wrote:
> I noticed that you are going to add CNF->DNF transformation at the index
> construction stage. If I understand correctly, you will rewrite
> restrictinfo node,
> change boolean "AND" expressions to "OR" expressions, but would it be
> possible to apply such a procedure earlier?

Sort of. I haven't really added any new CNF->DNF transformations. The
code you're talking about is really just checking that every index
path has clauses that we know that nbtree can handle. That's a big,
ugly modularity violation -- many of these details are quite specific
to the nbtree index AM (in theory we could have other index AMs that
are amsearcharray).

At most, v1 of the patch makes greater use of an existing
transformation that takes place in the nbtree index AM, as it
preprocesses scan keys for these types of queries (it's not inventing
new transformations at all). This is a slightly creative
interpretation, too. Tom's commit 9e8da0f7 didn't actually say
anything about CNF/DNF.

> Otherwise I suppose you
> could face the problem of
> incorrect selectivity of the calculation and, consequently, the
> cardinality calculation?

I can't think of any reason why that should happen as a direct result
of what I have done here. Multi-column index paths + multiple SAOP
clauses are not a new thing. The number of rows returned does not
depend on whether we have some columns as filter quals or not.

Of course that doesn't mean that the costing has no problems. The
costing definitely has several problems right now.

It also isn't necessarily okay that it's "just as good as before" if
it turns out that it needs to be better now. But I don't see why it
would be. (Actually, my hope is that selectivity estimation might be
*less* important as a practical matter with the patch.)

> I can't clearly understand at what stage it is clear that the such a
> transformation needs to be applied?

I don't know either.

I think that most of this work needs to take place in the nbtree code,
during preprocessing. But it's not so simple. There is a mutual
dependency between the code that generates index paths in the planner
and nbtree scan key preprocessing. The planner needs to know what
kinds of index paths are possible/safe up-front, so that it can choose
the fastest plan (the fastest that the index AM knows how to execute
correctly). But, there are lots of small annoying nbtree
implementation details that might matter, and can change.

I think we need to have nbtree register a callback, so that the
planner can initialize some preprocessing early. I think that we
require a "two way conversation" between the planner and the index AM.

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2023-07-31 17:17:59 Re: pgsql: Fix search_path to a safe value during maintenance operations.
Previous Message Robert Haas 2023-07-31 16:55:22 Re: Faster "SET search_path"