Re: PostgreSQL 18 GA press release draft

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: PostgreSQL 18 GA press release draft
Date: 2025-09-10 19:59:24
Message-ID: CAH2-WzkKgCEnXyaaXPB-a1JrdC9uRPvSEiAhEvb6M+vMTSTdaQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 9, 2025 at 11:13 PM Jonathan S. Katz <jkatz(at)postgresql(dot)org> wrote:
> Attached is a draft of the PostgreSQL 18 GA press release.

I have some feedback on this sentence:

"It can also automatically optimize queries using `OR` or `IN (VALUES
...)` in `WHERE` clauses for faster execution".

This is factually correct, but I think that it gives too much
importance to the `IN (VALUES ...)` transformation added by commit
c0962a11. IMV we shouldn't mention anything about transformations that
affect queries that use IN(), since it only applies to `IN (VALUES
...)` -- which is a rather limited special case. Especially because
this IN(VALUES()) case is limited to transforming queries that only
have true constants in the VALUES() clause -- it cannot work with
parameters at all.

I say this in part because I've noticed that existing press reports
about this functionality (which were based on the beta1 announcement)
say that it affects IN() queries in general, which isn't true. Again,
I know that you haven't made that same mistake here -- but a lot of
people will read `IN (VALUES ...)` as "any and all IN() lists".
They'll tend to interpret "VALUES" as "some values that appear in an
IN()", and not "a VALUES() clause that appears in an IN()".

The work from commits d4378c00 and ae456916 is truly important, and
definitely merits prominent mention in the press release. That'll
transform a query written as "SELECT * FROM tenk1 WHERE tenthous = 1
OR tenthous = 3 OR tenthous = 42 OR tenthous = 0" into a
representation that was previous only used when the query was written
"SELECT * FROM tenk1 WHERE tenthous IN (1,3,42,0)" (namely, it
transforms the original such that we can use the ScalarArrayOpExpr
representation).

This transformation is particularly useful in cases where it'll allow
us to get an index-only scan plan instead of a BitmapOr plan (with one
bitmap index scan child node for each of the 3 "tenthous" values from
the query), which is the only plan we could ever get on earlier
releases. The transformation process for these OR cases *can* work
with dynamic parameters (unlike the VALUES() stuff), and so can even
be used on the inner side of a join (see also commit 627d6341, which
dealt with making it possible to use OR transformation with joins).

Putting it all together, I suggest the following alternative:

"It can also automatically transform queries with `OR` constructs in
their `WHERE` clause into a logically equivalent IN() representation
that can be pushed down to index scan nodes, leading to significantly
faster execution".

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2025-09-10 20:01:25 Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Previous Message Natalya Aksman 2025-09-10 19:41:30 Re: Adding skip scan (including MDAM style range skip scan) to nbtree