Re: Multicolumn index scan efficiency

From: Vitalii Tymchyshyn <vit(at)tym(dot)im>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Multicolumn index scan efficiency
Date: 2025-11-10 05:12:09
Message-ID: CABWW-d154jXcE143KKb8UYMnjKviG=fsJPmbYxcERyr2ocOKpg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thank you so much for both clarifying and fixing it!
In our case (FYI, this is from http://github.com/cdapio/cdap) a lot of
users have just a single namespace, so it effectively means scanning till
the end of the index.
We'll fix
https://github.com/cdapio/cdap/blob/develop/cdap-data-fabric/src/main/java/io/cdap/cdap/spi/data/sql/PostgreSqlStructuredTable.java
to detect equality scan prefixes and make corresponding SQL. That would fix
it for all postgres versions.

Best regards, Vitalii Tymchyshyn

нд, 9 лист. 2025 р. о 20:21 Peter Geoghegan <pg(at)bowt(dot)ie> пише:

> On Sun, Nov 9, 2025 at 9:44 PM Vitalii Tymchyshyn <vit(at)tym(dot)im> wrote:
> > I am wondering about 2 things:
> > 1) Does anyone know which specific change / version made it fast?
> > 2) What was the proper way to do a range index scan like WHERE (a,b,c)
> between (x1,y1,z1) and (x2,y2,z2) before the improvement.
> > Note that my tests can mostly be rewritten as equality at least for some
> columns (and this is what we'll do), but sometimes we do need a range scan
> like above, so understanding it would be important. Also I am curious :).
>
> This improvement you're seeing here is down to work in commit
> bd3f59fd. The short version is that the way we used to decide when a
> condition like "WHERE (a,b,c) <= (x2,y2,z2)" was needlessly
> conservative. If there were many "a" values equal to x2, we'd have to
> scan the index until we got to the next distinct/non-equal "a" value
> -- without realizing that we're already past the point where there
> cannot possibly be any more matches.
>
> See the discussion on this thread which complained about the problem,
> particularly my response to the complaint:
>
>
> https://www.postgresql.org/message-id/flat/CAH2-WzmLREy6r68A6SEHXnstg01kNs1HiQtOvSO5cTvWuaducw%40mail.gmail.com#62e393ac8bbf06f0f73598ba2ceeab69
>
> --
> Peter Geoghegan
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Pavel Stehule 2025-11-10 06:40:30 Re: proposal: schema variables
Previous Message Peter Geoghegan 2025-11-10 04:20:39 Re: Multicolumn index scan efficiency