Quick Links

Re: New access method for b-tree.

From:	Alexandre Felipe <o(dot)alexandre(dot)felipe(at)gmail(dot)com>
To:	Michał Kłeczek <michal(at)kleczek(dot)org>
Cc:	Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>, Tomas Vondra <tomas(at)vondra(dot)me>, Alexandre Felipe <alexandre(dot)felipe(at)tpro(dot)io>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: New access method for b-tree.
Date:	2026-02-05 06:59:25
Message-ID:	CAE8JnxNM9Bh=LCGOzayewDgX3-kUNXdTDwNSDwsf+t=wKhPiCQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Thank you for looking into this.

Now we can execute a, still narrow, family queries!

Maybe it helps to see this as a *social network feeds*. Imagine a social
network, you have a few friends, or follow a few people, and you want to
see their updates ordered by date. For each user we have a different
combination of users that we have to display. But maybe, even having
hundreds of users we will only show the first 10.

There is a low hanging fruit on the skip scan, if we need N rows, and one
group already has M rows we could stop there.
If Nx is the number of friends, and M is the number of posts to show.
This runs with complexity (Nx * M) rows, followed by an (Nx * M) sort,
instead of (Nx * N) followed by an (Nx * N) sort.
Where M = 10 and N is 1000 this is a significant improvement.
But if M ~ N, the merge scan that runs with M + Nx row accesses, (M + Nx)
heap operations.
If everything is on the same page the skip scan would win.

The cost estimation is probably far off.
I am also not considering the filters applied after this operator, and I
don't know if the planner infrastructure is able to adjust it by itself.
This is where I would like reviewer's feedback. I think that the planner
costs are something to be determined experimentally.

Next I will make it slightly more general handling
* More index columns: Index (a, b, s...) could support WHERE a IN (...)
ORDER BY b LIMIT N (ignoring s...)
* Multi-column prefix: WHERE (a, b) IN (...) ORDER BY c
* Non-leading prefix: WHERE b IN (...) AND a = const ORDER BY c on index
(a, b, c)

---
Kind Regards,
Alexandre

On Wed, Feb 4, 2026 at 7:13 AM Michał Kłeczek <michal(at)kleczek(dot)org> wrote:

>
>
> On 3 Feb 2026, at 22:42, Ants Aasma <ants(dot)aasma(at)cybertec(dot)at> wrote:
>
> On Mon, 2 Feb 2026 at 01:54, Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> I'm also wondering how common is the targeted query pattern? How common
> it is to have an IN condition on the leading column in an index, and
> ORDER BY on the second one?
>
>
> I have seen this pattern multiple times. My nickname for it is the
> timeline view. Think of the social media timeline, showing posts from
> all followed accounts in timestamp order, returned in reasonably sized
> batches. The naive SQL query will have to scan all posts from all
> followed accounts and pass them through a top-N sort. When the total
> number of posts is much larger than the batch size this is much slower
> than what is proposed here (assuming I understand it correctly) -
> effectively equivalent to running N index scans through Merge Append.
>
>
> My workarounds I have proposed users have been either to rewrite the
> query as a UNION ALL of a set of single value prefix queries wrapped
> in an order by limit. This gives the exact needed merge append plan
> shape. But repeating the query N times can get unwieldy when the
> number of values grows, so the fallback is:
>
> SELECT * FROM unnest(:friends) id, LATERAL (
> SELECT * FROM posts
> WHERE user_id = id
> ORDER BY tstamp DESC LIMIT 100)
> ORDER BY tstamp DESC LIMIT 100;
>
> The downside of this formulation is that we still have to fetch a
> batch worth of items from scans where we otherwise would have only had
> to look at one index tuple.
>
>
> GIST can be used to handle this kind of queries as it supports multiple
> sort orders.
> The only problem is that GIST does not support ORDER BY column.
> One possible workaround is [1] but as described there it does not play
> well with partitioning.
> I’ve started drafting support for ORDER BY column in GIST - see [2].
> I think it would be easier to implement and maintain than a new IAM (but I
> don’t have enough knowledge and experience to implement it myself)
>
> [1]
> https://www.postgresql.org/message-id/3FA1E0A9-8393-41F6-88BD-62EEEA1EC21F%40kleczek.org
> [2]
> https://www.postgresql.org/message-id/B2AC13F9-6655-4E27-BFD3-068844E5DC91%40kleczek.org
>
> —
> Kind regards,
> Michal
>

Attachment	Content-Type	Size
0002-MERGE-SCAN-Access-method.patch	application/octet-stream	49.1 KB
0003-MERGE-SCAN-Planner-integration.patch	application/octet-stream	27.6 KB
0001-MERGE-SCAN-Test-the-baseline.patch	application/octet-stream	7.5 KB

In response to

Re: New access method for b-tree. at 2026-02-04 07:13:11 from Michał Kłeczek

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	shveta malik	2026-02-05 07:09:16	Re: [PATCH] Support automatic sequence replication
Previous Message	Richard Guo	2026-02-05 06:51:20	Re: Convert NOT IN sublinks to anti-joins when safe