Re: [WiP] B-tree page merge during vacuum to reduce index bloat

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, boekewurm+postgres(at)gmail(dot)com
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Kirk Wolak <wolakk(at)gmail(dot)com>, Nikolay Samokhvalov <nik(at)postgres(dot)ai>
Subject: Re: [WiP] B-tree page merge during vacuum to reduce index bloat
Date: 2025-08-31 12:15:32
Message-ID: CCD000DB-67CB-4D64-A912-B7514D546058@yandex-team.ru
Lists: pgsql-hackers

> On 29 Aug 2025, at 13:39, Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
>
> I think that, to establish a baseline for locking correctness, we should start by writing index scan tests that fail with the proposed merge patch and pass on current HEAD. I want to observe the forward scan returning duplicates and the backward scan missing tuples.

Well, that was unexpectedly easy. See patch 0001. It adds a test that creates a sparse tree, plus an injection point that makes a scan wait when it steps onto a leaf page in the middle of the tree.
The test then runs VACUUM. There are ~35 leaf pages, and most of them get merged into just a few pages.
As expected, both scans produce incorrect results: the forward scan returns too many rows and the backward scan too few.
t/008_btree_merge_scan_correctness.pl .. 1/?
# Failed test 'Forward scan returns correct count'
# at t/008_btree_merge_scan_correctness.pl line 132.
# got: '364'
# expected: '250'

# Failed test 'Backward scan returns correct count'
# at t/008_btree_merge_scan_correctness.pl line 133.
# got: '142'
# expected: '250'
# Looks like you failed 2 tests of 2.
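To see why the two scan directions fail in opposite ways, here is a toy Python model (not nbtree code). It assumes that a merge moves a leaf page's tuples into its right sibling, the direction in which nbtree page deletion hands over key space; whether patch 0001 merges in that direction is an assumption here, and make_pages, merge_right, vacuum_merge and scan are made-up names. It reproduces the failure qualitatively, not the exact counts above.

```python
# Toy model of nbtree leaf pages: each page is a sorted list of keys,
# and pages are chained left-to-right through their list positions.

def make_pages():
    return [[1, 2], [3, 4], [5, 6], [7, 8]]


def merge_right(pages, i):
    """Hypothetical merge step: move every tuple of pages[i] into its right sibling."""
    pages[i + 1] = sorted(pages[i] + pages[i + 1])
    pages[i] = []  # the emptied page stays in the sibling chain


def vacuum_merge(pages, first, last):
    """Fold pages[first..last-1] rightward, one sibling at a time, into pages[last]."""
    for i in range(first, last):
        merge_right(pages, i)


def scan(pages, start, step, pause_at, during_pause):
    """Read one page at a time, like a page-at-a-time index scan.

    After reading pages[pause_at], call during_pause (standing in for the
    injection point that lets VACUUM run), then follow the sibling link.
    """
    seen = []
    i = start
    while 0 <= i < len(pages):
        seen.extend(pages[i] if step > 0 else reversed(pages[i]))
        if i == pause_at:
            during_pause(pages)
        i += step
    return seen


# Forward scan pauses on page 1; VACUUM then folds pages 1 and 2 into page 3.
fwd = scan(make_pages(), 0, +1, 1, lambda p: vacuum_merge(p, 1, 3))
print(fwd)  # [1, 2, 3, 4, 3, 4, 5, 6, 7, 8]: keys 3 and 4 are returned twice

# Backward scan pauses on page 2; VACUUM then folds pages 0 and 1 into page 2.
bwd = scan(make_pages(), 3, -1, 2, lambda p: vacuum_merge(p, 0, 2))
print(bwd)  # [8, 7, 6, 5]: keys 1 through 4 are never returned
```

The forward scan re-reads tuples that were pushed ahead of it into pages it has not visited yet; the backward scan misses tuples that were pushed behind it into a page it has already left.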

> From there we will try to design locking that does not affect performance significantly but still allows merging pages. Perhaps we can design a way to switch new index scans to "safe mode" during index vacuum and wait for existing scans to complete.
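The "safe mode" idea quoted above amounts to a drain protocol: once a merge is pending, new scans take a slower safe path, and VACUUM waits until every scan that started on the fast path has finished. Below is a minimal in-process sketch using Python threads. In PostgreSQL this state would have to live in shared memory, and MergeGate and its methods are hypothetical names, not existing nbtree APIs.

```python
import threading


class MergeGate:
    """Hypothetical coordination point between index scans and a merging VACUUM."""

    def __init__(self):
        self._cv = threading.Condition()
        self._merge_pending = False
        self._fast_scans = 0  # scans running on the normal (merge-unsafe) path

    def begin_scan(self):
        """Return True if the scan may use the fast path, False for safe mode."""
        with self._cv:
            if self._merge_pending:
                return False
            self._fast_scans += 1
            return True

    def end_scan(self, fast):
        if not fast:
            return  # safe-mode scans were never counted
        with self._cv:
            self._fast_scans -= 1
            self._cv.notify_all()

    def begin_merge(self, timeout=None):
        """Switch new scans to safe mode, then wait for fast-path scans to drain.

        Returns True once no fast-path scan is running, False on timeout.
        """
        with self._cv:
            self._merge_pending = True
            return self._cv.wait_for(lambda: self._fast_scans == 0, timeout)

    def end_merge(self):
        with self._cv:
            self._merge_pending = False


gate = MergeGate()
old_scan = gate.begin_scan()          # fast path: no merge pending yet
print(gate.begin_merge(timeout=0.1))  # False: old_scan is still running
print(gate.begin_scan())              # False: new scans are in safe mode
gate.end_scan(old_scan)
print(gate.begin_merge(timeout=0.1))  # True: merging may proceed
gate.end_merge()
```

Safe-mode scans pay the extra cost for the whole merge window, and a single long-running scan can hold off the merge indefinitely, which is why the sketch lets VACUUM give up after a timeout.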

What if we just abort a scan that steps onto a page whose tuples were moved out?
I've prototyped this approach; please see patch 0002. Maybe in the future we can improve the locking protocol, if we observe high error rates.
Unfortunately, this approach means the default mergefactor has to be 0 instead of 5%.
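One way to picture the abort approach, as a toy Python model rather than the code in patch 0002: each leaf page carries a hypothetical moved_out_gen stamp that VACUUM sets when it moves the page's tuples into the right sibling, and a scan raises an error if the page it is stepping onto or the page it is leaving was stamped after the scan began. The stamp, the two check points, the rightward merge direction and the ScanAborted exception are all assumptions for illustration.

```python
class ScanAborted(Exception):
    """Stands in for the ERROR a real scan would raise."""


class Page:
    def __init__(self, keys):
        self.keys = list(keys)
        self.moved_out_gen = 0  # 0 means tuples were never moved out of this page


class Index:
    def __init__(self, key_lists):
        self.pages = [Page(k) for k in key_lists]
        self.gen = 0  # bumped by every merge step

    def merge_right(self, i):
        """Hypothetical merge step: move pages[i]'s tuples into its right sibling."""
        self.gen += 1
        src, dst = self.pages[i], self.pages[i + 1]
        dst.keys = sorted(src.keys + dst.keys)
        src.keys = []
        src.moved_out_gen = self.gen


def scan(index, start, step, pause_at=None, during_pause=None):
    start_gen = index.gen

    def check(page):
        if page.moved_out_gen > start_gen:
            raise ScanAborted("tuples were moved out of a page during the scan")

    seen = []
    i = start
    while 0 <= i < len(index.pages):
        page = index.pages[i]
        check(page)                  # stepping onto the page
        seen.extend(page.keys if step > 0 else reversed(page.keys))
        if i == pause_at:
            during_pause(index)      # injection point: VACUUM runs here
        check(page)                  # leaving the page via its sibling link
        i += step
    return seen


def fold(first, last):
    """VACUUM stand-in: merge pages[first..last-1] rightward into pages[last]."""
    def run(ix):
        for i in range(first, last):
            ix.merge_right(i)
    return run


layout = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(scan(Index(layout), 0, +1))  # no concurrent merge: [1, 2, 3, 4, 5, 6, 7, 8]
for start, step, pause_at, merge in [(0, +1, 1, fold(1, 3)), (3, -1, 2, fold(0, 2))]:
    try:
        scan(Index(layout), start, step, pause_at, merge)
    except ScanAborted as e:
        print("aborted:", e)
```

In this model the check on leaving a page catches the forward case (the page just read was emptied into pages not yet visited), while the check on arrival catches the backward case. Scans that start after a merge has finished are unaffected.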

What do you think? Should we add this to the CF, or is the idea too wild for a review?

Best regards, Andrey Borodin.

Attachment Content-Type Size
v2-0001-btree-Add-page-merge-during-vacuum-to-reduce-inde.patch application/octet-stream 30.9 KB
v2-0002-btree-Add-scan-abort-mechanism-for-page-merge-wit.patch application/octet-stream 9.7 KB
