Re: Use log_newpage_range in HASH index build

From: lakshmi <lakshmigcdac(at)gmail(dot)com>
To: Kirill Reshke <reshkekirill(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Use log_newpage_range in HASH index build
Date: 2026-01-05 08:50:27
Message-ID: CAEvyyTjGtYRq3Aa0tY6=_QGV6z+6KTQVF=J75Acx2L6i0JOkGw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Kirill,
Following up on my earlier note, I implemented the proposed HASH overflow
page reuse enhancement.Recently freed overflow pages are recorded in
_hash_freeovflpage( ),and _hash_addovflpage( ) now prefers reusing those
pages during allocation before falling back to the bitmap scan.

The change is backend-local and allocation-only,with no WAL or on-disk
format changes.I verified correctness using index build/drop and VACUUM
cycles,and confirmed WAL neutrality by comparing WAL generated during HASH
index builds with and without this change (no observable difference beyond
normal noise).

The patch is attached for review.Feedback is welcome.

Best regards,
Lakshmi

On Tue, Dec 23, 2025 at 5:23 PM lakshmi <lakshmigcdac(at)gmail(dot)com> wrote:

> Hi Kirill,
>
> I tested your patch on the current master and confirmed the WAL reduction
> during HASH index build.
>
> While testing, I noticed a possible small follow-up improvement in HASH
> overflow handling. Currently, any free overflow page may be reused, which
> can scatter overflow chains and hurt cache locality. Reusing recently freed
> overflow pages first could help, without changing WAL behavior or on-disk
> format.
>
> I would like to work on this as a follow-up enhancement and would welcome
> any suggestions.
>
> Best regards,
> Lakshmi
>
> On Tue, Dec 23, 2025 at 2:31 PM Kirill Reshke <reshkekirill(at)gmail(dot)com>
> wrote:
>
>> There exists an optimization to index creation process, when we omit
>> to write any WAL
>> for index build. It is currently supported in B Tree, GIN, GiST, spg
>> indexes.
>> It works because we do not need to recover anything if index creation
>> fails, because if was not used by any query. So, the index can be
>> built on-disk, and then, just before making the index alive, we can
>> simply log all pages to WAL.
>>
>> Hash index currently lacks this optimization.
>> PFA implementation.
>>
>> During my testing, I checked the amount of WAL generated by index
>> build before and after patch applied. My script was something like:
>>
>> select pg_current_wal_insert_lsn();
>>
>> create index on t using hash (i);
>>
>> select pg_current_wal_insert_lsn();
>>
>> select pg_lsn_wal_diff(lsn1, lsn2);
>>
>> Resulting numbers depend on index size, but I got 2.5-3.5 times less
>> WAL with this patch and 8 times less WAL with this patch +
>> wal_compression=on.
>> Index creation time, however, did not change much...
>>
>> About implementation:
>> These are many types of record that can be generated during index build.
>> I know for sure these are possible (double-checked using pg_waldump):
>>
>> SPLIT_COMPLETE
>> INSERT
>> SPLIT_ALLOCATE_PAGE
>> SPLIT_PAGE
>> ADD_OVFL_PAGE
>> SQUEEZE_PAGE
>> INIT_META_PAGE
>> INIT_BITMAP_PAGE
>>
>>
>> Looks like SPLIT_COMPLETE and VACUUM_ONE_PAGE are never generated
>> during index build. I'm not sure about MOVE_PAGE_CONTENTS.
>>
>> So, implementation is simply pass isbuild flag everywhere something is
>> wal-logged. Looks like it is less invasive than alternatives.
>>
>> --
>> Best regards,
>> Kirill Reshke
>>
>

Attachment Content-Type Size
0001-hash-reuse-recently-freed-overflow-pages.patch text/x-patch 3.4 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Geier 2026-01-05 08:54:17 Re: Hash-based MCV matching for large IN-lists
Previous Message Andrey Borodin 2026-01-05 08:39:03 Re: GIN pageinspect support for entry tree and posting tree