Re: Add 64-bit XIDs into PostgreSQL 15

From: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
To: Evgeny Voropaev <evorop(dot)wiki(at)gmail(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add 64-bit XIDs into PostgreSQL 15
Date: 2025-06-11 13:12:39
Message-ID: 6dff672e-b8b1-4d44-bbe2-cbd2eb96a2bf@postgrespro.ru
Lists: pgsql-hackers

11.06.2025 09:00, Evgeny Voropaev wrote:
> 2) About repairing fragmentation.
>
> The original approach implemented in PG18 assumes that defragmentation
> (repairing fragmentation) occurs during every `prune_freeze` operation.
> This is because the logic of the "redo" function `heap_xlog_prune_freeze`
> assumes that defragmentation has to be done by `heap_page_prune_execute`.

> Attempting to omit defragmentation can result in page inconsistencies on
> the "redo" side (i.e. on a secondary node, or during the recovery process
> on the primary one).

No! The patch uses a flag in the WAL record to instruct the "redo" side to
skip repairing fragmentation as well, when needed.
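A minimal sketch of that idea, heavily simplified and in C: the decision whether to repair fragmentation travels inside the WAL record, and the same routine runs both when the change is made and when the record is replayed, so primary and standby end up with identical pages. All names (`SKETCH_REPAIR_FRAGMENTATION`, `SketchPage`, `SketchPruneRecord`, `sketch_prune_execute`) are hypothetical, not the patch's identifiers, and the toy page only stands in for a real heap page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical WAL-record flag bit; the real patch's name/value may differ. */
#define SKETCH_REPAIR_FRAGMENTATION 0x01

#define SKETCH_PAGE_SIZE 64
#define SKETCH_MAX_ITEMS 4

/* Toy page: tuple bytes plus a line-pointer array (offset 0 = unused). */
typedef struct SketchPage
{
    uint8_t  data[SKETCH_PAGE_SIZE];
    uint16_t item_off[SKETCH_MAX_ITEMS];
    uint16_t item_len[SKETCH_MAX_ITEMS];
    int      nitems;
} SketchPage;

/* Toy prune record: which item to remove and whether to defragment. */
typedef struct SketchPruneRecord
{
    uint8_t flags;
    int     dead_item;
} SketchPruneRecord;

/* Compact live tuples toward the end of the page: this MOVES tuple bytes. */
static void
sketch_repair_fragmentation(SketchPage *page)
{
    uint8_t  tmp[SKETCH_PAGE_SIZE];
    uint16_t upper = SKETCH_PAGE_SIZE;

    memset(tmp, 0, sizeof(tmp));
    for (int i = 0; i < page->nitems; i++)
    {
        if (page->item_off[i] == 0)
            continue;                     /* unused line pointer */
        upper -= page->item_len[i];
        memcpy(tmp + upper, page->data + page->item_off[i], page->item_len[i]);
        page->item_off[i] = upper;        /* tuple now lives at a new offset */
    }
    memcpy(page->data, tmp, sizeof(tmp));
}

/*
 * Used both on the primary and during redo; the flag comes from the WAL
 * record, so both sides apply exactly the same transformation.
 */
static void
sketch_prune_execute(SketchPage *page, const SketchPruneRecord *rec)
{
    assert(rec->dead_item >= 0 && rec->dead_item < page->nitems);

    page->item_off[rec->dead_item] = 0;   /* mark line pointer unused */
    page->item_len[rec->dead_item] = 0;

    if (rec->flags & SKETCH_REPAIR_FRAGMENTATION)
        sketch_repair_fragmentation(page);
}
```

Replaying a record with a different decision than the one taken on the primary is exactly the inconsistency the quoted text worries about; carrying the flag in the record removes that possibility.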

> So, implementing optional repair of fragmentation conflicts with the
> basic assumption about the "necessity of defragmentation". In order to
> prevent inconsistency, the xid64v62 patch invokes
> `heap_page_prune_and_freeze` with `repairFragmentation` equal to true
> everywhere in the patch code except in `heap_page_prepare_for_xid`, which
> uses `repairFragmentation=false`.
>
> So, why must we perform `heap_page_prune_execute` without
> defragmentation during the preparation of a page for an xid?
>
> What exactly would break if we invoked `heap_page_prune_execute` with
> `repairFragmentation=true` while performing `heap_page_prepare_for_xid`?

Short answer:
- The `repairFragmentation` parameter was added after investigating real
production issues with earlier versions of the patch.

Long answer:

How does SELECT work with tuples on a page?
It:
- PINS the page
- takes the CONTENT LOCK in SHARED mode
- collects HeapTuples which LOOK INTO THE RAW PAGE via t_data (e.g.
t_data->t_choice.t_heap)
- RELEASES the content lock
- may use those HeapTuples for an indefinitely long time, relying only on
the PIN of the page.
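The steps above can be modelled in a few lines of standalone C. This is a toy, not PostgreSQL code: `SkBuffer`, `SkTuple` and the `sk_*` functions are hypothetical stand-ins for a shared buffer and `HeapTupleData`. The point it illustrates is that the collected tuple is only a pointer into page memory, so if anything moves tuple bytes while the reader still holds its PIN, the reader silently sees different data.

```c
#include <assert.h>
#include <string.h>

#define SK_PAGE_SIZE 32

/* Toy shared buffer: raw page bytes plus pin and shared-lock counters. */
typedef struct SkBuffer
{
    char data[SK_PAGE_SIZE];
    int  pin_count;      /* backends holding a PIN */
    int  share_lockers;  /* backends holding the content lock in SHARED mode */
} SkBuffer;

/* Like HeapTupleData: t_data points INTO the page; nothing is copied. */
typedef struct SkTuple
{
    const char *t_data;
    int         t_len;
} SkTuple;

/* SELECT-side pattern: pin, share-lock, collect, unlock, keep the pin. */
static SkTuple
sk_select_fetch(SkBuffer *buf, int off, int len)
{
    SkTuple tup;

    assert(off >= 0 && len > 0 && off + len <= SK_PAGE_SIZE);
    buf->pin_count++;                 /* PIN the page */
    buf->share_lockers++;             /* CONTENT LOCK in SHARED mode */
    tup.t_data = buf->data + off;     /* pointer into the raw page */
    tup.t_len = len;
    buf->share_lockers--;             /* RELEASE the content lock */
    return tup;                       /* valid only while tuples stay put */
}

static void
sk_release(SkBuffer *buf)
{
    assert(buf->pin_count > 0);
    buf->pin_count--;                 /* drop the PIN */
}

/* Compare what the reader currently sees with an expected string. */
static int
sk_tuple_equals(const SkTuple *tup, const char *s)
{
    return (int) strlen(s) == tup->t_len &&
           memcmp(tup->t_data, s, (size_t) tup->t_len) == 0;
}

/* What defragmentation does to a tuple: its bytes move elsewhere. */
static void
sk_move_tuple(SkBuffer *buf, int from, int to, int len)
{
    assert(from + len <= to || to + len <= from);   /* no overlap */
    memcpy(buf->data + to, buf->data + from, (size_t) len);
    memset(buf->data + from, 0, (size_t) len);
}
```

In this model, fetching a tuple, then calling `sk_move_tuple` on its bytes while `pin_count` is still 1, makes `sk_tuple_equals` fail for the reader even though it never touched the page again.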

I.e. SELECT relies on the fact that, while a page is pinned, tuples on the
page stay at the same positions in memory.

That is why LockBufferForCleanup and ConditionalLockBufferForCleanup check
that there is only a single PIN on the page: only the backend which will
perform the cleanup is allowed to hold a PIN on the page.
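A simplified model of the conditional variant, for illustration only: `SkBufferDesc` and `sk_conditional_lock_for_cleanup` are hypothetical, and the real function also tracks per-backend pin counts and buffer-header locking that are omitted here. The caller already holds one PIN, takes the content lock in EXCLUSIVE mode, and succeeds only if its own PIN is the only one; otherwise it backs off. `LockBufferForCleanup` differs in that it waits for the other PINs to go away instead of giving up.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy buffer descriptor: total PINs and the exclusive content lock. */
typedef struct SkBufferDesc
{
    int  refcount;          /* PINs held by all backends */
    bool exclusive_locked;  /* CONTENT LOCK held in EXCLUSIVE mode */
} SkBufferDesc;

/*
 * Simplified model of ConditionalLockBufferForCleanup(): the caller already
 * holds one PIN; succeed only if that PIN is the only one, i.e. no other
 * backend can still be holding pointers into the page.
 */
static bool
sk_conditional_lock_for_cleanup(SkBufferDesc *buf)
{
    assert(buf->refcount >= 1);          /* caller must hold a PIN */

    if (buf->exclusive_locked)
        return false;                    /* content lock not available */
    buf->exclusive_locked = true;        /* take CONTENT LOCK EXCLUSIVE */

    if (buf->refcount == 1)
        return true;                     /* ours is the only PIN: may move tuples */

    buf->exclusive_locked = false;       /* another backend is pinned: back off */
    return false;
}
```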

UPDATE/INSERT/DELETE take the CONTENT LOCK in EXCLUSIVE mode because they
may add new tuples. But they are not allowed to move existing tuples,
because concurrent backends are allowed to read tuples from the page at
exactly the same moment, holding only a PIN.
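The rules above boil down to a small policy table, sketched here with hypothetical names (`SkLockLevel`, `sk_may_add_tuple`, `sk_may_move_tuples`, `sk_prune_page`). Assuming, as this message does not state outright, that `heap_page_prepare_for_xid` can run while only an EXCLUSIVE content lock is held, this is why such a path would have to call pruning with `repairFragmentation=false`: pruning line pointers is fine under that lock, but compacting tuples is not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what each lock level permits on a heap page. */
typedef enum SkLockLevel
{
    SK_PIN_ONLY,        /* PIN only: may keep using already-collected tuples */
    SK_LOCK_SHARE,      /* SHARED content lock: SELECT collects tuples */
    SK_LOCK_EXCLUSIVE,  /* EXCLUSIVE content lock: INSERT/UPDATE/DELETE */
    SK_LOCK_CLEANUP     /* EXCLUSIVE lock and ours is the only PIN */
} SkLockLevel;

/* Adding a tuple only writes into free space; existing tuples stay put. */
static bool
sk_may_add_tuple(SkLockLevel held)
{
    return held == SK_LOCK_EXCLUSIVE || held == SK_LOCK_CLEANUP;
}

/* Moving tuples (defragmenting) would invalidate other backends' pointers. */
static bool
sk_may_move_tuples(SkLockLevel held)
{
    return held == SK_LOCK_CLEANUP;
}

/* Hypothetical prune entry point: refuses to defragment without a cleanup lock. */
static bool
sk_prune_page(SkLockLevel held, bool repairFragmentation)
{
    assert(sk_may_add_tuple(held));      /* modifying the page needs EXCLUSIVE */

    if (repairFragmentation && !sk_may_move_tuples(held))
        return false;                    /* would move tuples under pinned readers */

    /* ... prune line pointers; compact tuples only if repairFragmentation ... */
    return true;
}
```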

--
regards
Yura Sokolov aka funny-falcon
