Re: Add 64-bit XIDs into PostgreSQL 15

From: Evgeny Voropaev <evgeny(dot)voropaev(at)tantorlabs(dot)com>
To: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Maxim Orlov <orlovmg(at)gmail(dot)com>
Subject: Re: Add 64-bit XIDs into PostgreSQL 15
Date: 2025-07-07 08:17:13
Message-ID: 4eb56320-744e-49ba-b766-702bc2fb61a8@tantorlabs.com
Lists: pgsql-hackers

Hello, hackers!

Unfortunately, the inconsistency caused by using prune_freeze with
repairFragmentation=false is not limited to the content of dead and
unused tuples; it can also make the locations of live tuples
inconsistent.

This case appears in the logic of heap_insert. See the attached figure.
When heap_insert determines that a new tuple is the only one on a page,
it sets the XLOG_HEAP_INIT_PAGE flag. As a result, the “redo” side
initializes a new page and inserts the new tuple into it instead of
inserting the new tuple into the existing page.
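
The decision described above can be sketched as a toy model in Python.
This is not the C code from heap_insert: the function name
should_set_init_page is hypothetical, and the condition reflects the
upstream check as I understand it (the new tuple got the first offset
number, and that is also the maximum offset number on the page).

```python
FIRST_OFFSET_NUMBER = 1  # line pointer numbering on a heap page starts at 1


def should_set_init_page(new_tuple_offnum: int, max_offnum: int) -> bool:
    """Toy version of the check: the new tuple is the only item on the page.

    Only line pointer numbers are inspected; where the tuple data physically
    lies inside the page is not part of the decision.
    """
    return (new_tuple_offnum == FIRST_OFFSET_NUMBER
            and max_offnum == FIRST_OFFSET_NUMBER)
```

Note that nothing in this check looks at the physical position of the
tuple data, which is why it cannot tell a freshly initialized page from
a pruned but still fragmented one.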

So, we have the following situation in the xid64 patch.

Do-side:
1. We have page ABC with several tuples.
2. We start inserting a new tuple.
2.1. If xid_base is inappropriate, we try to fit the base.
2.1.1. Tuples are frozen and pruned without subsequently repairing
fragmentation.
2.1.2. All tuples have been pruned (there are no live tuples on the
page from this moment on).
3. We insert the new tuple and set XLOG_HEAP_INIT_PAGE, assuming that it
is the only tuple and is located at the bottom of the page (that is,
assuming that fragmentation has been repaired).

Result: We have the ABC page with the new tuple inserted somewhere in
the MIDDLE of the page and surrounded by garbage from dead and unused
tuples. At the same time, we have an xlog record carrying the
XLOG_HEAP_INIT_PAGE bit.

Redo-side:
1. We observe XLOG_HEAP_INIT_PAGE.
2. We create a new page and insert the new tuple into the first
position of the page.

Result: We have the ABC page with the new tuple inserted at the BOTTOM
of the page.
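
The do-side and redo-side sequences above can be replayed on a toy page
model in Python. This is only a sketch under stated assumptions: ToyPage,
do_side and redo_side are hypothetical names, the page is reduced to a
list of line pointers plus a pd_upper-like marker, and pruning without
repairing fragmentation is assumed to leave that marker where it was.

```python
PAGE_SIZE = 8192  # default PostgreSQL block size


class ToyPage:
    """Line pointers plus a pd_upper-like marker; tuple data grows downward."""

    def __init__(self):
        self.line_pointers = []   # data offsets; index 0 is offset number 1
        self.upper = PAGE_SIZE    # start of the tuple data area

    def add_tuple(self, length):
        """Place tuple data just below `upper`; return (offnum, data_offset)."""
        self.upper -= length
        self.line_pointers.append(self.upper)
        return len(self.line_pointers), self.upper

    def prune_all(self, repair_fragmentation):
        """Drop every line pointer; reclaim data space only when repairing."""
        self.line_pointers.clear()
        if repair_fragmentation:
            self.upper = PAGE_SIZE


def do_side(repair_fragmentation, tuple_len=100):
    page = ToyPage()
    for _ in range(3):                     # step 1: page ABC with several tuples
        page.add_tuple(tuple_len)
    page.prune_all(repair_fragmentation)   # steps 2.1.1 and 2.1.2
    offnum, data_offset = page.add_tuple(tuple_len)   # step 3
    # Toy version of the "only tuple on the page" check
    init_page = offnum == 1 and len(page.line_pointers) == 1
    return init_page, data_offset


def redo_side(tuple_len=100):
    page = ToyPage()                       # XLOG_HEAP_INIT_PAGE: fresh page
    _, data_offset = page.add_tuple(tuple_len)
    return data_offset
```

In this model do_side(False) returns (True, 7792), while redo_side()
places the tuple at 8092, so the flag is set but the offsets differ.
With do_side(True) both sides agree on 8092.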

This example of inconsistency is not about the content of the tuple but
about the tuple’s location on the page. And tuple offsets are not subject
to masking by the standard masking procedure.

A possible fix is like the one in the attached patch. But what I’m
suggesting is to adhere to the original PostgreSQL implementation:
perform prune_freeze only under a buffer cleanup lock, and completely
eliminate repairFragmentation=false as a bad practice!

Best regards,
Evgeny Voropaev,
Tantor Labs, LLC.

Attachments:
0001-fixing-page-inconsistency-with-a-single-tuple-on-the-page.patch (text/x-patch, 1.8 KB)
single-tuple-page-inconsistency.png (image/png, 1.2 MB)
