Re: Next step towards 64bit XIDs: Switch to FullTransactionId for PGPROC->xid and XLogRecord->xl_xid

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Maxim Orlov <orlovmg(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Next step towards 64bit XIDs: Switch to FullTransactionId for PGPROC->xid and XLogRecord->xl_xid
Date: 2024-01-02 19:58:39
Message-ID: CA+TgmoYRg1MDF6a-QDrccVO=d2S37bjz54=xSzwen9FYHqw=7Q@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 1, 2024 at 1:15 AM Maxim Orlov <orlovmg(at)gmail(dot)com> wrote:
> Yeah, obviously, this patch makes WAL bigger. I'll try to look into the idea of fxid calculation, as mentioned above.
> It might in part be a "chicken and the egg" situation. It is very hard to split the overall 64xid patch into smaller pieces
> with each part being meaningful and beneficial for current 32xids Postgres.

I agree, but I think that's what we need to try to do. I mean, we
don't want to commit patches that individually make things *worse* on
the theory that when we do that enough times the result will be
overall better. And even if the individual patches seem to be entirely
neutral, that still doesn't offer great hope that the final result
will be an improvement.

Personally, I'm not convinced that "use 64-bit XIDs everywhere" is a
desirable goal. The space consumption issue here is just the tip of
the iceberg, and really exists in many places where we store XIDs. We
don't want to make tuple headers wider any more than we want to bloat
the WAL. We don't want to make the ProcArray bigger, either, not
because of memory consumption but because of the speed of memory
access. What we really want to do is fix the practical problems that
32-bit XIDs cause. AFAIK there are basically three such problems:

1. SLRUs that are indexed by XID can wrap around, or try to, leading
to confusion and bugs in the code and maybe errors when something
fills up.

2. If a table's relfrozenxid becomes older than ~2B, we have to stop
XID assignment until the problem is cured.

3. If a running transaction's XID age exceeds ~2B, we can't start new
transactions until the problem is cured.
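
To make the ~2B figure in (2) and (3) concrete, here is a minimal sketch
of modulo-2^32 XID comparison. The names Xid32 and xid_precedes are
made up for illustration, and the special handling that real code
needs for reserved XIDs is left out, so treat it as an assumption-laden
model rather than the actual implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit transaction ID. */
typedef uint32_t Xid32;

/*
 * Returns true if a logically precedes b on the circular 32-bit XID
 * space. The unsigned subtraction wraps modulo 2^32, and reinterpreting
 * the result as signed splits the circle into "up to 2^31 behind" and
 * "up to 2^31 ahead". (The unsigned-to-signed conversion is
 * implementation-defined in C, but behaves as two's complement on
 * common platforms.)
 */
static bool
xid_precedes(Xid32 a, Xid32 b)
{
    int32_t diff = (int32_t) (a - b);

    return diff < 0;
}
```

With this comparison, an XID that is 2,000,000,000 behind another is
still seen as older, but one that is 2,200,000,000 behind (more than
2^31) is misread as being in the future. That is why nothing older than
about 2B XIDs can be allowed to stay visible.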

There's already a patch for (1), I believe. I think we can cure that
by indexing SLRUs by 64-bit integers without changing how XIDs are
stored outside of SLRUs. Various people have attempted (2), for
example with an XID-per-page approach, but nothing's been committed
yet. We can't attempt to cure (3) until (1) or (2) are fixed, and I
think that would require using 64-bit XIDs in the ProcArray, but note
that (3) is less serious: (1) and (2) constrain the size of the XID
range that can legally appear on disk, whereas (3) only constrains
the size of the XID range that can legally occur in memory. That makes
a big difference, because reducing the size of the XID range that can
legally appear on disk requires vacuuming all tables with older
relfrozenxids which in the worst case takes time proportional to the
disk utilization of the instance, whereas reducing the size of the XID
range that can legally appear in memory just requires ending
transactions (including possibly prepared transactions) and removing
replication slots, which can be done in O(1) time.
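
As a rough sketch of what "indexing SLRUs by 64-bit integers without
changing how XIDs are stored" could look like: a stored 32-bit XID is
widened relative to a 64-bit next-XID counter, and the SLRU page is
computed from the widened value. The names widen_xid, slru_page_for and
XACTS_PER_PAGE are hypothetical, the per-page count is only an
illustrative value, and the sketch assumes the XID is not in the future
and is less than 2^32 behind the counter.

```c
#include <stdint.h>

/* Illustrative number of XID entries per SLRU page (an assumption). */
#define XACTS_PER_PAGE 32768u

/*
 * Widen a stored 32-bit XID to 64 bits, given the 64-bit value of the
 * next XID to be assigned. If the 32-bit value is numerically greater
 * than the low half of next_full, it must belong to the previous
 * 2^32-sized epoch. Assumes xid is at or before next_full and within
 * 2^32 of it, and that such a previous epoch exists.
 */
static uint64_t
widen_xid(uint32_t xid, uint64_t next_full)
{
    uint32_t next_low = (uint32_t) next_full;
    uint64_t epoch = next_full >> 32;

    if (xid > next_low)
        epoch--;

    return (epoch << 32) | xid;
}

/* Page number in a 64-bit page space, which does not wrap in practice. */
static uint64_t
slru_page_for(uint64_t full_xid)
{
    return full_xid / XACTS_PER_PAGE;
}
```

The point of the sketch is that the on-disk and in-WAL representation
stays at 32 bits; only the SLRU page arithmetic sees the wider value,
so two XIDs with the same low 32 bits but different epochs land on
different pages.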

Maybe this analysis I've just given isn't quite right, but my point is
that we should try to think hard about where in the system 32-bit XIDs
suck and for what reason, and use that as a guide to what to change
first.

--
Robert Haas
EDB: http://www.enterprisedb.com