Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)

From: Maxim Orlov <orlovmg(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Jacob Champion <jchampion(at)timescale(dot)com>, Japin Li <japinli(at)hotmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject: Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Date: 2023-03-07 11:38:50
Message-ID: CACG=ezb0_zsf1Ek=RxzpL+J1vs-eYSHat45GMya+LGXpPbr_Ow@mail.gmail.com
Lists: pgsql-hackers

On Tue, 17 Jan 2023 at 16:33, Aleksander Alekseev <aleksander(at)timescale(dot)com>
wrote:

> Hi hackers,
>
> Maxim, perhaps you could share with us what your reasoning was here?
>
I'm really sorry for the late response, but better late than never. Yes, we
cannot access shared memory without a lock.
In this particular case, we use XidGenLock. That is why we have the lock
argument: to take the lock if it was not taken previously.
Actually, we could place an assertion here instead.

As for the XID comparison: we do not compare XIDs here; we are checking for
wraparound, so, AFAICS, this code is correct.
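
For readers without the patch at hand, the distinction can be sketched
roughly as follows. This is a minimal standalone illustration, not the patch
code: widen_xid and next_full_xid are hypothetical names, and it assumes the
caller already holds the lock that protects the next-XID counter (XidGenLock
in the server) and that the XID is less than 2^32 XIDs older than that
counter.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from the patch: widen a 32-bit XID to 64 bits,
 * given the next 64-bit XID to be assigned.
 *
 * Assumes xid was assigned before next_full_xid and is less than 2^32
 * XIDs older than it.  Special XIDs are ignored here for brevity.
 */
static uint64_t
widen_xid(uint32_t xid, uint64_t next_full_xid)
{
    uint32_t next_xid = (uint32_t) next_full_xid;   /* low 32 bits */
    uint64_t epoch = next_full_xid >> 32;           /* high 32 bits */

    /*
     * Plain unsigned comparison, not a modulo-2^32 XID comparison: if xid
     * is numerically above the low half of the counter, the counter has
     * wrapped since xid was assigned, so xid belongs to the previous epoch.
     */
    if (xid > next_xid)
    {
        assert(epoch > 0);      /* otherwise xid would lie in the future */
        epoch--;
    }

    return (epoch << 32) | xid;
}
```

The only question asked is "has the counter wrapped since this XID was
assigned?", which is why an ordinary integer comparison is enough in a check
of this kind.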

On Mon, 6 Mar 2023 at 22:48, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

>
> 1. Use 64 bit page numbers in SLRUs (this patch)
>
> 2. Use the larger segment file names in async.c, to lift the current 8
> GB limit on the max number of pending notifications.
>
> 3. Extend pg_multixact so that pg_multixact/members is addressed by
> 64-bit offsets.
>
> 4. Extend pg_subtrans to 64-bits.
>
> 5. Extend pg_xact to 64-bits.
>
>
> 6. (a bonus thing that I noticed while thinking of pg_xact.) Extend
> pg_twophase.c, to use FullTransactionIds.
>
> Currently, the twophase state files in pg_twophase are named according
> to the 32 bit Xid of the transaction. Let's switch to FullTransactionId
> there.
>
> ...
>
> I propose that we try to finish 1 and 2 for v16. And maybe 6. I think
> that's doable. It doesn't have any great user-visible benefits yet, but
> we need to start somewhere.
>
> - Heikki
>
Yes, this is a great idea! My only concern is that we're going in circles.
You see, patch 1 is what was proposed at the beginning of this thread.
Anyway, I will be happy if we are able to push this topic forward.

As for making pg_multixact 64 bit, I spent the last couple of days trying to
write a proper pg_upgrade for pg_multixact and for pg_xact with wraparound,
and I've come to understand that pg_multixact is not a simple task compared
to pg_xact. The problem is that we do not have an epoch for multixacts, so
we have no way to "overcome" wraparound. The solution may be to add some
kind of epoch for multixacts or to make them 64 bit in the "main" 64-bit XID
patch, but from the perspective of this thread, in my view, this should be
last in line.

In pg_xact we do not have such a problem: we do have an epoch for
transactions, so the conversion should be pretty obvious:
0000 -> 000000000000
0001 -> 000000000001
...
0FFE -> 000000000FFE
0FFF -> 000000000FFF
0000 -> 000000010000
0001 -> 000000010001
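
A minimal sketch of how a segment name could be derived from a 64-bit XID,
assuming a default build (8 kB pages, two status bits per transaction, 32
pages per SLRU segment) and 12-hex-digit wide names as in the listing above.
The constant and function names are illustrative, not the patch's.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed default-build values; the names are illustrative only. */
#define XACTS_PER_PAGE    (8192 * 4)   /* 8 kB page, 4 statuses per byte */
#define PAGES_PER_SEGMENT 32

/* Segment number that holds the status of a 64-bit XID. */
static uint64_t
xact_segment(uint64_t full_xid)
{
    return full_xid / XACTS_PER_PAGE / PAGES_PER_SEGMENT;
}

/* Old-style name: 4 hex digits, valid only while segno fits in 16 bits. */
static void
short_segment_name(uint64_t segno, char *buf, size_t len)
{
    assert(segno <= 0xFFFF);
    snprintf(buf, len, "%04X", (unsigned int) segno);
}

/* Wide name: 12 hex digits. */
static void
long_segment_name(uint64_t segno, char *buf, size_t len)
{
    snprintf(buf, len, "%012llX", (unsigned long long) segno);
}
```

With a 64-bit XID the epoch is just the high 32 bits of the input, so it
needs no separate handling in the segment arithmetic.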

So, in my view, the plan should be:
1. Use 64-bit page numbers internally in SLRUs without changing segment
naming.
2. Use the larger segment file names in async.c, to lift the current 8 GB
limit on the max number of pending notifications.
3. Extend pg_xact to 64-bits.
4. Extend pg_subtrans to 64-bits.
5. Extend pg_multixact so that pg_multixact/members is addressed by 64-bit
offsets.
6. Extend pg_twophase.c, to use FullTransactionIds. (a bonus thing)
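
For reference, the 8 GB figure in item 2 can be reproduced with a little
arithmetic. This sketch assumes a default build (8 kB pages, 32 pages per
segment, 4-hex-digit segment names) and that the notification queue may
occupy only half of the page range, so that page comparisons stay
unambiguous across wraparound. The names are illustrative.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES     8192u     /* assumed default block size */
#define PAGES_PER_SEGMENT   32u
#define SHORT_NAME_SEGMENTS 0x10000u  /* segment names 0000 .. FFFF */

/* Bytes addressable with 4-hex-digit segment names. */
static uint64_t
short_name_capacity(void)
{
    return (uint64_t) SHORT_NAME_SEGMENTS * PAGES_PER_SEGMENT * PAGE_SIZE_BYTES;
}

/* Usable queue size when only half of the page range may be occupied. */
static uint64_t
async_queue_limit(void)
{
    return short_name_capacity() / 2;
}
```

Under these assumptions the full range is 16 GiB and the usable half is
8 GiB; longer segment names would remove the need for the halving.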

Thoughts?

--
Best regards,
Maxim Orlov.

On Mon, 6 Mar 2023 at 22:48, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:

> On 01/03/2023 12:21, Aleksander Alekseev wrote:
> > Hi,
> >
> >> I'm surprised that these patches extend the page numbering to 64 bits,
> >> but never actually use the high bits. The XID "epoch" is not used, and
> >> pg_xact still wraps around and the segment names are still reused. I
> >> thought we could stop doing that.
> >
> > To clarify, the idea is to let CLOG grow indefinitely and simply store
> > FullTransactionId -> TransactionStatus (two bits). Correct?
>
> Correct.
>
> > I didn't investigate this in much detail but it may affect quite some
> > amount of code since TransactionIdDidCommit() and
> > TransactionIdDidAbort() currently both deal with TransactionId, not
> > FullTransactionId. IMO, this would be a nice change however, assuming
> > we are ready for it.
>
> Yep, it's a lot of code churn..
>
> > In the previous version of the patch there was an attempt to derive
> > FullTransactionId from TransactionId but it was wrong for the reasons
> > named above in the thread. Thus it was removed and the patch
> > simplified.
>
> Yeah, it's tricky to get it right. Clearly we need to do it at some
> point though.
>
> All in all, this is a big effort. I spent some more time reviewing this
> in the last few days, and thought a lot about what the path forward here
> could be. And I haven't looked at the actual 64-bit XIDs patch set yet,
> just this patch to use 64-bit addressing in SLRUs.
>
> This patch is the first step, but we have a bit of a chicken and egg
> problem, because this patch on its own isn't very interesting, but on
> the other hand, we need it to work on the follow up items. Here's how I
> see the development path for this (and again, this is just for the
> 64-bit SLRUs work, not the bigger 64-bit-XIDs-in-heapam effort):
>
> 1. Use 64 bit page numbers in SLRUs (this patch)
>
> I would like to make one change here: I'd prefer to keep the old 4-digit
> segment names, until we actually start to use the wider address space.
> Let's allow each SLRU to specify how many digits to use in the
> filenames, so that we convert one SLRU at a time.
>
> If we do that, and don't change any of the existing SLRUs to actually
> use the wider space of page and segment numbers yet, this patch becomes
> just refactoring with no on-disk format changes. No pg_upgrade needed.
>
> The next patches will start to make use of the wider address space, one
> SLRU at a time.
>
> 2. Use the larger segment file names in async.c, to lift the current 8
> GB limit on the max number of pending notifications.
>
> No one actually minds the limit, it's quite generous as it is. But there
> is some code and complexity in async.c to avoid the wraparound that
> could be made simpler if we used longer SLRU segment names and avoided
> the wraparound altogether.
>
> I wonder if we should actually add an artificial limit, as a GUC. If
> there are gigabytes of notifications queued up, something's probably
> wrong with the system, and you're not going to be happy if we just
> remove the limit so it can grow to terabytes until you run out of disk
> space.
>
> 3. Extend pg_multixact so that pg_multixact/members is addressed by
> 64-bit offsets.
>
> Currently, multi-XIDs can wrap around, requiring anti-wraparound
> freezing, but independently of that, the pg_multixact/members SLRU can
> also wrap around. We track both, and trigger anti-wraparound if either
> SLRU is about to wrap around. If we redefine MultiXactOffset as a 64-bit
> integer, we can avoid the pg_multixact/members wraparound altogether. A
> downside is that pg_multixact/offsets will take twice as much space, but
> I think that's a good tradeoff. Or perhaps we can play tricks like storing
> a single 64-bit offset on each pg_multixact/offsets page, and a 32-bit
> offset from that for each XID, to avoid making it so much larger.
>
> This would reduce the need to do anti-wraparound VACUUMs on systems that
> use multixacts heavily. Needs pg_upgrade support.
>
> 4. Extend pg_subtrans to 64-bits.
>
> This isn't all that interesting because the active region of pg_subtrans
> cannot be wider than 32 bits anyway, because you'll still reach the
> general 32-bit XID wraparound. But it might be less confusing in some
> places.
>
> I actually started to write a patch to do this, to see how complicated
> it is. It quickly proliferates into expanding other XIDs to 64-bits,
> like TransactionXmin, frozenXid calculation in vacuum.c, known-assigned
> XID tracking in procarray.c, etc. It's going to be necessary to convert
> 32-bit XIDs to FullTransactionIds at some boundaries, and I'm not sure
> where exactly that should happen. It's easier to do the conversions
> close to subtrans.c, but then I'm not sure how much it gets us in terms
> of reducing confusion. It's easy to get confused with the epochs during
> conversions, as you noted. On the other hand, if we change much more of
> the backend to use FullTransactionIds, the patch becomes much more
> invasive.
>
> Nice thing with pg_subtrans, though, is that it doesn't require
> pg_upgrade support.
>
> 5. Extend pg_xact to 64-bits.
>
> Similar to pg_subtrans, really, but needs pg_upgrade support.
>
> 6. (a bonus thing that I noticed while thinking of pg_xact.) Extend
> pg_twophase.c, to use FullTransactionIds.
>
> Currently, the twophase state files in pg_twophase are named according
> to the 32 bit Xid of the transaction. Let's switch to FullTransactionId
> there.
>
>
>
> As we start to refactor these things, I also think it would be good to
> have more explicit tracking of the valid range of SLRU pages in each
> SLRU. Take pg_subtrans for example: it's not very clear what pages have
> been initialized, especially during different stages of startup. It
> would be good to have clear start and end page numbers, and throw an
> error if you try to look up anything outside those bounds. Same for all
> other SLRUs.
>
> I propose that we try to finish 1 and 2 for v16. And maybe 6. I think
> that's doable. It doesn't have any great user-visible benefits yet, but
> we need to start somewhere.
>
> - Heikki
>
>

