From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Vitaly Davydov <v(dot)davydov(at)postgrespro(dot)ru> |
Subject: | Re: Issues with 2PC at recovery: CLOG lookups and GlobalTransactionData |
Date: | 2025-07-06 18:07:11 |
Message-ID: | 20250706180711.16.nmisch@google.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Jun 20, 2025 at 09:02:00AM +0900, Michael Paquier wrote:
> On Fri, Jun 06, 2025 at 03:34:13PM -0700, Noah Misch wrote:
> > Shared memory should (not "must") omit a GXACT whose prepare_start_lsn lies
> > beyond the point recovery has reached. In other words, recovery should keep
> > anachronistic information out of shared memory. For example, one could
> > imagine a straw man implementation in which, before reaching consistency, we
> > load each pg_twophase file that does pass its CRC check. I'd dislike that,
> > because it would facilitate anachronisms involving the
> > GlobalTransactionData.gid field. We could end up with two GXACT having the
> > same gid, one being the present (according to recovery progress) instance of
> > that gid and another being a future instance. The system might not
> > malfunction today, but I'd consider that fragile. Anachronistic entries might
> > cause recovery to need more shared memory than max_prepared_transactions has
> > allocated.
>
> So this comes down to attempting to translate the information from the
> 2PC files to shared memory at a point earlier than what we could.
> Thinking more about this point, there is something that I can see us
> potentially do here: how about tracking the file names in shared
> memory when we go through them in restoreTwoPhaseData()? That
> addresses my point about doing only one scan of pg_twophase/ at the
> beginning of recovery. I have not studied in details how doable it
> is,
A challenge is that the number of files in pg_twophase can exceed
max_prepared_transactions if we've not yet reached consistency. That happens
with a sequence like this:
- exactly max_prepared_transactions files are in pg_twophase
- backup process copies pg_twophase/$file1, then pauses before the next readdir
- some backend deletes pg_twophase/$file1
- checkpointer creates pg_twophase/$fileN
- backup process wakes up and copies remaining pg_twophase files
That's the only problem coming to mind. If that problem doesn't cancel out
the benefits of scanning pg_twophase early, you may as well do it.
> By the way, what do you think we should do with 0001 at this stage
> (your refactoring work with some changes I've sprinkled)? I was
> looking at it this morning with fresh eyes and my opinion about it is
> that it improves the situation in all these 2PC callers on HEAD now
> that we need to deal with fxids for the file names, so that's an
> independent piece. Perhaps this should be applied once v19 opens for
> business while we sort out the rest that 0002 was trying to cover?
That looks harmless.
> Note: this introduced two calls to FullTransactionIdFromAllowableAt()
> in StandbyTransactionIdIsPrepared() and PrepareRedoAdd(), which worried
> me a bit on second look and gave me a pause:
> - The call in StandbyTransactionIdIsPrepared() should use
> AdjustToFullTransactionId(), ReadTwoPhaseFile() calls
> TwoPhaseFilePath(), which uses AdjustToFullTransactionId() on HEAD.
> - The call in PrepareRedoAdd() slightly worried me, but I think that
> we're OK with using FullTransactionIdFromAllowableAt(), as we are in
> the case of a 2PC transaction recovered from WAL, with the XID coming
> from the file's header. I think that we'd better put an
> Assert(InRecovery) in this path, just for safety.
>
> What do you think?
It's an appropriate level of risk-taking.
From | Date | Subject | |
---|---|---|---|
Next Message | Hannu Krosing | 2025-07-06 18:22:25 | Re: What is a typical precision of gettimeofday()? |
Previous Message | Noah Misch | 2025-07-06 16:13:19 | Re: [PATCH] Add support for displaying database service in psql prompt |