Re: POC: Cleaning up orphaned files using undo logs

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2019-08-08 20:27:16
Message-ID: CA+TgmoYT0UzfUa_eiRFyu8SeRza8JaLoNaNiE4w=XzudN1waNA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 8, 2019 at 9:31 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I know that Robert is working on a patch that revises the undo request
> layer somewhat, it's possible that this is best discussed afterwards.

Here's what I have at the moment. This is not by any means a complete
replacement for Amit's undo worker machinery, but it is a significant
redesign of (and, I believe, a significant improvement to) the queue
management stuff from Amit's patch. I wrote this pretty quickly, so
while it passes simple testing, it probably has a number of bugs, and
to actually use it, it would need to be integrated with xact.c; right
now it's just a standalone module that doesn't do anything except let
itself be tested.

Some of the ways it is different from Amit's patches include:

* It uses RBTree rather than binaryheap, so when we look ahead, we
look ahead in the right order.

* There's no limit to the lookahead distance; when looking ahead, it
will search the entirety of all 3 RBTrees for an entry from the right
database.

* It doesn't have a separate hash table keyed by XID. I didn't find
that necessary.

* It's better-isolated, as you can see from the fact that I've
included a test module that tests this code without actually ever
putting an UndoRequestManager in shared memory. I would've liked to
expand this test module, but I don't have time to do that today and
felt it better to get this much sent out.

* It has a lot of comments explaining the design and how it's intended
to integrate with the rest of the system.
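
As a rough illustration of the unlimited-lookahead point above, here is
a minimal sketch. It is not taken from the attached patch: sorted
arrays stand in for the RBTrees, and every name (DemoRequest, DemoTree,
demo_first_match, demo_lookahead) as well as the order in which the
three trees are consulted is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the patch. */
typedef struct DemoRequest
{
	unsigned long sort_key;		/* whatever the tree is ordered by */
	unsigned int dbid;			/* database the request belongs to */
} DemoRequest;

typedef struct DemoTree
{
	const DemoRequest *items;	/* sorted ascending by sort_key */
	size_t		nitems;
} DemoTree;

/* Walk one "tree" in order, with no cap on how far we look ahead. */
const DemoRequest *
demo_first_match(const DemoTree *tree, unsigned int dbid)
{
	assert(tree != NULL);
	for (size_t i = 0; i < tree->nitems; i++)
	{
		if (tree->items[i].dbid == dbid)
			return &tree->items[i];
	}
	return NULL;
}

/* Try each of the three trees in turn (the tree order is an assumption). */
const DemoRequest *
demo_lookahead(const DemoTree trees[3], unsigned int dbid)
{
	for (int t = 0; t < 3; t++)
	{
		const DemoRequest *found = demo_first_match(&trees[t], dbid);

		if (found != NULL)
			return found;
	}
	return NULL;
}
```

Because each scan proceeds in sorted order and never gives up early, a
caller gets the best-ordered entry for its database if one exists
anywhere, and NULL only when no tree holds such an entry.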

Broadly, my vision for how this would get used is:

- Create an UndoRequestManager in shared memory.
- Before a transaction first attaches to a permanent or unlogged undo
log, xact.c would call RegisterUndoRequest(); thereafter, xact.c would
store a pointer to the UndoRequest for the lifetime of the toplevel
transaction.
- Immediately after attaching to a permanent or unlogged undo log,
xact.c would call UndoRequestSetLocation.
- xact.c would track the number of bytes of permanent and unlogged
undo records the transaction generates. If the transaction goes on to
abort, it reports these by calling FinalizeUndoRequest.
- If the transaction commits, it doesn't need that information, but
does need to call UnregisterUndoRequest() as a post-commit step in
CommitTransaction().
- In the case of an abort, after calling FinalizeUndoRequest, xact.c
would call PerformUndoInBackground() to find out whether to do undo in
the background or the foreground. If undo is to be done in the
foreground, the backend must go on to call UnregisterUndoRequest() if
undo succeeds, and RescheduleUndoRequest() if it fails.
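
The call sequence above can be sketched as a small state machine. This
is only a toy model under stated assumptions: the demo_* functions
mirror the roles of RegisterUndoRequest(), UndoRequestSetLocation,
FinalizeUndoRequest, PerformUndoInBackground(), UnregisterUndoRequest()
and RescheduleUndoRequest(), but their names, signatures and states are
invented here and are not the patch's API. The background/foreground
decision is passed in as a flag because the policy behind it is not
described in this mail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states and types, invented for this sketch. */
typedef enum DemoState
{
	DEMO_UNUSED, DEMO_REGISTERED, DEMO_LOCATED, DEMO_FINALIZED, DEMO_QUEUED
} DemoState;

typedef enum DemoOutcome
{
	DEMO_DONE, DEMO_BACKGROUND, DEMO_RESCHEDULED
} DemoOutcome;

typedef struct DemoUndoRequest
{
	DemoState	state;
	size_t		undo_bytes;		/* reported only on abort */
	int			failures;		/* failed undo attempts so far */
} DemoUndoRequest;

void demo_register(DemoUndoRequest *req)
{
	assert(req->state == DEMO_UNUSED);
	req->state = DEMO_REGISTERED;
	req->undo_bytes = 0;
	req->failures = 0;
}

void demo_set_location(DemoUndoRequest *req)
{
	assert(req->state == DEMO_REGISTERED);
	req->state = DEMO_LOCATED;
}

void demo_finalize(DemoUndoRequest *req, size_t undo_bytes)
{
	assert(req->state == DEMO_LOCATED);
	req->undo_bytes = undo_bytes;
	req->state = DEMO_FINALIZED;
}

void demo_unregister(DemoUndoRequest *req)
{
	assert(req->state != DEMO_UNUSED);
	req->state = DEMO_UNUSED;
}

void demo_reschedule(DemoUndoRequest *req)
{
	assert(req->state == DEMO_FINALIZED);
	req->failures++;
	req->state = DEMO_QUEUED;	/* left for a later retry */
}

/* End-of-transaction handling for a request that is already located. */
DemoOutcome
demo_end_of_xact(DemoUndoRequest *req, bool committed, size_t undo_bytes,
				 bool run_in_background, bool foreground_undo_ok)
{
	if (committed)
	{
		demo_unregister(req);	/* commit: byte counts are not needed */
		return DEMO_DONE;
	}
	demo_finalize(req, undo_bytes);
	if (run_in_background)
	{
		req->state = DEMO_QUEUED;	/* handed over to the undo workers */
		return DEMO_BACKGROUND;
	}
	if (foreground_undo_ok)
	{
		demo_unregister(req);
		return DEMO_DONE;
	}
	demo_reschedule(req);
	return DEMO_RESCHEDULED;
}
```

The point of the model is that every path out of an abort ends in
exactly one of three places: the request is unregistered, queued for
the background, or rescheduled after a failed foreground attempt.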

- In the case of a prepared transaction, a pointer to the UndoRequest
would get stored in the GlobalTransaction (but nothing extra would get
stored in the twophase state file).
- COMMIT PREPARED calls UnregisterUndoRequest().
- ROLLBACK PREPARED calls PerformUndoInBackground; if told to do undo
in the foreground, it must go on to call either
UnregisterUndoRequest() on success or RescheduleUndoRequest() on
failure, just like in the regular abort case.

- After a crash, once recovery is complete but before we open for
connections, or at least before we allow any new undo activity, the
discard worker scans all the logs and makes a bunch of calls to
RecreateUndoRequest(). Then, for each prepared transaction that still
exists, it calls SuspendPreparedUndoRequest() and uses the return value
to reset the UndoRequest pointer in the GlobalTransaction. Only once
both of those steps are completed can undo workers be safely started.
- Undo workers call GetNextUndoRequest() to get the next task that
they should perform, and once they do, they "own" the undo request.
When undo succeeds or fails, they must call either
UnregisterUndoRequest() or RescheduleUndoRequest(), as appropriate,
just like for foreground undo. Making sure this is water-tight will
probably require some well-done integration with xact.c, so that an
undo request that we "own" because we got it in a background undo
apply process looks exactly the same as one we "own" because it's our
transaction originally.
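
The worker side of this could look roughly like the sketch below. It is
a toy model, not code from the patch: demo_get_next,
demo_slot_unregister and demo_slot_reschedule only imitate the roles of
GetNextUndoRequest(), UnregisterUndoRequest() and
RescheduleUndoRequest(), the fixed-size slot array is a stand-in for
the real manager, and the will_succeed flag is a hypothetical
substitute for actually applying undo.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_SLOTS 8

/* Hypothetical slot states; DEMO_FREE is 0 so zeroed memory is empty. */
typedef enum DemoSlotState
{
	DEMO_FREE, DEMO_PENDING, DEMO_OWNED, DEMO_RETRY_LATER
} DemoSlotState;

typedef struct DemoSlot
{
	DemoSlotState state;
	bool		will_succeed;	/* stand-in for the result of applying undo */
	int			failures;
} DemoSlot;

typedef struct DemoManager
{
	DemoSlot	slots[DEMO_MAX_SLOTS];
} DemoManager;

/* Hand the first pending request to the caller, who then "owns" it. */
DemoSlot *
demo_get_next(DemoManager *mgr)
{
	for (size_t i = 0; i < DEMO_MAX_SLOTS; i++)
	{
		if (mgr->slots[i].state == DEMO_PENDING)
		{
			mgr->slots[i].state = DEMO_OWNED;
			return &mgr->slots[i];
		}
	}
	return NULL;
}

/* Undo succeeded: the request goes away. */
void
demo_slot_unregister(DemoSlot *slot)
{
	assert(slot->state == DEMO_OWNED);
	slot->state = DEMO_FREE;
}

/* Undo failed: give up ownership and park the request for a retry. */
void
demo_slot_reschedule(DemoSlot *slot)
{
	assert(slot->state == DEMO_OWNED);
	slot->failures++;
	slot->state = DEMO_RETRY_LATER;
}

/* One pass of a worker; returns how many requests were undone. */
int
demo_worker_pass(DemoManager *mgr)
{
	int			undone = 0;
	DemoSlot   *slot;

	while ((slot = demo_get_next(mgr)) != NULL)
	{
		if (slot->will_succeed)
		{
			demo_slot_unregister(slot);
			undone++;
		}
		else
			demo_slot_reschedule(slot);
	}
	return undone;
}
```

Every owned request leaves the loop body through exactly one of the two
calls, which is the invariant the integration with xact.c would have to
preserve for both background and foreground undo.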

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
0001-Draft-of-new-undo-request-manager.patch application/octet-stream 53.6 KB
