Quick Links

Re: POC: Cleaning up orphaned files using undo logs

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: POC: Cleaning up orphaned files using undo logs
Date:	2019-07-15 20:39:19
Message-ID:	CA+Tgmob1Oby7Wc5ryB_VBccU9N+uSKjXXocgT9dY_edfxqSA8Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Mon, Jul 1, 2019 at 3:54 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> 2. Introduced a new RMGR callback rm_undo_status. It is used to
> decide when record sets in the UNDO_SHARED category should be
> discarded (instead of the usual single xid-based rules). The possible
> answers are "discard me now!", "ask me again when a given XID is all
> visible", and "ask me again when a given XID is no longer running".

From the minor nitpicking department, the patches from this stack that
are updating rmgrlist.h are consistently failing to update the comment
line preceding the list of PG_RMGR() lines. This looks to be patches
0014 and 0015 in this stack; 0015 seems to need to be squashed into
0014.

Reviewing Amit's 0016:

performUndoActions appears to be badly-designed. For starters, it's
sometimes wrong: the only place it gets set to true is in
UndoActionsRequired (which is badly named, because from the name you
expect it to return a Boolean and to not have side effects, but
instead it doesn't return anything and does have side effects).
UndoActionsRequired() only gets called from selected places, like
AbortCurrentTransaction(), so the rest of the time it just returns a
wrong answer. Now maybe it's never called at those times, but there's
no guard to prevent a function like CanPerformUndoActions() (which is
also badly named, because performUndoActions tells you whether you
need to perform undo actions, not whether it's possible to perform
undo actions) from being called before the flag is set. I think that
this flag should be either (1) maintained eagerly - so that wherever
we set start_urec_ptr we also set the flag right away or (2) removed -
so when we need to know, we just loop over all of the undo categories
on the spot, which is not that expensive because there aren't that
many of them.

It seems pointless to make PrepareTransaction() take undo pointers as
arguments, because those pointers are just extracted from the
transaction state, to which PrepareTransaction() has a pointer.

Thomas has already objected to another proposal to add functions that
turn 32-bit XIDs into 64-bit XIDs. Therefore, I feel confident in
predicting that he will likewise object to GetEpochForXid. I think
this needs to be changed somehow, maybe by doing what the XXX comment
you added suggests.

This patch has some problems with naming consistency. There's a
function called PushUndoRequest() which calls a function called
RegisterRollbackReq() to do the heart of the work. So, is it undo or
rollback? Are we pushing or registering? Is it a request or a req?
For bonus points, the flag that the function sets is called
undo_req_pushed, which is halfway in between the two competing
terminologies. Other gripes about PushUndoRequest: push is vague and
doesn't really explain what's happening, "apllying" is a typo,
per_level is a poor variable name and shouldn't be declared volatile.
This function has problems with naming in other places, too; please go
through all of the names carefully and make them consistent and
adequately descriptive.

I am not a fan of applying_subxact_undo. I think we should look for a
better design there. A couple of things occur to me. One is that we
don't necessarily need to go to FATAL; we could just force the current
transaction and all of its subtransactions fail all the way out to the
top level, but then perhaps allow new transactions to be started
afterwards. I'm not sure that's worth it, but it would work, and I
think it has precedent in SxactIsDoomed. Assuming we're going to stick
with the current FATAL plan, I think we should do something like
invent a new kind of critical section that forces ERROR to be promoted
to FATAL and then use it here. We could call it a semi-critical or
locally-critical section, and the undo machinery could use it, but
then also so could other things. I've wanted that sort of concept
before, so I think it's a good idea to try to have something general
and independent of undo. The same concept could be used in
PerformUndoActions() instead of having to invent
pg_rethrow_as_fatal(), so we'd have two uses for this mechanism right
away.

FinishPreparedTransactions() tries to apply undo actions while
interrupts are still held. Is that necessary? Can we avoid it?

It seems highly likely that the logic added to the TBLOCK_SUBCOMMIT
case inside CommitTransactionCommand and also into
ReleaseCurrentSubTransaction should have been added to
CommitSubTransaction instead. If that's not true, then we have to
believe that the TBLOCK_SUBRELEASE call to CommitSubTransaction needs
different treatment from the other two cases, which sounds unlikely;
we also have to explain why undo is somehow different from all of
these other releases that are already handled in that function, not in
its callers. I also strongly suspect it is altogether wrong to do
this before CommitSubTransaction sets s->state to TRANS_COMMIT; what
if a subxact callback throws an error?

For related reasons, I don't think that the change ReleaseSavepoint()
are right either. Notice the header comment: "As above, we don't
actually do anything here except change blockState." The "as above"
part of the comment probably didn't originally refer to
DefineSavepoint(), which definitely does do other stuff, but to
something like EndImplicitTransactionBlock() or EndTransactionBlock(),
and DefineSavepoint() got stuck in the middle later. Anyway, your
patch makes the comment false by doing actual state changes in this
function, rather than just marking the subtransactions for commit.
But why should that be right? If none of the many other bits of state
are manipulated here rather than in CommitSubTransaction(), why is
undo the one thing that is different? I guess this is basically just
compensation for the lack of any of this code in the TBLOCK_SUBRELEASE
path which I noted in the previous paragraph, but I still think the
right answer is to put it all in CommitSubTransaction() *after* we set
TRANS_COMMIT.

There are a number of things I either don't like or don't understand
about PerformUndoActions. One is that undo_req_pushed gets passed to
this function. That just looks really odd from an abstraction point
of view. Basically, we have a function whose job is to "perform undo
actions," and it gets a flag as an argument that tells it to not
actually perform some of the undo actions: that's odd. I think the
reason it's like that is because of the issue we've been discussing
elsewhere that there's a separate undo request for each category. If
you didn't have that, you wouldn't need to do this here. I'm not
saying that proves that the one-request-per-persistence-level design
is definitely wrong, but this is certainly not a point in its favor,
at least IMHO.

PerformUndoActions() also thinks that there is a possibility of
failing to insert a failed request into the error queue, and makes
reference to such requests being rediscovered by the discard worker,
but I thought (as I said in my previous email) that we had abandoned
that approach in favor of always having enough space in shared memory
to record everything. Among other problems, if you want
oldestXidHavingUndo to be calculated based on the information in
shared memory, then you have to have all the records in shared memory,
not lose some of them temporarily and have them get re-inserted into
the error queue. It also feels to me like there may be a conflict
between the everything-must-fit approach and the
one-request-per-persistence level thing you've got here. I believe
Andres's idea was one-request-per-transaction, so the idea is
something like:

- When your transaction first tries to attach to an undo log, you make
a hash table entry.
- If that fails, you error out, but you have no undo, so it's OK.
- If it works, then you know that there's no chance of aborting
without making a hash table entry, because you already did it.
- If you commit, you remove the entry, because your transaction does
not need to be undone.
- If you abort, you process the entry in the foreground if it's small
or if the number of hash table slots remaining is < max_connections.
Otherwise you leave it for the background worker to handle.

If you have one request per persistence level, you could make an entry
for the first persistence level, and then find that you are out of
room when trying to make an entry for the second persistence level. I
guess that doesn't break anything: the changes from the first
persistence level would get undone, and the second persistence level
wouldn't get any undo. Maybe that's OK, but again it doesn't seem all
that nice, so maybe we need to think about it some more.

I think there's more, but I am out of time for the moment.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: POC: Cleaning up orphaned files using undo logs at 2019-07-01 07:53:26 from Thomas Munro

Responses

Re: POC: Cleaning up orphaned files using undo logs at 2019-07-16 04:32:45 from Amit Kapila
Re: POC: Cleaning up orphaned files using undo logs at 2019-07-16 11:13:05 from Amit Kapila
Re: POC: Cleaning up orphaned files using undo logs at 2019-07-18 11:11:08 from Thomas Munro
Re: POC: Cleaning up orphaned files using undo logs at 2019-07-31 13:14:00 from Amit Kapila

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tomas Vondra	2019-07-15 20:44:34	Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Previous Message	Bruce Momjian	2019-07-15 19:58:46	Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)