Re: POC: Cleaning up orphaned files using undo logs

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2021-11-25 14:02:22
Message-ID: 45455.1637848942@antos
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Tue, Sep 28, 2021 at 7:36 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> >
> > Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > > On Mon, Sep 20, 2021 at 10:55 AM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > > >
> > > > Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > > On Thu, Sep 9, 2021 at 8:33 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > > >
> > > > > > The problem of the temporary undo log is that it's loaded into local buffers
> > > > > > and that backend can exit w/o flushing local buffers to disk, and thus we are
> > > > > > not guaranteed to find enough information when trying to discard the undo log
> > > > > > the backend wrote. I'm thinking about the following solutions:
> > > > > >
> > > > > > 1. Let the backend manage temporary undo log on its own (even the slot
> > > > > > metadata would stay outside the shared memory, and in particular the
> > > > > > insertion pointer could start from 1 for each session) and remove the
> > > > > > segment files at the same moment the temporary relations are removed.
> > > > > >
> > > > > > However, by moving the temporary undo slots away from the shared memory,
> > > > > > computation of oldestFullXidHavingUndo (see the PROC_HDR structure) would
> > > > > > be affected. It might seem that a transaction which only writes undo log
> > > > > > for temporary relations does not need to affect oldestFullXidHavingUndo,
> > > > > > but it needs to be analyzed thoroughly. Since oldestFullXidHavingUndo
> > > > > > prevents transactions to be truncated from the CLOG too early, I wonder if
> > > > > > the following is possible (This scenario is only applicable to the zheap
> > > > > > storage engine [1], which is not included in this patch, but should already
> > > > > > be considered.):
> > > > > >
> > > > > > A transaction creates a temporary table, does some (many) changes and then
> > > > > > gets rolled back. The undo records are being applied and it takes some
> > > > > > time. Since XID of the transaction did not affect oldestFullXidHavingUndo,
> > > > > > the XID can disappear from the CLOG due to truncation.
> > > > > >
> > > > >
> > > > > By above do you mean to say that in zheap code, we don't consider XIDs
> > > > > that operate on temp table/undo for oldestFullXidHavingUndo?
> > > >
> > > > I was referring to the code
> > > >
> > > > /* We can't process temporary undo logs. */
> > > > if (log->meta.persistence == UNDO_TEMP)
> > > > continue;
> > > >
> > > > in undodiscard.c:UndoDiscard().
> > > >
> > >
> > > Here, I think it will just skip undo of temporary undo logs and
> > > oldestFullXidHavingUndo should be advanced after skipping it.
> >
> > Right, it'll be adavanced, but the transaction XID (if the transaction wrote
> > only to temporary relations) might still be needed.
> >
> > > > >
> > > > > > However zundo.c in
> > > > > > [1] indicates that the transaction status *is* checked during undo
> > > > > > execution, so we might have a problem.
> > > > > >
> > > > >
> > > > > It would be easier to follow if you can tell which exact code are you
> > > > > referring here?
> > > >
> > > > In meant the call of TransactionIdDidCommit() in
> > > > zundo.c:zheap_exec_pending_rollback().
> > > >
> > >
> > > IIRC, this should be called for temp tables after they have exited as
> > > this is only to apply the pending undo actions if any, and in case of
> > > temporary undo after session exit, we shouldn't need it.
> >
> > I see (had to play with debugger a bit). Currently this works because the
> > temporary relations are dropped by AbortTransaction() ->
> > smgrDoPendingDeletes(), before the undo execution starts. The situation will
> > change as soon as the file removal will also be handled by the undo subsystem,
> > however I'm still not sure how to hit the TransactionIdDidCommit() call for
> > the XID already truncated from CLOG.
> >
> > I'm starting to admint that there's no issue here: temporary undo is always
> > applied immediately in foreground, and thus the zheap_exec_pending_rollback()
> > function never needs to examine XID which no longer exists in the CLOG.
> >
> > > I am not able to understand what exact problem you are facing for temp
> > > tables after the session exit. Can you please explain it a bit more?
> >
> > The problem is that the temporary undo buffers are loaded into backend-local
> > buffers. Thus there's no guarantee that we'll find a consistent information in
> > the undo file even if the backend exited cleanly (local buffers are not
> > flushed at backend exit and there's no WAL for them). However, we need to read
> > the undo file to find out if (part of) it can be discarded.
> >
> > I'm trying to find out whether we can ignore the temporary undo when trying to
> > advance oldestFullXidHavingUndo or not.
> >
>
> It seems this is the crucial point. In the code, you pointed, we
> ignore the temporary undo while advancing oldestFullXidHavingUndo but
> if you find any case where that is not true then we need to discuss
> what is the best way to solve it.

As I already said above, I think now that the computation of
oldestFullXidHavingUndo can actually ignore the temporary undo, like it
happens in the zheap fork of postgres. At least I couldn't eventually find the
corner case that would break the current solution.

So it should be ok if the temporary undo is managed and discarded by
individual backends. Patch 0005 of the new series tries to do that.

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment Content-Type Size
undo-20211125.tgz application/gzip 210.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Euler Taveira 2021-11-25 14:08:44 Re: row filtering for logical replication
Previous Message Amit Kapila 2021-11-25 13:55:37 Re: pg_upgrade and publication/subscription problem