Re: POC: Cleaning up orphaned files using undo logs

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2021-09-25 06:46:40
Message-ID: CAA4eK1+UtutcnUY4LgfS_ndA81tEDr5F67WVFixLYejGObW0Og@mail.gmail.com
Lists: pgsql-hackers

On Fri, Sep 24, 2021 at 4:44 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > On Mon, Sep 20, 2021 at 10:24 AM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> > >
> > > Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > > On Fri, Sep 17, 2021 at 9:50 PM Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
> > > > >
> > > > > > On Tue, Sep 14, 2021 at 10:51:42AM +0200, Antonin Houska wrote:
> > > > >
> > > > > * What happened with the idea of abandoning discard worker for the sake
> > > > > of simplicity? From what I see limiting everything to foreground undo
> > > > > could reduce the core of the patch series to the first four patches
> > > > > (forgetting about test and docs, but I guess it would be enough at
> > > > > least for the design review), which is already less overwhelming.
>
> > > What we can do without, at least for the cleanup of the orphaned files, is the
> > > *undo worker*. In this patch series the cleanup is handled by the startup process.
> > >
> >
> > Okay, I think various people at different points in time have suggested
> > that idea. I think one thing we might need to consider is what to do
> > in case of a FATAL error. In case of a FATAL error, it won't be
> > advisable to execute undo immediately, so would we upgrade the error
> > to PANIC in such cases? I vaguely remember that, for cleanup of
> > orphaned files, this can happen only rarely, and someone suggested
> > upgrading the error to PANIC in such a case, but I don't remember the
> > exact details.
>
> Do you mean FATAL error during normal operation?
>

Yes.

> As far as I understand, even
> zheap does not rely on immediate UNDO execution (otherwise it'd never
> introduce the undo worker), so FATAL only means that the undo needs to be
> applied later so it can be discarded.
>

Yeah, zheap applies undo either later via a background worker or the
next time, before a DML operation, if there is a need.

> As for the orphaned files cleanup feature with no undo worker, we might need
> PANIC to ensure that the undo is applied during restart and that it can be
> discarded, otherwise the unapplied undo log would stay there until the next
> (regular) restart and it would block discarding. However upgrading FATAL to
> PANIC just because the current transaction created a table seems quite
> rude.
>

True, I guess, but we can first check in which scenarios a FATAL can
be generated during that operation.

> So the undo worker might be needed even for this patch?
>

I think we can keep the undo worker as a separate patch and, for the
base patch, keep the idea of promoting FATAL to PANIC. At the very
least, this will make the review easier.
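
To make the promotion idea concrete, here is a rough standalone sketch.
It is not taken from the patch series and does not use PostgreSQL's
elog machinery; the enum, the struct, and sketch_promote_level() are
all hypothetical names invented for illustration. The only rule it
encodes is the one discussed above: without an undo worker, a FATAL in
a transaction that still has unapplied undo is raised to PANIC, so that
the restart applies the undo and it can be discarded.

```c
#include <stdbool.h>

/*
 * Hypothetical severity levels for this sketch only. These are NOT
 * PostgreSQL's elevel constants; only their relative order matters here.
 */
typedef enum SketchLevel
{
    SKETCH_ERROR,
    SKETCH_FATAL,
    SKETCH_PANIC
} SketchLevel;

/* Hypothetical per-transaction state, assumed for illustration. */
typedef struct SketchXactState
{
    /* true if the transaction wrote undo that has not been applied yet,
     * e.g. because it created a relation file */
    bool has_unapplied_undo;

    /* true if a background undo worker could apply that undo later */
    bool undo_worker_available;
} SketchXactState;

/*
 * Decide the effective severity of an error.
 *
 * FATAL is promoted to PANIC only when unapplied undo would otherwise
 * be left behind with nothing to apply it. Every other combination
 * keeps its original level.
 */
static SketchLevel
sketch_promote_level(SketchLevel level, const SketchXactState *xact)
{
    if (level == SKETCH_FATAL &&
        xact->has_unapplied_undo &&
        !xact->undo_worker_available)
        return SKETCH_PANIC;

    return level;
}
```

With an undo worker present (the separate patch), the same FATAL stays
FATAL in this sketch, which is the case where the promotion would no
longer be needed.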

> Or do you mean FATAL error when executing the UNDO?
>

No.

--
With Regards,
Amit Kapila.
