Re: POC: Cleaning up orphaned files using undo logs

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2021-09-28 14:26:24
Message-ID: 20210928142624.wvc242b3rtl7cg47@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Tue, Sep 21, 2021 at 10:07:55AM +0200, Dmitry Dolgov wrote:
> On Tue, 21 Sep 2021 09:00 Antonin Houska, <ah(at)cybertec(dot)at> wrote:
>
> > Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
> >
> > > Yep, makes sense, thanks. I have few more questions:
> > >
> > > * The use case with orphaned files is working somewhat differently after
> > > the rebase on the latest master, do you observe it as well? The
> > > difference is ApplyPendingUndo -> SyncPostCheckpoint doesn't clean up
> > > an orphaned relation file immediately (only later on checkpoint)
> > > because of empty pendingUnlinks. I haven't investigated more yet, but
> > > seems like after this commit:
> > >
> > > commit 7ff23c6d277d1d90478a51f0dd81414d343f3850
> > > Author: Thomas Munro <tmunro(at)postgresql(dot)org>
> > > Date: Mon Aug 2 17:32:20 2021 +1200
> > >
> > > Run checkpointer and bgwriter in crash recovery.
> > >
> > > Start up the checkpointer and bgwriter during crash recovery
> > (except in
> > > --single mode), as we do for replication. This wasn't done back
> > in
> > > commit cdd46c76 out of caution. Now it seems like a better idea
> > to make
> > > the environment as similar as possible in both cases. There may
> > also be
> > > some performance advantages.
> > >
> > > something has to be updated (pendingOps are empty right now, so no
> > > unlink request is remembered).
> >
> > I haven't been debugging that part recently, but yes, this commit is
> > relevant,
> > thanks for pointing that out! Attached is a patch that should fix it. I'll
> > include it in the next version of the patch series, unless you tell me that
> > something is still wrong.
> >
>
> Sure, but I can take a look only in a couple of days.

Thanks for the patch.

Hm, maybe there is some misunderstanding. My question above was about
the changed behaviour, when orphaned files (e.g. created relation files
after the backend was killed) are removed only by checkpointer when it
kicks in. As far as I understand, the original intention was to do this
job right away, that's why SyncPre/PostCheckpoint was invoked. But the
recent changes around checkpointer make the current implementation
insufficient.

The patch you've proposed removes invokation of SyncPre/PostCheckpoint,
do I see it correctly? In this sense it doesn't change anything, except
removing non-functioning code of course. But the question, probably
reformulated from the more design point of view, stays the same — when
and by which process such orphaned files have to be removed? I've
assumed by removing right away the previous version was trying to avoid
any kind of thunder effects of removing too many at once, but maybe I'm
mistaken here.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2021-09-28 14:45:46 Re: Added schema level support for publication.
Previous Message Magnus Hagander 2021-09-28 14:25:11 Re: Add prefix pg_catalog to pg_get_statisticsobjdef_columns() in describe.c (\dX)