Re: POC: Cleaning up orphaned files using undo logs

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2021-08-16 15:00:18
Message-ID: 20210816150018.o7xbd74s6dzuhdtt@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Wed, Jun 30, 2021 at 07:41:16PM +0200, Antonin Houska wrote:
> Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> > tsunakawa(dot)takay(at)fujitsu(dot)com <tsunakawa(dot)takay(at)fujitsu(dot)com> wrote:
> >
> > > I'm crawling like a snail to read the patch set. Below are my first set of review comments, which are all minor.
> >
> > Thanks.
>
> I've added the patch to the upcoming CF [1], so it possibly gets more review
> and makes some progress. I've marked myself as the author so it's clear who
> will try to respond to the reviews. It's clear that other people did much more
> work on the feature than I did so far - they are welcome to add themselves to
> the author list.
>
> [1] https://commitfest.postgresql.org/33/3228/

Hi,

I'm crawling through the patch set like even slower creature than a snail,
sorry for long absence. I'm reading the latest version posted here and,
although it's hard to give any high level design comments on it yet, I thought
it could be useful to post a few findings and questions in the meantime.

* One question about the functionality:

> On Fri, Jan 29, 2021 at 06:30:15PM +0100, Antonin Houska wrote:
> Attached is the next version. Changes done:
>
> * Removed the progress tracking and implemented undo discarding in a simpler
> way. Now, instead of maintaining the pointer to the last record applied,
> only a boolean field in the chunk header is set when ROLLBACK is
> done. This helps to determine whether the undo of a non-committed
> transaction can be discarded.

Just to clarify, the whole feature was removed for the sake of
simplicity, right?

* By throwing at the patchset `make installcheck` I'm getting from time to time
and error on the restart:

TRAP: FailedAssertion("BufferIsValid(buffers[nbuffers].buffer)",
File: "undorecordset.c", Line: 1098, PID: 6055)

From what I see XLogReadBufferForRedoExtended finds an invalid buffer and
returns BLK_NOTFOUND. The commentary says:

If the block was not found, then it must be discarded later in
the WAL.

and continues with skip = false, but fails to get a page from an invalid
buffer few lines later. It seems that the skip flag is supposed to be used
this situation, should it also guard the BufferGetPage part?

* Another interesting issue I've found happened inside
DropUndoLogsInTablespace, when the process got SIGTERM. It seems processing
stuck on:

slist_foreach_modify(iter, &UndoLogShared->shared_free_lists[i])

iterating on the same element over and over. My guess is
clear_private_free_lists was called and caused such unexpected outcome,
should the access to shared_free_lists be somehow protected?

* I also wonder about the segments in base/undo, the commentary in pg_undodump
says:

Since the UNDO log is a continuous stream of changes, any hole
terminates processing.

It looks like it's relatively easy to end up with such holes, and pg_undodump
ends up with a message (found is added by me and contains a found offset
which do not match the expected value):

pg_undodump: error: segment 0000000000 missing in log 2, found 0000100000

This seems to be not causing any real issues, but it's not clear for me if
such situation with gaps is fine or is it a problem?

Other than that one more time thank you for this tremendous work, I find that
the topic is of extreme importance.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ranier Vilela 2021-08-16 15:04:54 Re: PG14: Avoid checking output-buffer-length for every encoded byte during pg_hex_encode
Previous Message Dilip Kumar 2021-08-16 14:47:47 Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o