Re: POC: Cleaning up orphaned files using undo logs

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2021-08-31 12:46:30
Message-ID: 57137.1630413990@antos
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:

> Hi,
> I'm crawling through the patch set like even slower creature than a snail,
> sorry for long absence. I'm reading the latest version posted here and,
> although it's hard to give any high level design comments on it yet, I thought
> it could be useful to post a few findings and questions in the meantime.
> * One question about the functionality:
> > On Fri, Jan 29, 2021 at 06:30:15PM +0100, Antonin Houska wrote:
> > Attached is the next version. Changes done:
> >
> > * Removed the progress tracking and implemented undo discarding in a simpler
> > way. Now, instead of maintaining the pointer to the last record applied,
> > only a boolean field in the chunk header is set when ROLLBACK is
> > done. This helps to determine whether the undo of a non-committed
> > transaction can be discarded.
> Just to clarify, the whole feature was removed for the sake of
> simplicity, right?

Amit Kapila told me that zheap can recognize that particular undo record was
already applied and I could eventually find the corresponding code. So I
removed the tracking from the undo log layer, although I still think it'd fit
there. However then I found out that at least a boolean flag in the chunk
header is needed to handle the discarding, so I implemented it.

> * By throwing at the patchset `make installcheck` I'm getting from time to time
> and error on the restart:
> TRAP: FailedAssertion("BufferIsValid(buffers[nbuffers].buffer)",
> File: "undorecordset.c", Line: 1098, PID: 6055)
> From what I see XLogReadBufferForRedoExtended finds an invalid buffer and
> returns BLK_NOTFOUND. The commentary says:
> If the block was not found, then it must be discarded later in
> the WAL.
> and continues with skip = false, but fails to get a page from an invalid
> buffer few lines later. It seems that the skip flag is supposed to be used
> this situation, should it also guard the BufferGetPage part?

I could see this sometime too, but can't reproduce it now. It's also not clear
to me how XLogReadBufferForRedoExtended() can return BLK_NOTFOUND, as the
whole undo log segment is created at once, even if only part of it is needed -
see allocate_empty_undo_segment().

> * Another interesting issue I've found happened inside
> DropUndoLogsInTablespace, when the process got SIGTERM. It seems processing
> stuck on:
> slist_foreach_modify(iter, &UndoLogShared->shared_free_lists[i])
> iterating on the same element over and over. My guess is
> clear_private_free_lists was called and caused such unexpected outcome,
> should the access to shared_free_lists be somehow protected?

Well, I could get this error on repeated test run too. Thanks for the report.

The list is protected by UndoLogLock. I found out that the problem was that
free_undo_log_slot() "freed" the slot but didn't remove it from the shared
freelist. Then some other backend thought it's free, picked it from the shared
slot array, used and pushed again to the shared freelist. If the same item is
already at the list head, slist_push_head() makes the initial node point to

I fixed it by removing the slot from the freelist before calling
free_undo_log_slot() from CheckPointUndoLogs(). (The other call site
DropUndoLogsInTablespace() was o.k.)

> * I also wonder about the segments in base/undo, the commentary in pg_undodump
> says:
> Since the UNDO log is a continuous stream of changes, any hole
> terminates processing.
> It looks like it's relatively easy to end up with such holes, and pg_undodump
> ends up with a message (found is added by me and contains a found offset
> which do not match the expected value):
> pg_undodump: error: segment 0000000000 missing in log 2, found 0000100000
> This seems to be not causing any real issues, but it's not clear for me if
> such situation with gaps is fine or is it a problem?

ok, I missed the point that the initial segment (or the initial sequence of
segments) of the log can be missing due to discarding and segments
recycling. I've fixed that, but if a segment is missing in the middle, it's
still considered an error.

> Other than that one more time thank you for this tremendous work, I find that
> the topic is of extreme importance.

I'm just trying to continue the tremendous work of others :-) Thanks for your

Antonin Houska

Attachment Content-Type Size
undo-20210831.tgz application/gzip 201.5 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-08-31 12:58:17 Re: Added schema level support for publication.
Previous Message Ajin Cherian 2021-08-31 12:43:45 Re: Failure of subscription tests with topminnow