Re: POC: Cleaning up orphaned files using undo logs

From: Antonin Houska <ah(at)cybertec(dot)at>
To:
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2021-01-29 17:30:15
Message-ID: 87363.1611941415@antos
Lists: pgsql-hackers

Antonin Houska <ah(at)cybertec(dot)at> wrote:

> Dmitry Dolgov <9erthalion6(at)gmail(dot)com> wrote:
>
> > Thanks for the updated patch. As I've mentioned off the list, I'm slowly
> > looking through it with the intent to concentrate on undo progress
> > tracking. But before I post anything I want to mention a couple of
> > strange issues I see, otherwise I will forget for sure. Maybe it's
> > already known, but when running 'make installcheck' several times against
> > a freshly built postgres with the patch applied, from time to time I
> > observe various errors.
> >
> > This one happens on a crash recovery, seems like
> > UndoRecordSetXLogBufData has usr_type = USRT_INVALID and is involved in
> > the replay process:
> >
> > TRAP: FailedAssertion("page_offset + this_page_bytes <= uph->ud_insertion_point", File: "undopage.c", Line: 300)
> > postgres: startup recovering 000000010000000000000012(ExceptionalCondition+0xa1)[0x558b38b8a350]
> > postgres: startup recovering 000000010000000000000012(UndoPageSkipOverwrite+0x0)[0x558b38761b7e]
> > postgres: startup recovering 000000010000000000000012(UndoReplay+0xa1d)[0x558b38766f32]
> > postgres: startup recovering 000000010000000000000012(XactUndoReplay+0x77)[0x558b38769281]
> > postgres: startup recovering 000000010000000000000012(smgr_redo+0x1af)[0x558b387aa7bd]
> >
> > This one is somewhat similar:
> >
> > TRAP: FailedAssertion("page_offset >= SizeOfUndoPageHeaderData", File: "undopage.c", Line: 287)
> > postgres: undo worker for database 36893 (ExceptionalCondition+0xa1)[0x5559c90f1350]
> > postgres: undo worker for database 36893 (UndoPageOverwrite+0xa6)[0x5559c8cc8ae3]
> > postgres: undo worker for database 36893 (UpdateLastAppliedRecord+0xbe)[0x5559c8ccd008]
> > postgres: undo worker for database 36893 (smgr_undo+0xa6)[0x5559c8d11989]
>
> Well, on a repeated run of the test I could also hit the first one. I could
> fix it and will post a new version of the patch (along with some other small
> changes) this week.

Attached is the next version. Changes done:

* Removed the progress tracking and implemented undo discarding in a simpler
way. Now, instead of maintaining the pointer to the last record applied,
only a boolean field in the chunk header is set when ROLLBACK is
done. This helps to determine whether the undo of a non-committed
transaction can be discarded.

* Removed the "undo worker" that the previous version only used to apply the
undo after crash recovery. The startup process does the work now.

* Implemented cleanup after crashed CREATE DATABASE and ALTER DATABASE ... SET TABLESPACE.

BTW, I wonder if this change allows these commands to be executed in a
transaction block. I think the reason to prohibit that is to minimize the
window between creation of the files and transaction commit - if the
server crashes in that window, the new database files survive but the
catalog changes don't. But maybe there are other reasons. (I don't claim
it's terribly useful to create a database in a transaction block though,
because the client cannot connect to it without leaving the current
transaction.)

* Reordered the diffs, i.e. moved the discarding in front of the actual
features.
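
To illustrate the first item above, here is a minimal sketch of how a single
flag in the chunk header could drive the discard decision. It is not code from
the patch: the names UndoChunkHeaderSketch, XactOutcome, mark_undo_applied and
undo_chunk_discardable are hypothetical, and the rule that undo of a committed
transaction can always be discarded is an assumption made for the example.

```c
#include <stdbool.h>

/* Hypothetical transaction outcomes; not the patch's actual types. */
typedef enum
{
    XACT_IN_PROGRESS,
    XACT_COMMITTED,
    XACT_ABORTED
} XactOutcome;

/*
 * Hypothetical chunk header. A single flag stands in for the pointer to the
 * last applied record that the earlier version maintained.
 */
typedef struct UndoChunkHeaderSketch
{
    bool        applied;    /* set once ROLLBACK has applied the undo */
} UndoChunkHeaderSketch;

/* Called when ROLLBACK has finished applying the undo of this chunk. */
static void
mark_undo_applied(UndoChunkHeaderSketch *hdr)
{
    hdr->applied = true;
}

/*
 * Decide whether the chunk can be discarded.
 *
 * Assumption: undo of a committed transaction is never needed again, undo of
 * an aborted transaction is needed until it has been applied, and undo of a
 * transaction still in progress must be kept.
 */
static bool
undo_chunk_discardable(const UndoChunkHeaderSketch *hdr, XactOutcome outcome)
{
    if (outcome == XACT_COMMITTED)
        return true;
    if (outcome == XACT_ABORTED)
        return hdr->applied;
    return false;
}
```

With this shape, nothing needs to be tracked per record: the undo of a
non-committed transaction becomes discardable only once the flag is set.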

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

Attachment Content-Type Size
undo-20210129.tgz application/x-gzip 183.1 KB
