Re: POC: Cleaning up orphaned files using undo logs

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2019-08-05 04:23:34
Message-ID: CA+hUKG+k1qc=XTD4SNu_-VUxDn=3V9KNvLMC7Ps__L6qGoyQzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 5, 2019 at 3:54 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Sun, Aug 4, 2019 at 2:46 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> > I had a look at the UNDO patches at
> > https://www.postgresql.org/message-id/CAA4eK1KKAFBCJuPnFtgdc89djv4xO%3DZkAdXvKQinqN4hWiRbvA%40mail.gmail.com,
> > and at the patch to use the UNDO logs to clean up orphaned files, from
> > undo-2019-05-10.tgz earlier in this thread. Are these the latest ones to
> > review?
>
> Yes, I am not sure of cleanup orphan file patch (Thomas can confirm
> the same), but others are latest.

I have a new patch set to post soon, handling all the feedback that
arrived in the past couple of weeks from 5 different reviewers (thanks
all!).

> > There are similar issues in CREATE/DROP DATABASE code. If you crash in
> > the middle of CREATE DATABASE, you can be left with orphaned files in
> > the data directory, or if you crash in the middle of DROP DATABASE, the
> > data might be gone already but the pg_database entry is still there. We
> > should plug those holes too.
> >
>
> +1. Interesting.

Huh. Right.

> > Could we leave out the UNDO and discard worker processes for now?
> > Execute all UNDO actions immediately at rollback, and after crash
> > recovery. That would be fine for cleaning up orphaned files,
> >
>
> Even if we execute all the undo actions on rollback, we need discard
> worker to discard undo at regular intervals. Also, what if we get an
> error while applying undo actions during rollback? Right now, we have
> a mechanism to push such a request to background worker and allow the
> session to continue. Instead, we might want to Panic in such cases if
> we don't want to have background undo workers.
>
> > and it
> > would cut down the size of the patch to review.
>
> If we can find some way to handle all cases and everyone agrees to it,
> that would be good. In fact, we can try to get the basic stuff
> committed first and then try to get the rest (undo-worker machinery)
> done.

I think it's definitely worth exploring.

> > Can this race condition happen: Transaction A creates a table and an
> > UNDO record to remember it. The transaction is rolled back, and the file
> > is removed. Another transaction, B, creates a different table, and
> > chooses the same relfilenode. It loads the table with data, and commits.
> > Then the system crashes. After crash recovery, the UNDO record for the
> > first transaction is applied, and it removes the file that belongs to
> > the second table, created by transaction B.
>
> I don't think such a race exists, but we should verify it once.
> Basically, once the rollback is complete, we mark the transaction
> rollback as complete in the transaction header in undo and write a WAL
> for it. After crash-recovery, we will skip such a transaction. Isn't
> that sufficient to prevent such a race condition?

The usual protection against relfilenode recycling applies: we don't
actually remove the files on disk until after the next checkpoint,
following the successful rollback. That is, the executing the
rollback doesn't actually remove any files immediately, so you can't
reuse the OID yet.

There might be some problems like that if we tried to handle the
CREATE DATABASE orphans you mentioned too naively though. Not sure.

> Thank you for looking into this work.

+1

--
Thomas Munro
https://enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2019-08-05 04:31:09 Re: partition routing layering in nodeModifyTable.c
Previous Message Amit Kapila 2019-08-05 03:54:41 Re: POC: Cleaning up orphaned files using undo logs