Re: POC: Cleaning up orphaned files using undo logs

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2019-07-23 03:18:26
Message-ID: CAA4eK1LGRwgSrhPtXuWEnKg6EhmSUv9HmoPM=AuFSuk=CwWAaQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 22, 2019 at 8:39 PM Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>
> On Mon, 22 Jul 2019 at 14:21, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> -------------
>
> +UndoWorkerGetSlotInfo(int slot, UndoRequestInfo *urinfo)
> +{
> + /* Block concurrent access. */
> + LWLockAcquire(UndoWorkerLock, LW_EXCLUSIVE);
> +
> + MyUndoWorker = &UndoApplyCtx->workers[slot];
> Not sure why MyUndoWorker is used here. Can't we use a local variable
> ? Or do we intentionally attach to the slot as a side-operation ?
>
> -------------
>

I think here, we can use a local variable as well. Do you see any
problem with the current code or do you think it is better to use a
local variable here?

> --------------
>
> + if (!InsertRequestIntoErrorUndoQueue(urinfo))
> I was thinking what happens if for some reason
> InsertRequestIntoErrorUndoQueue() itself errors out. In that case, the
> entry will not be marked invalid, and so there will be no undo action
> carried out because I think the undo worker will exit. What happens
> next with this entry ?

The same entry is present in two queues xid and size, so next time it
will be executed from the second queue based on it's priority in that
queue. However, if it fails again a second time in the same way, then
we will be in trouble because now the hash table has entry, but none
of the queues has entry, so none of the workers will attempt to
execute again. Also, when discard worker again tries to register it,
we won't allow adding the entry to queue thinking either some backend
is executing the same or it must be part of some queue.

The one possibility to deal with this could be that we somehow allow
discard worker to register it again in the queue or we can do this in
critical section so that it allows system restart on error. However,
the main thing is it possible that InsertRequestIntoErrorUndoQueue
will fail unless there is some bug in the code? If so, we might want
to have an Assert for this rather than handling this condition.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2019-07-23 03:26:28 "Attachment not found" by CF-app
Previous Message Andres Freund 2019-07-23 02:41:39 Re: Psql patch to show access methods info