Re: POC: Cleaning up orphaned files using undo logs

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2019-05-15 10:28:30
Message-ID: CAA4eK1+H6opxK1YENP5DqggRhobO78GwTPUK41cESypYDpKu2w@mail.gmail.com
Lists: pgsql-hackers

On Mon, May 13, 2019 at 11:36 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Sun, May 12, 2019 at 2:15 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > I have removed some of the globals and also improved some comments.
>
> I don't like the discard_lock very much. Perhaps it's OK, but I hope
> that there are better alternatives. One problem that Thomas Munro
> pointed out to me in off-list discussion is that the discard_lock has
> to be held by anyone reading undo even if the undo they are reading
> and the undo that the discard worker wants to discard are in
> completely different parts of the undo log. Somebody could be trying
> to read an undo page written 1 second ago while the discard worker is
> trying to discard an undo page written to the same undo log 1 hour
> ago. Those things need not block each other, but with this design
> they will.
>

Yeah, this doesn't appear to be a good way to deal with the problem.

> Another problem is that we end up holding it across an
> I/O; there's precedent for that, but it's not particularly good
> precedent. Let's see if we can do better.
>
> But then I had what I think may be a better idea.
>

+1. I also think the idea below is better than the previous one.

> Let's add a new
> ReadBufferMode that suppresses the actual I/O; if the buffer is not
> already present in shared_buffers, it allocates a buffer but returns
> it without doing any I/O, so the caller must be prepared for BM_VALID
> to be unset. I don't know what to call this, so I'll call it
> RBM_ALLOCATE (leaving room for possible future variants like
> RBM_ALLOCATE_AND_LOCK). Then, the protocol for reading an undo buffer
> would go like this:
>
> 1. Read the buffer with RBM_ALLOCATE, thus acquiring a pin on the
> relevant buffer.
> 2. Check whether the buffer precedes the discard horizon for that undo
> log stored in shared memory.
> 3. If so, use the ForgetBuffer() code we have in the zheap branch to
> deallocate the buffer and stop here. The undo is not available to be
> read, whether it's still physically present or not.
> 4. Otherwise, if the buffer is not valid, call ReadBufferExtended
> again, or some new function, to make it so. Remember to release all
> of our pins.
>
> The protocol for discarding an undo buffer would go like this:
>
> 1. Advance the discard horizon in shared memory.
> 2. Take a cleanup lock on each buffer that ought to be discarded.
> Remember the dirty ones and forget the others.
> 3. WAL-log the discard operation.
> 4. Revisit the dirty buffers we remembered in step 2 and forget them.
>
> The idea is that, once we've advanced the discard horizon in shared
> memory, any readers that come along later are responsible for making
> sure that they never do I/O on any older undo. They may create some
> invalid buffers in shared memory, but they'll hopefully also get rid
> of them if they do, and if they error out for some reason before doing
> so, that buffer should age out naturally. So, the discard worker just
> needs to worry about buffers that already exist. Once it's taken a
> cleanup lock on each buffer, it knows that there are no I/O operations
> and in fact no buffer usage of any kind still in progress from before
> it moved the in-memory discard horizon. Anyone new that comes along
> will clean up after themselves. We postpone forgetting dirty buffers
> until after we've successfully WAL-logged the discard, in case we fail
> to do so.
>

I have spent some time thinking this over and couldn't see any problem
with it. So, +1 for trying this out along the lines of what you have
described above.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
