Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

From: Andres Freund <andres(at)anarazel(dot)de>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints
Date: 2021-09-05 20:28:00
Message-ID: 20210905202800.ji4fnfs3xzhvo7l6@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2021-09-05 14:22:51 +0530, Dilip Kumar wrote:
> On Sat, Sep 4, 2021 at 3:24 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> > Hi,
> >
> > On 2021-09-03 14:25:10 +0530, Dilip Kumar wrote:
> > > Yeah, we can surely lock the relation as described by Robert, but IMHO,
> > > while creating the database we are already holding the exclusive lock on
> > > the database and there is no one else allowed to be connected to the
> > > database, so do we actually need to bother about the lock for the
> > > correctness?
> >
> > The problem is that checkpointer, bgwriter, buffer reclaim don't care about
> > the database of the buffer they're working on... The exclusive lock on the
> > database doesn't change anything about that.
>
>
> But these directly operate on the buffers and In my patch, whether we are
> reading the pg_class for identifying the relfilenode or we are copying the
> relation block by block we are always holding the lock on the buffer.

I don't think a buffer lock is really sufficient. See e.g. code like:

static void
InvalidateBuffer(BufferDesc *buf)
{
...
/*
* We assume the only reason for it to be pinned is that someone else is
* flushing the page out. Wait for them to finish. (This could be an
* infinite loop if the refcount is messed up... it would be nice to time
* out after awhile, but there seems no way to be sure how many loops may
* be needed. Note that if the other guy has pinned the buffer but not
* yet done StartBufferIO, WaitIO will fall through and we'll effectively
* be busy-looping here.)
*/
if (BUF_STATE_GET_REFCOUNT(buf_state) != 0)
{
UnlockBufHdr(buf, buf_state);
LWLockRelease(oldPartitionLock);
/* safety check: should definitely not be our *own* pin */
if (GetPrivateRefCount(BufferDescriptorGetBuffer(buf)) > 0)
elog(ERROR, "buffer is pinned in InvalidateBuffer");
WaitIO(buf);
goto retry;
}

IOW, currently we assume that you're only allowed to pin a block in a relation
while you hold a lock on the relation. It might be a good idea to change that,
but it's not as trivial as one might think - consider e.g. dropping a
relation's buffers while holding an exclusive lock: If there's potential
concurrent reads of that buffer we'd be in trouble.

> 3. While copying the relation whether to use the bufmgr or directly use the
> smgr?
>
> If we use the bufmgr then maybe we can avoid flushing some of the buffers
> to the disk and save some I/O but in general we copy from the template
> database so there might not be a lot of dirty buffers and we might not save
> anything

I would assume the big benefit would be that the *target* database does not
have to be written out / shared buffer is immediately populated.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2021-09-05 21:44:37 Re: stat() vs ERROR_DELETE_PENDING, round N + 1
Previous Message Tom Lane 2021-09-05 20:11:14 Re: use-regular-expressions-to-simplify-less_greater-and-not_equals.patch