Re: Improve WALRead() to suck data directly from WAL buffers when possible

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Improve WALRead() to suck data directly from WAL buffers when possible
Date: 2023-11-13 13:32:07
Message-ID: CALj2ACXo-_boM3h-PJuvfuvFzP1nv2wcZBoZN_YbRbQBGQYAUg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 10, 2023 at 2:28 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> On 2023-11-08 13:10:34 +0530, Bharath Rupireddy wrote:
> > > > + /*
> > > > + * The fact that we acquire WALBufMappingLock while reading the WAL
> > > > + * buffer page itself guarantees that no one else initializes it or
> > > > + * makes it ready for next use in AdvanceXLInsertBuffer(). However, we
> > > > + * need to ensure that we are not reading a page that just got
> > > > + * initialized. For this, we look at the needed page header.
> > > > + */
> > > > + phdr = (XLogPageHeader) page;
> > > > +
> > > > + /* Return, if WAL buffer page doesn't look valid. */
> > > > + if (!(phdr->xlp_magic == XLOG_PAGE_MAGIC &&
> > > > + phdr->xlp_pageaddr == (ptr - (ptr % XLOG_BLCKSZ)) &&
> > > > + phdr->xlp_tli == tli))
> > > > + break;
> > >
> > > I don't think this code should ever encounter a page where this is not the
> > > case? We particularly shouldn't do so silently, seems that could hide all
> > > kinds of problems.
> >
> > I think it's possible to read a "just got initialized" page with the
> > new approach to read WAL buffer pages without WALBufMappingLock if the
> > page is read right after it is initialized and xlblocks is filled in
> > AdvanceXLInsertBuffer() but before actual WAL is written.
>
> I think the code needs to make sure that *never* happens. That seems unrelated
> to holding or not holding WALBufMappingLock. Even if the page header is
> already valid, I don't think it's ok to just read/parse WAL data that's
> concurrently being modified.
>
> We can never allow WAL being read that's past
> XLogBytePosToRecPtr(XLogCtl->Insert->CurrBytePos)
> as it does not exist.

Agreed. Erroring out in XLogReadFromBuffers() if passed in WAL is past
the CurrBytePos is an option. Another cleaner way is to just let the
caller decide what it needs to do (retry or error out) - fill an error
message in XLogReadFromBuffers() and return as-if nothing was read or
return a special negative error code like XLogDecodeNextRecord so that
the caller can deal with it.

Also, reading CurrBytePos with insertpos_lck spinlock can come in the
way of concurrent inserters. A possible way is to turn both
CurrBytePos and PrevBytePos 64-bit atomics so that
XLogReadFromBuffers() can read CurrBytePos without any lock atomically
and leave it to the caller to deal with non-existing WAL reads.

> And if the to-be-read LSN is between
> XLogCtl->LogwrtResult->Write and XLogBytePosToRecPtr(Insert->CurrBytePos)
> we need to call WaitXLogInsertionsToFinish() before copying the data.

Agree to wait for all in-flight insertions to the pages we're about to
read to finish. But, reading XLogCtl->LogwrtRqst.Write requires either
XLogCtl->info_lck spinlock or WALWriteLock. Maybe turn
XLogCtl->LogwrtRqst.Write a 64-bit atomic and read it without any
lock, rely on
WaitXLogInsertionsToFinish()'s return value i.e. if
WaitXLogInsertionsToFinish() returns a value >= Insert->CurrBytePos,
then go read that page from WAL buffers.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2023-11-13 13:39:15 Re: proposal: possibility to read dumped table's name from file
Previous Message Andrew Dunstan 2023-11-13 13:27:39 Re: pg_basebackup check vs Windows file path limits