Re: Improve WALRead() to suck data directly from WAL buffers when possible

From: Andres Freund <andres(at)anarazel(dot)de>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Improve WALRead() to suck data directly from WAL buffers when possible
Date: 2023-11-09 20:58:36
Message-ID: 20231109205836.zjoawdrn4q77yemv@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-11-08 13:10:34 +0530, Bharath Rupireddy wrote:
> > > + /*
> > > + * The fact that we acquire WALBufMappingLock while reading the WAL
> > > + * buffer page itself guarantees that no one else initializes it or
> > > + * makes it ready for next use in AdvanceXLInsertBuffer(). However, we
> > > + * need to ensure that we are not reading a page that just got
> > > + * initialized. For this, we look at the needed page header.
> > > + */
> > > + phdr = (XLogPageHeader) page;
> > > +
> > > + /* Return, if WAL buffer page doesn't look valid. */
> > > + if (!(phdr->xlp_magic == XLOG_PAGE_MAGIC &&
> > > + phdr->xlp_pageaddr == (ptr - (ptr % XLOG_BLCKSZ)) &&
> > > + phdr->xlp_tli == tli))
> > > + break;
> >
> > I don't think this code should ever encounter a page where this is not the
> > case? We particularly shouldn't do so silently, seems that could hide all
> > kinds of problems.
>
> I think it's possible to read a "just got initialized" page with the
> new approach to read WAL buffer pages without WALBufMappingLock if the
> page is read right after it is initialized and xlblocks is filled in
> AdvanceXLInsertBuffer() but before actual WAL is written.

I think the code needs to make sure that *never* happens. That seems unrelated
to holding or not holding WALBufMappingLock. Even if the page header is
already valid, I don't think it's ok to just read/parse WAL data that's
concurrently being modified.

We can never allow WAL being read that's past
XLogBytePosToRecPtr(XLogCtl->Insert->CurrBytePos)
as it does not exist.

And if the to-be-read LSN is between
XLogCtl->LogwrtResult->Write and XLogBytePosToRecPtr(Insert->CurrBytePos)
we need to call WaitXLogInsertionsToFinish() before copying the data.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2023-11-09 21:14:07 Re: pg_walfile_name_offset can return inconsistent values
Previous Message Matthias van de Meent 2023-11-09 20:49:48 Re: pg_walfile_name_offset can return inconsistent values