Re: Improve WALRead() to suck data directly from WAL buffers when possible

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Improve WALRead() to suck data directly from WAL buffers when possible
Date: 2024-02-12 23:36:48
Message-ID: 20240212233648.4ephivkh7ukkiv2k@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2024-02-12 12:46:00 -0800, Jeff Davis wrote:
> On Mon, 2024-02-12 at 12:18 -0800, Andres Freund wrote:
> > +    upto = Min(startptr + count, LogwrtResult.Write);
> > +    nbytes = upto - startptr;
> >
> > Shouldn't it pretty much be a bug to ever encounter this?
>
> In the current code it's impossible, though Bharath hinted at an
> extension which could reach that path.
>
> What I committed was a bit of a compromise -- earlier versions of the
> patch supported reading right up to the Insert pointer (which requires
> a call to WaitXLogInsertionsToFinish()). I wasn't ready to commit that
> code without seeing a more about how that would be used, but I thought
> it was reasonable to have some simple code in there to allow reading up
> to the Write pointer.

I doubt there's a sane way to use WALRead() without *first* ensuring that the
range of data is valid. I think we're better of moving that responsibility
explicitly to the caller and adding an assertion verifying that.

> It seems closer to the structure that we will ultimately need to
> replicate unflushed data, right?

It doesn't really seem like a necessary, or even particularly useful,
part. You couldn't just call WALRead() for that, since the caller would need
to know the range up to which WAL is valid but not yet flushed as well. Thus
the caller would need to first use WaitXLogInsertionsToFinish() or something
like it anyway - and then there's no point in doing the WALRead() anymore.

Note that for replicating unflushed data, we *still* might need to fall back
to reading WAL data from disk. In which case not asserting in WALRead() would
just make it hard to find bugs, because not using WaitXLogInsertionsToFinish()
would appear to work as long as data is in wal buffers, but as soon as we'd
fall back to on-disk (but unflushed) data, we'd send bogus WAL.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2024-02-12 23:41:34 Re: glibc qsort() vulnerability
Previous Message Matthias van de Meent 2024-02-12 23:10:56 Re: Reducing output size of nodeToString