Re: Improve WALRead() to suck data directly from WAL buffers when possible

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Improve WALRead() to suck data directly from WAL buffers when possible
Date: 2024-01-26 14:01:24
Message-ID: CALj2ACW65mqn6Ukv57SqDTMzAJgd1N_AdQtDgy+gMDqu6v618Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 26, 2024 at 8:31 AM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> > PSA v20 patch set.
>
> 0001 is very close. I have the following suggestions:
>
> * Don't just return zero. If the caller is doing something we don't
> expect, we want to fix the caller. I understand you'd like this to be
> more like a transparent optimization, and we may do that later, but I
> don't think it's a good idea to do that now.

+ if (RecoveryInProgress() ||
+ tli != GetWALInsertionTimeLine())
+ return ntotal;
+
+ Assert(!XLogRecPtrIsInvalid(startptr));

Are you suggesting to error out instead of returning 0? If yes, I
disagree with it. Because, failure to read due to unmet pre-conditions
doesn't necessarily have to be to error out. If we error out, the
immediate failure we see is in the src/bin/psql TAP test for calling
XLogReadFromBuffers when the server is in recovery. How about
returning a negative value instead of just 0 or returning true/false
just like WALRead?

> * There's currently no use for reading LSNs between Write and Insert,
> so remove the WaitXLogInsertionsToFinish() code path. That also means
> we don't need the extra emitLog parameter, so we can remove that. When
> we have a use case, we can bring it all back.

I disagree with this. I don't see anything wrong with
XLogReadFromBuffers having the capability to wait for in-progress
insertions to finish. In fact, it makes the function near-complete.
Imagine, implementing an extension (may be for fun or learning or
educational or production purposes) to read unflushed WAL directly
from WAL buffers using XLogReadFromBuffers as page_read callback with
xlogreader facility. AFAICT, I don't see a problem with
WaitXLogInsertionsToFinish logic in XLogReadFromBuffers.

FWIW, one important aspect of XLogReadFromBuffers is its ability to
read the unflushed WAL from WAL buffers. Also, see a note from Andres
here https://www.postgresql.org/message-id/20231109205836.zjoawdrn4q77yemv%40awork3.anarazel.de.

> If you agree, I can just make those adjustments (and do some final
> checking) and commit 0001. Otherwise let me know what you think.

Thanks. Please see my responses above.

> 0002: How does the test control whether the data requested is before
> the Flush pointer, the Write pointer, or the Insert pointer? What if
> the walwriter comes in and moves one of those pointers before the next
> statement is executed?

Tried to keep wal_writer quiet with wal_writer_delay=10000ms and
wal_writer_flush_after = 1GB to not to flush WAL in the background.
Also, disabled autovacuum, and set checkpoint_timeout to a higher
value. All of this is done to generate minimal WAL so that WAL buffers
don't get overwritten. Do you see any problems with it?

> Also, do you think a test module is required for
> the basic functionality in 0001, or only when we start doing more
> complex things like reading past the Flush pointer?

With WaitXLogInsertionsToFinish in XLogReadFromBuffers, we have that
capability already in. Having a separate test module ensures the code
is tested properly.

As far as the test is concerned, it verifies 2 cases:
1. Check if WAL is successfully read from WAL buffers. For this, the
test generates minimal WAL and reads from WAL buffers from the start
LSN = current insert LSN captured before the WAL generation.
2. Check with a WAL that doesn't yet exist. For this, the test reads
from WAL buffers from the start LSN = current flush LSN+16MB (a
randomly chosen higher value).

> 0003: can you explain why this is useful for wal summarizer to read
> from the buffers?

Can the WAL summarizer ever read the WAL on current TLI? I'm not so
sure about it, I haven't explored it in detail.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2024-01-26 14:01:52 Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Previous Message jian he 2024-01-26 13:44:34 Re: Add new error_action COPY ON_ERROR "log"