Re: Improve WALRead() to suck data directly from WAL buffers when possible

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Improve WALRead() to suck data directly from WAL buffers when possible
Date: 2023-03-07 07:09:13
Message-ID: CALj2ACUdmoMH3tfvbZ83JsHjdfRNQbRift95sxma2Qm6AgPhxQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 7, 2023 at 3:30 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> +void
> +XLogReadFromBuffers(XLogRecPtr startptr,
>
> Since this function presently doesn't return anything, can we have it
> return the number of bytes read instead of storing it in a pointer
> variable?

Done.

> + ptr = startptr;
> + nbytes = count;
> + dst = buf;
>
> These variables seem superfluous.

Needed startptr and count for DEBUG1 message and assertion at the end.
Removed dst and used buf in the new patch now.

> + /*
> + * Requested WAL isn't available in WAL buffers, so return with
> + * what we have read so far.
> + */
> + break;
>
> nitpick: I'd move this to the top so that you can save a level of
> indentation.

Done.

> + /*
> + * All the bytes are not in one page. Read available bytes on
> + * the current page, copy them over to output buffer and
> + * continue to read remaining bytes.
> + */
>
> Is it possible to memcpy more than a page at a time?

It would complicate things a lot there; the logic to figure out the
last page bytes that may or may not fit in the whole page gets
complicated. Also, the logic to verify each page's header gets
complicated. We might lose out if we memcpy all the pages at once and
start verifying each page's header in another loop.

I would like to keep it simple - read a single page from WAL buffers,
verify it and continue.

> + /*
> + * The fact that we acquire WALBufMappingLock while reading the WAL
> + * buffer page itself guarantees that no one else initializes it or
> + * makes it ready for next use in AdvanceXLInsertBuffer().
> + *
> + * However, we perform basic page header checks for ensuring that
> + * we are not reading a page that just got initialized. Callers
> + * will anyway perform extensive page-level and record-level
> + * checks.
> + */
>
> Hm. I wonder if we should make these assertions instead.

Okay. I added XLogReaderValidatePageHeader for assert-only builds
which will help catch any issues there. But we can't perform record
level checks here because this function doesn't know where the record
starts from, it knows only pages. This change required us to pass in
XLogReaderState to XLogReadFromBuffers. I marked it as
PG_USED_FOR_ASSERTS_ONLY and did page header checks only when it is
passed as non-null so that someone who doesn't have XLogReaderState
can still read from buffers.

> + elog(DEBUG1, "read %zu bytes out of %zu bytes from WAL buffers for given LSN %X/%X, Timeline ID %u",
> + *read_bytes, count, LSN_FORMAT_ARGS(startptr), tli);
>
> I definitely don't think we should put an elog() in this code path.
> Perhaps this should be guarded behind WAL_DEBUG.

Placing it behind WAL_DEBUG doesn't help users/developers. My
intention was to let users know that the WAL read hit the buffers,
it'll help them report if any issue occurs and also help developers to
debug that issue.

On a different note - I was recently looking at the code around
WAL_DEBUG macro and the wal_debug GUC. It looks so complex that one
needs to build source code with the WAL_DEBUG macro and enable the GUC
to see the extended logs for WAL. IMO, the best way there is either:
1) unify all the code under WAL_DEBUG macro and get rid of wal_debug GUC, or
2) unify all the code under wal_debug GUC (it is developer-only and
superuser-only so there shouldn't be a problem even if we ship it out
of the box).

If someone is concerned about the GUC being enabled on production
servers knowingly or unknowingly with option (2), we can go ahead with
option (1). I will discuss this separately to see what others think.

> I think we can simplify this. We effectively take the same action any time
> "count" doesn't equal "read_bytes", so there's no need for the "else if".
>
> if (count == read_bytes)
> return true;
>
> buf += read_bytes;
> startptr += read_bytes;
> count -= read_bytes;

I wanted to avoid setting these unnecessarily for buffer misses.

Thanks a lot for reviewing. I'm attaching the v8 patch for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v8-0001-Improve-WALRead-to-suck-data-directly-from-WAL-bu.patch application/octet-stream 7.6 KB
v8-0002-Add-test-module-for-verifying-WAL-read-from-WAL-b.patch application/octet-stream 9.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-03-07 07:18:21 Re: Add pg_walinspect function with block info columns
Previous Message Amit Kapila 2023-03-07 06:57:58 Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher