Re: Improve WALRead() to suck data directly from WAL buffers when possible

From: Nitin Jadhav <nitinjadhavpostgres(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Improve WALRead() to suck data directly from WAL buffers when possible
Date: 2023-03-11 19:22:00
Message-ID: CAMm1aWZAbp=7BpM-JdWXgWu0QTupvK1BY71Y1urKLPfU33ht8w@mail.gmail.com
Lists: pgsql-hackers

> [1]
> subscription tests:
> PATCHED: WAL buffers hit - 1972, misses - 32616

Can you share more details about the test here?

I went through the v8 patch. Here are my thoughts on improving the
WAL buffer hit ratio.

Currently, WAL data that is no longer needed is cleared from the WAL
buffers in XLogBackgroundFlush(), which is called based on the
wal_writer_delay config setting. Once the data is flushed to disk, it
is treated as no longer needed and is cleared as soon as possible. I
did some testing by tweaking wal_writer_delay to confirm this
behaviour: the WAL buffer hit ratio is good when wal_writer_delay is
large [2] and poor when it is small [1]. So, irrespective of the
wal_writer_delay setting, we should keep the WAL data in the WAL
buffers as long as possible so that all readers (mainly WAL senders)
can take advantage of it. A WAL page should be evicted from the WAL
buffers only when the buffers are full and we need room for a new
page. The attached patch takes care of this; with it, the WAL buffer
hit ratio improves even when wal_writer_delay is set to a low
value [3].

Second, in WALRead(), we fall back to reading from disk whenever we
don't find the data in the WAL buffers, but we don't store what we
read back into the WAL buffers; we just read the data, use it, and
discard it. If we stored that data in the WAL buffers, we could avoid
some subsequent disk reads.

[1]:
wal_buffers=1GB
wal_writer_delay=1ms
./pgbench --initialize --scale=300 postgres

WAL buffers hit=5046
WAL buffers miss=56767

[2]:
wal_buffers=1GB
wal_writer_delay=10s
./pgbench --initialize --scale=300 postgres

WAL buffers hit=45454
WAL buffers miss=14064

[3]:
wal_buffers=1GB
wal_writer_delay=1ms
./pgbench --initialize --scale=300 postgres

WAL buffers hit=37214
WAL buffers miss=844

Please share your thoughts.

Thanks & Regards,
Nitin Jadhav

On Tue, Mar 7, 2023 at 12:39 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Tue, Mar 7, 2023 at 3:30 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> >
> > +void
> > +XLogReadFromBuffers(XLogRecPtr startptr,
> >
> > Since this function presently doesn't return anything, can we have it
> > return the number of bytes read instead of storing it in a pointer
> > variable?
>
> Done.
>
> > + ptr = startptr;
> > + nbytes = count;
> > + dst = buf;
> >
> > These variables seem superfluous.
>
> Needed startptr and count for DEBUG1 message and assertion at the end.
> Removed dst and used buf in the new patch now.
>
> > + /*
> > + * Requested WAL isn't available in WAL buffers, so return with
> > + * what we have read so far.
> > + */
> > + break;
> >
> > nitpick: I'd move this to the top so that you can save a level of
> > indentation.
>
> Done.
>
> > + /*
> > + * All the bytes are not in one page. Read available bytes on
> > + * the current page, copy them over to output buffer and
> > + * continue to read remaining bytes.
> > + */
> >
> > Is it possible to memcpy more than a page at a time?
>
> It would complicate things a lot there; the logic to handle the last
> page, whose bytes may not fill a whole page, gets complicated, and so
> does the logic to verify each page's header. We might also lose out
> if we memcpy all the pages at once and then verify each page's header
> in a separate loop.
>
> I would like to keep it simple - read a single page from WAL buffers,
> verify it and continue.
>
> > + /*
> > + * The fact that we acquire WALBufMappingLock while reading the WAL
> > + * buffer page itself guarantees that no one else initializes it or
> > + * makes it ready for next use in AdvanceXLInsertBuffer().
> > + *
> > + * However, we perform basic page header checks for ensuring that
> > + * we are not reading a page that just got initialized. Callers
> > + * will anyway perform extensive page-level and record-level
> > + * checks.
> > + */
> >
> > Hm. I wonder if we should make these assertions instead.
>
> Okay. I added XLogReaderValidatePageHeader for assert-only builds
> which will help catch any issues there. But we can't perform record
> level checks here because this function doesn't know where the record
> starts from, it knows only pages. This change required us to pass in
> XLogReaderState to XLogReadFromBuffers. I marked it as
> PG_USED_FOR_ASSERTS_ONLY and did page header checks only when it is
> passed as non-null so that someone who doesn't have XLogReaderState
> can still read from buffers.
>
> > + elog(DEBUG1, "read %zu bytes out of %zu bytes from WAL buffers for given LSN %X/%X, Timeline ID %u",
> > + *read_bytes, count, LSN_FORMAT_ARGS(startptr), tli);
> >
> > I definitely don't think we should put an elog() in this code path.
> > Perhaps this should be guarded behind WAL_DEBUG.
>
> Placing it behind WAL_DEBUG doesn't help users or developers. My
> intention was to let users know that the WAL read hit the buffers;
> that will help them report any issue that occurs and help developers
> debug it.
>
> On a different note - I was recently looking at the code around
> WAL_DEBUG macro and the wal_debug GUC. It looks so complex that one
> needs to build source code with the WAL_DEBUG macro and enable the GUC
> to see the extended logs for WAL. IMO, the best way there is either:
> 1) unify all the code under WAL_DEBUG macro and get rid of wal_debug GUC, or
> 2) unify all the code under wal_debug GUC (it is developer-only and
> superuser-only so there shouldn't be a problem even if we ship it out
> of the box).
>
> If someone is concerned about the GUC being enabled on production
> servers knowingly or unknowingly with option (2), we can go ahead with
> option (1). I will discuss this separately to see what others think.
>
> > I think we can simplify this. We effectively take the same action any time
> > "count" doesn't equal "read_bytes", so there's no need for the "else if".
> >
> > if (count == read_bytes)
> > return true;
> >
> > buf += read_bytes;
> > startptr += read_bytes;
> > count -= read_bytes;
>
> I wanted to avoid setting these unnecessarily for buffer misses.
>
> Thanks a lot for reviewing. I'm attaching the v8 patch for further review.
>
> --
> Bharath Rupireddy
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v8-0003-Don-t-clear-the-WAL-buffers-in-XLogBackgroundFlush.patch application/octet-stream 1.4 KB
