Re: Do away with zero-padding assumption before WALRead()

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bharath(dot)rupireddyforpostgres(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Do away with zero-padding assumption before WALRead()
Date: 2024-02-16 01:40:13
Message-ID: 20240216.104013.1220145785391011056.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Tue, 13 Feb 2024 11:47:06 +0530, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote in
> Hi,
>
> I noticed an assumption [1] at WALRead() call sites expecting the
> flushed WAL page to be zero-padded after the flush LSN. I think this
> can't always be true as the WAL can get flushed after determining the
> flush LSN before reading it from the WAL file using WALRead(). I've
> hacked the code up a bit to check if that's true -

Good catch! The comment seems wrong also to me. The subsequent bytes
can be written simultaneously, and it's very normal that there are
unflushed bytes are in OS's page buffer. The objective of the comment
seems to be to declare that there's no need to clear out the remaining
bytes, here. I agree that it's not a problem for now. However, I think
we need two fixes here.

1. It's useless to copy the whole page regardless of the 'count'. It's
enough to copy only up to the 'count'. The patch looks good in this
regard.

2. Maybe we need a comment that states the page_read callback
functions leave garbage bytes beyond the returned count, due to
partial copying without clearing the unused portion.

> I'm wondering why the WALRead() callers are always reading XLOG_BLCKSZ
> despite knowing exactly how much to read. Is it to tell the OS to
> explicitly fetch the whole page from the disk? If yes, the OS will do
> that anyway because the page transfers from disk to OS page cache are
> always in terms of disk block sizes, no?

If I understand your question correctly, I guess that the whole-page
copy was expected to clear out the remaining bytes, as I mentioned
earlier.

> Although, there's no immediate problem with it right now, the
> assumption is going to be incorrect when reading WAL from WAL buffers
> using WALReadFromBuffers -
> https://www.postgresql.org/message-id/CALj2ACV=C1GZT9XQRm4iN1NV1T=hLA_hsGWNx2Y5-G+mSwdhNg@mail.gmail.com.
>
> If we have no reason, can the WALRead() callers just read how much
> they want like walsender for physical replication? Attached a patch
> for the change.
>
> Thoughts?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2024-02-16 02:03:21 Re: make add_paths_to_append_rel aware of startup cost
Previous Message shihao zhong 2024-02-16 01:35:12 Re: Fix incorrect PG_GETARG in pgcrypto