Re: pg_walfile_name_offset can return inconsistent values

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_walfile_name_offset can return inconsistent values
Date: 2023-11-09 20:49:48
Message-ID: CAEze2Wh7qWW6jbULq6mZ14fp_hNh9gM53aq4cG0b1m3DkfLU3w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 9 Nov 2023 at 20:22, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> I know this bug report is four years old, but it is still a
> pg_walfile_name_offset() bug. Here is the bug:
>
> SELECT *
> FROM (VALUES ('0/16ffffff'), ('0/17000000'), ('0/17000001')) AS t(lsn),
> LATERAL pg_walfile_name_offset(lsn::pg_lsn);
>
> lsn | file_name | file_offset
> ------------+--------------------------+-------------
> 0/16ffffff | 000000010000000000000016 | 16777215
> --> 0/17000000 | 000000010000000000000016 | 0
> 0/17000001 | 000000010000000000000017 | 1
>
> The bug is in the indicated line --- it shows the filename as 00016 but
> offset as zero, when clearly the LSN is pointing to 17/0. The bug is
> essentially that the code for pg_walfile_name_offset() uses the exact
> offset from the LSN, but uses the file name from the previous byte of
> the LSN.

Yes, that's definitely not a correct result.

> The fix involves deciding what the description or purpose of
> pg_walfile_name_offset() means, and adjusting it to be clearer. The
> current documentation says:
>
> Converts a write-ahead log location to a WAL file name and byte
> offset within that file.
>
> Fix #1: If we assume write-ahead log location means LSN, it is saying
> show the file/offset of the LSN, and that is most clearly:
>
> lsn | file_name | file_offset
> ------------+--------------------------+-------------
> 0/16ffffff | 000000010000000000000016 | 16777215
> 0/17000000 | 000000010000000000000017 | 0
> 0/17000001 | 000000010000000000000017 | 1
>
> Fix #2: Now, there are some who have said they want the output to be
> the last written WAL byte (the byte before the LSN), not the current
> LSN, for archiving purposes. However, if we do that, we have to update
> the docs to clarify it. Its output would be:
>
> lsn | file_name | file_offset
> ------------+--------------------------+-------------
> 0/16ffffff | 000000010000000000000016 | 16777214
> 0/17000000 | 000000010000000000000016 | 16777215
> 0/17000001 | 000000010000000000000017 | 0
>
> I have attached fix #1 as offset1.diff and fix #2 as offset2.diff.

I believe you got the references wrong; fix #1 looks like the output
of offset2's changes, and fix #2 looks like the result of offset1's
changes.

Either way, I think fix #1 is most correct (as was attached in
offset2.diff, and quoted verbatim here), because that has no chance of
having surprising underflowing behaviour when you use '0/0'::lsn as
input.

> diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
> index 45a70668b1..e65502d51e 100644
> --- a/src/backend/access/transam/xlogfuncs.c
> +++ b/src/backend/access/transam/xlogfuncs.c
> @@ -414,7 +414,7 @@ pg_walfile_name_offset(PG_FUNCTION_ARGS)
> /*
> * xlogfilename
> */
> - XLByteToPrevSeg(locationpoint, xlogsegno, wal_segment_size);
> + XLByteToSeg(locationpoint, xlogsegno, wal_segment_size);
> XLogFileName(xlogfilename, GetWALInsertionTimeLine(), xlogsegno,
> wal_segment_size);

Kind regards,

Matthias van de Meent

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-11-09 20:58:36 Re: Improve WALRead() to suck data directly from WAL buffers when possible
Previous Message Andrew Dunstan 2023-11-09 19:44:36 Re: Failure during Building Postgres in Windows with Meson