Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, David Zhang <david(dot)zhang(at)highgo(dot)ca>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths
Date: 2023-04-05 14:35:37
Message-ID: CAEze2Wg=N4eAiT7Nw8yC7JE5=x-1+mcAa7Q3a6tVdOmTYdH=sw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 28 Mar 2023 at 13:42, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Mon, Dec 19, 2022 at 12:37:19PM +0100, Alvaro Herrera wrote:
> > I have created one in the January commitfest,
> > https://commitfest.postgresql.org/41/
> > and rebased the patch on current master. (I have not reviewed this.)
>
> I have spent some time on that, and here are some comments with an
> updated version of the patch attached.
>
> The checks in XLogRegisterData() seemed overcomplicated to me. In
> this context, I think that we should just care about making sure that
> mainrdata_len does not overflow depending on the length given by the
> caller, which is where pg_add_u32_overflow() becomes handy.
>
> XLogRegisterBufData() added a check on UINT16_MAX in an assert, though
> we already check for overflow a couple of lines down. This is not
> necessary, it seems.
>
> @@ -535,6 +567,9 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
> XLogRecord *rechdr;
> char *scratch = hdr_scratch;
>
> + /* ensure that any assembled record can be decoded */
> + Assert(AllocSizeIsValid(DecodeXLogRecordRequiredSpace(MaxXLogRecordSize)));
>
> A hardcoded check like that has no need to be in a code path triggered
> each time a WAL record is assembled. One place where this could be is
> InitXLogInsert(). It still means that it is called one time for each
> backend, but seeing it where the initialization of xloginsert.c feels
> natural, at least. A postmaster location would be enough, as well.
>
> XLogRecordMaxSize just needs to be checked once IMO, around the end of
> XLogRecordAssemble() once we know the total size of the record that
> will be fed to a XLogReader. One thing that we should be more careful
> of is to make sure that total_len does not overflow its uint32 value
> while assembling the record, as well.
>
> I have removed XLogErrorDataLimitExceeded(), replacing it with more
> context about the errors happening. Perhaps this has no need to be
> that much verbose, but it can be really useful for developers.
>
> Some comments had no need to be updated, and there were some typos.
>
> I am on board with the idea of a XLogRecordMaxSize that's bounded at
> 1020MB, leaving 4MB as room for the extra data needed by a
> XLogReader.
>
> At the end, I think that this is quite interesting long-term. For
> example, if we lift up XLogRecordMaxSize, we can evaluate the APIs
> adding buffer data or main data separately.
>
> Thoughts about this version?

I thought that the plan was to use int64 to skip checking for most
overflows and to do a single check at the end in XLogRecordAssemble,
so that the checking has minimal overhead in the performance-critical
log record assembling path and reduced load on the branch predictor.

One more issue that Andres was suggesting we'd fix was to allow XLog
assembly separate from the actual XLog insertion:
Currently you can't pre-assemble a record outside a critical section
if the record must be inserted in a critical section, which makes e.g.
commit records problematic due to the potentially oversized data
resulting in ERRORs during record assembly. This would crash postgres
because commit xlog insertion happens in a critical section. Having a
pre-assembled record would greatly improve the ergonomics in that path
and reduce the length of the critical path.

I think it was something along the lines of the attached; 0001
contains separated Commit/Abort record construction and insertion like
Andres suggested, 0002 does the size checks with updated error
messages.

Kind regards,

Matthias van de Meent

Attachment Content-Type Size
v11-0001-Create-a-path-for-separate-xlog-record-construct.patch application/octet-stream 36.6 KB
v11-0002-Add-protections-in-xlog-record-APIs-against-over.patch application/octet-stream 5.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-04-05 14:35:59 Re: [BUG] Logical replica crash if there was an error in a function.
Previous Message Imseih (AWS), Sami 2023-04-05 14:31:54 Re: Add index scan progress to pg_stat_progress_vacuum