Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: David Zhang <david(dot)zhang(at)highgo(dot)ca>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths
Date: 2022-06-20 09:01:51
Message-ID: CAEze2WiA6+mze9KH8dw552fxWB9Eg=Wtayxhr4y6Wn3PXS-9mw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 20 Jun 2022 at 07:02, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Sat, Jun 11, 2022 at 09:25:48PM +0200, Matthias van de Meent wrote:
> > Did you initiate a new cluster or otherwise skip the invalid record
> > you generated when running the instance based on master? It seems to
> > me you're trying to replay the invalid record (len > MaxAllocSize),
> > and this patch does not try to fix that issue. This patch just tries
> > to forbid emitting records larger than MaxAllocSize, as per the check
> > in XLogRecordAssemble, so that we wont emit unreadable records into
> > the WAL anymore.
> >
> > Reading unreadable records still won't be possible, but that's also
> > not something I'm trying to fix.
>
> As long as you cannot generate such WAL records that should be fine as
> wAL is not reused across upgrades, so this kind of restriction is a
> no-brainer on HEAD. The back-patching argument is not on the table
> anyway, as some of the routine signatures change with the unsigned
> arguments, because of those safety checks.

The signature change is mostly ornamental, see attached v4.backpatch.
The main reason for changing the signature is to make sure nobody can
provide a negative value, but it's not important to the patch.

>
> + if (unlikely(num_rdatas >= max_rdatas) ||
> + unlikely(!AllocSizeIsValid((uint64) mainrdata_len + (uint64) len)))
> + XLogErrorDataLimitExceeded();
> [...]
> +inline void
> +XLogErrorDataLimitExceeded()
> +{
> + elog(ERROR, "too much WAL data");
> +}
> The three checks are different, OK..

They each check slightly different things, but with the same error. In
RegisterData, it checks that the data can still be allocated and does
not overflow the register, in RegisterBlock it checks that the total
length of data registered to the block does not exceed the max value
of XLogRecordBlockHeader->data_length. I've updated the comments above
the checks so that this distinction is more clear.

> Note that static is missing.

Fixed in attached v4.patch

> + if (unlikely(!AllocSizeIsValid(total_len)))
> + XLogErrorDataLimitExceeded();
> Rather than a single check at the end of XLogRecordAssemble(), you'd
> better look after that each time total_len is added up?

I was doing so previously, but there were some good arguments against that:

- Performance of XLogRecordAssemble should be impacted as little as
possible. XLogRecordAssemble is in many hot paths, and it is highly
unlikely this check will be hit, because nobody else has previously
reported this issue. Any check, however unlikely, will add some
overhead, so removing check counts reduces overhead of this patch.

- The user or system is unlikely to care about which specific check
was hit, and only needs to care _that_ the check was hit. An attached
debugger will be able to debug the internals of the xlog machinery and
find out the specific reasons for the error, but I see no specific
reason why the specific reason would need to be reported to the
connection.

Kind regards,

Matthias van de Meent

Attachment Content-Type Size
v4-0001-Add-protections-in-xlog-record-APIs-against-large.patch application/octet-stream 6.8 KB
v4-0001-Add-protections-in-xlog-record-APIs-against-large.backpatch application/octet-stream 5.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2022-06-20 09:07:32 Re: Handle infinite recursion in logical replication setup
Previous Message Przemysław Sztoch 2022-06-20 08:37:57 Re: [PATCH] Completed unaccent dictionary with many missing characters