Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Non-replayable WAL records through overflows and >MaxAllocSize lengths
Date: 2022-03-14 16:57:23
Message-ID: CAEze2Wj3=PwJzo8ux_4W_eEpMvd7ZMYSBQO4r7iJT01geCabaA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you all for the feedback. Please find attached v2 of the
patchset, which contains updated comments and applies the suggested
changes.

On Sat, 12 Mar 2022 at 02:03, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2022-03-11 22:42:42 +0200, Heikki Linnakangas wrote:
> > Have you been able to create a test case for that? The largest record I can
> > think of is a commit record with a huge number of subtransactions, dropped
> > relations, and shared inval messages. I'm not sure if you can overflow a
> > uint32 with that, but exceeding MaxAllocSize seems possible.
>
> MaxAllocSize is pretty easy:
> SELECT pg_logical_emit_message(false, long, long) FROM repeat(repeat(' ', 1024), 1024*1023) as l(long);
>
> on a standby:
>
> 2022-03-11 16:41:59.336 PST [3639744][startup][1/0:0] LOG: record length 2145386550 at 0/3000060 too long

Thanks for the reference. I was already playing around with 2PC log
records (which can theoretically contain >4GB of data); but your
example is much easier and takes significantly less time.

I'm not sure whether or not to include this in the test suite, though,
as this would require a machine with at least 1GB of memory available
for this test alone, and I don't know the current requirements for
running the test suite.

> > I wonder if these checks hurt performance. These are very cheap, but then
> > again, this codepath is very hot. It's probably fine, but it still worries
> > me a little. Maybe some of these could be Asserts.
>
> I wouldn't expect the added branch itself to hurt much in XLogRegisterData() -
> it should be statically predicted to be not taken with the unlikely. I don't
> think it's quite inner-loop enough for the instructions or the number of
> "concurrently out of order branches" to be a problem.
>
> FWIW, often the added elog()s are worse, because they require a decent amount
> of code and restrict the optimizer somewhat (e.g. no sibling calls, more local
> variables etc). They can't even be deduplicated because of the line-numbers
> embedded.
>
> So maybe just collapse the new elog() with the previous elog, with a common
> unlikely()?

Updated.

> > > @@ -734,6 +744,10 @@ XLogRecordAssemble(RmgrId rmid, uint8 info,
> > > if (needs_data)
> > > {
> > > + /* protect against overflow */
> > > + if (unlikely(regbuf->rdata_len > UINT16_MAX))
> > > + elog(ERROR, "too much WAL data for registered buffer");
> > > +
> > > /*
> > > * Link the caller-supplied rdata chain for this buffer to the
> > > * overall list.
>
> FWIW, this branch I'm a tad more concerned about - it's in a loop body where
> plausibly a lot of branches could be outstanding at the same time.
>
> ISTM that this could just be an assert?

This specific location has been replaced with an Assert, while
XLogRegisterBufData always does the unlikely()-ed bounds check.

Kind regards,

Matthias

Attachment Content-Type Size
v2-0001-Add-protections-in-xlog-record-APIs-against-large.patch application/octet-stream 4.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-03-14 17:00:23 Re: Allow async standbys wait for sync replication
Previous Message Robert Haas 2022-03-14 16:55:02 Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints