Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-06-12 06:47:44
Message-ID: CALDaNm3m6Dq2d6bCBaOjnABYavT3p+=VbJd5oJKok0sCTOSZog@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, 12 Jun 2025 at 03:33, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, Jun 10, 2025 at 12:19 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Sun, Jun 8, 2025 at 6:43 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > On Sat, 7 Jun 2025 at 12:47, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > >
> > > > On Fri, 6 Jun 2025 at 22:51, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Thu, Jun 5, 2025 at 8:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > On Thu, Jun 5, 2025 at 11:14 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > > > >
> > > > > > > On Wed, Jun 4, 2025 at 11:36 PM Hayato Kuroda (Fujitsu)
> > > > > > > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > > > > > > >
> > > > > > > > Dear Sawada-san,
> > > > > > > >
> > > > > > > > > Alternative idea is to lower the constant value when using an
> > > > > > > > > assertion build. That way, we don't need to rely on injection points
> > > > > > > > > being enabled.
> > > > > > > >
> > > > > > > > Hmm, possible but I prefer current one. Two concerns:
> > > > > > > >
> > > > > > > > 1.
> > > > > > > > USE_ASSERT_CHECKING has not been used to change the value yet. The main usage is
> > > > > > > > to call debug functions in debug build.
> > > > > > >
> > > > > > > I think we have a similar precedent such as MT_NRELS_HASH to improve
> > > > > > > the test coverages.
> > > > > > >
> > > > > > > > 2.
> > > > > > > > If we add tests which is usable only for debug build, it must be run only when it
> > > > > > > > is enabled. IIUC such test does not exist yet.
> > > > > > >
> > > > > > > I think we need to test cases not to check if we reach a specific code
> > > > > > > point but to check if we can get the correct results even if we've
> > > > > > > executed various code paths. As for this bug, it is better to check
> > > > > > > that it works properly in a variety of cases. That way, we can check
> > > > > > > overflow cases and non-overflow cases also in test cases added in the
> > > > > > > future, improving the test coverage more.
> > > > > > >
> > > > > >
> > > > > > This makes sense, but we should see whether some existing tests cover
> > > > > > this code path after lowering the limit in the overflow code path. One
> > > > > > minor point to consider is that at the time, the value MT_NRELS_HASH
> > > > > > was used to cover cases in a debug build, but we didn't have the
> > > > > > injection_point framework.
> > > > >
> > > > > True.
> > > > >
> > > > > After thinking about it more, perhaps my proposal would not be a good
> > > > > idea for this case. I think that the cases where we selectively
> > > > > invalidate caches is more complex and error-prone than the cases where
> > > > > we invalidate a complete cache. If we invalidated all caches after
> > > > > decoding each transaction, we wouldn't have had the original data-loss
> > > > > issue. Having a lower MAX_DISTR_INVAL_MSG_PER_TXN value when using an
> > > > > assertio build means that we're going to test the cases using a
> > > > > simpler invalidation mechanism while productions systems, which has a
> > > > > higher MAX_DISTR_INVAL_MSG_PER_TXN value, would end up executing
> > > > > complex cases, which is not great. What do you think?
> > > > >
> > > > > BTW, as for a new test case, it might be worth having a case I
> > > > > mentioned before[1]:
> > > > >
> > > > > 1) S1: CREATE TABLE d (data text not null);
> > > > > 2) S1: INSERT INTO d VALUES ('d1');
> > > > > 3) S2: BEGIN; INSERT INTO d VALUES ('d2');
> > > > > 4) S3: BEGIN; INSERT INTO d VALUES ('d3');
> > > > > 5) S1: ALTER PUBLICATION pb ADD TABLE d;
> > > > > 6) S2: INSERT INTO d VALUES ('d4');
> > > > > 7) S2: COMMIT;
> > > > > 8) S3: COMMIT;
> > > > > 9) S2: INSERT INTO d VALUES('d5');
> > > > > 10) S1: INSERT INTO d VALUES ('d6');
> > > > >
> > > > > With this case, we can test if we need to execute the distributed
> > > > > invalidations as well in the non-error path in
> > > > > ReorderBufferProcessTXN().
> > > >
> > > > The attached v13 version patch has the change to include this test case.
> > >
> > > Attached are the patches, including those required for the back branches.
> >
> > Thank you for updating the patches! I'll review them and share comments if any.
>
> Thank you for updating the patch. I have one comment on the newly added test:
>
> +session "s3"
> +step "s3i1" { INSERT INTO tbl1 (val1, val2) VALUES (1, 1);}
> +step "s3a" { ALTER PUBLICATION pub ADD TABLE tbl1; }
> +step "s3i2" { INSERT INTO tbl1 (val1, val2) VALUES (6, 6); }
> +step "s3_get_binary_changes" { SELECT count(data) FROM
> pg_logical_slot_get_binary_changes('isolation_slot', NULL, NULL,
> 'proto_version', '4', 'publication_names', 'pub') WHERE get_byte(data,
> 0) = 73; }
> +
> +session "s4"
> +step "s4b" { BEGIN; }
> +step "s4i1" { INSERT INTO tbl1 (val1, val2) VALUES (2, 2);}
> +step "s4i2" { INSERT INTO tbl1 (val1, val2) VALUES (4, 4); }
> +step "s4c" { COMMIT; }
> +step "s4i3" { INSERT INTO tbl1 (val1, val2) VALUES (5, 5); }
> +
> +session "s5"
> +step "s5b" { BEGIN; }
> +step "s5i1" { INSERT INTO tbl1 (val1, val2) VALUES (3, 3); }
> +step "s5c" { COMMIT; }
>
> I think we don't necessarily need to add sessions "s4" and "s5". Let's
> reuse "s1" and "s2" instead of adding them. I've attached a patch to
> change that.

Thanks, this is better.
In the case of PG13 I have slightly changed the test to do the insert
after the commit, because the queuing of invalidation messages into
reorder buffer queue is not supported in PG13. This limitation is due
to the absence of support for REORDER_BUFFER_CHANGE_INVALIDATION which
is already present in >= PG14
The attached v14 version patch has the changes for the same.

Regards,
Vignesh

Attachment Content-Type Size
v14-PG14-0001-Fix-exponential-memory-allocation-issue-in-.patch text/x-patch 17.3 KB
v14-PG13-0001-Fix-exponential-memory-allocation-issue-in-.patch text/x-patch 16.4 KB
v14-PG15-0001-Fix-exponential-memory-allocation-issue-in-.patch text/x-patch 17.3 KB
v14-PG16-0001-Fix-exponential-memory-allocation-issue-in-.patch text/x-patch 17.4 KB
v14-PG17-0001-Fix-exponential-memory-allocation-issue-in-.patch text/x-patch 17.4 KB
v14-master-0001-Fix-exponential-memory-allocation-issue-i.patch text/x-patch 16.7 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message John Naylor 2025-06-12 06:57:01 Re: Invalid control file checksum with AVX-512 during initdb on a clang19 -O0 build
Previous Message Michael Paquier 2025-06-11 23:46:26 Re: BUG #18947: TRAP: failed Assert("len_to_wrt >= 0") in pg_stat_statements