| From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
| Date: | 2025-06-11 22:03:04 |
| Message-ID: | CAD21AoDCphZ30iiMOG8mZH_eo40m_1KY=4gz7-Yd+9M3FRBySg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-bugs |
On Tue, Jun 10, 2025 at 12:19 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Sun, Jun 8, 2025 at 6:43 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Sat, 7 Jun 2025 at 12:47, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > On Fri, 6 Jun 2025 at 22:51, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > On Thu, Jun 5, 2025 at 8:43 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > >
> > > > > On Thu, Jun 5, 2025 at 11:14 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > > >
> > > > > > On Wed, Jun 4, 2025 at 11:36 PM Hayato Kuroda (Fujitsu)
> > > > > > <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > > > > > >
> > > > > > > Dear Sawada-san,
> > > > > > >
> > > > > > > > Alternative idea is to lower the constant value when using an
> > > > > > > > assertion build. That way, we don't need to rely on injection points
> > > > > > > > being enabled.
> > > > > > >
> > > > > > > Hmm, possible but I prefer current one. Two concerns:
> > > > > > >
> > > > > > > 1.
> > > > > > > USE_ASSERT_CHECKING has not been used to change the value yet. The main usage is
> > > > > > > to call debug functions in debug build.
> > > > > >
> > > > > > I think we have a similar precedent such as MT_NRELS_HASH to improve
> > > > > > the test coverages.
> > > > > >
> > > > > > > 2.
> > > > > > > If we add tests which is usable only for debug build, it must be run only when it
> > > > > > > is enabled. IIUC such test does not exist yet.
> > > > > >
> > > > > > I think we need to test cases not to check if we reach a specific code
> > > > > > point but to check if we can get the correct results even if we've
> > > > > > executed various code paths. As for this bug, it is better to check
> > > > > > that it works properly in a variety of cases. That way, we can check
> > > > > > overflow cases and non-overflow cases also in test cases added in the
> > > > > > future, improving the test coverage more.
> > > > > >
> > > > >
> > > > > This makes sense, but we should see whether some existing tests cover
> > > > > this code path after lowering the limit in the overflow code path. One
> > > > > minor point to consider is that at the time, the value MT_NRELS_HASH
> > > > > was used to cover cases in a debug build, but we didn't have the
> > > > > injection_point framework.
> > > >
> > > > True.
> > > >
> > > > After thinking about it more, perhaps my proposal would not be a good
> > > > idea for this case. I think that the cases where we selectively
> > > > invalidate caches is more complex and error-prone than the cases where
> > > > we invalidate a complete cache. If we invalidated all caches after
> > > > decoding each transaction, we wouldn't have had the original data-loss
> > > > issue. Having a lower MAX_DISTR_INVAL_MSG_PER_TXN value when using an
> > > > assertio build means that we're going to test the cases using a
> > > > simpler invalidation mechanism while productions systems, which has a
> > > > higher MAX_DISTR_INVAL_MSG_PER_TXN value, would end up executing
> > > > complex cases, which is not great. What do you think?
> > > >
> > > > BTW, as for a new test case, it might be worth having a case I
> > > > mentioned before[1]:
> > > >
> > > > 1) S1: CREATE TABLE d (data text not null);
> > > > 2) S1: INSERT INTO d VALUES ('d1');
> > > > 3) S2: BEGIN; INSERT INTO d VALUES ('d2');
> > > > 4) S3: BEGIN; INSERT INTO d VALUES ('d3');
> > > > 5) S1: ALTER PUBLICATION pb ADD TABLE d;
> > > > 6) S2: INSERT INTO d VALUES ('d4');
> > > > 7) S2: COMMIT;
> > > > 8) S3: COMMIT;
> > > > 9) S2: INSERT INTO d VALUES('d5');
> > > > 10) S1: INSERT INTO d VALUES ('d6');
> > > >
> > > > With this case, we can test if we need to execute the distributed
> > > > invalidations as well in the non-error path in
> > > > ReorderBufferProcessTXN().
> > >
> > > The attached v13 version patch has the change to include this test case.
> >
> > Attached are the patches, including those required for the back branches.
>
> Thank you for updating the patches! I'll review them and share comments if any.
Thank you for updating the patch. I have one comment on the newly added test:
+session "s3"
+step "s3i1" { INSERT INTO tbl1 (val1, val2) VALUES (1, 1);}
+step "s3a" { ALTER PUBLICATION pub ADD TABLE tbl1; }
+step "s3i2" { INSERT INTO tbl1 (val1, val2) VALUES (6, 6); }
+step "s3_get_binary_changes" { SELECT count(data) FROM
pg_logical_slot_get_binary_changes('isolation_slot', NULL, NULL,
'proto_version', '4', 'publication_names', 'pub') WHERE get_byte(data,
0) = 73; }
+
+session "s4"
+step "s4b" { BEGIN; }
+step "s4i1" { INSERT INTO tbl1 (val1, val2) VALUES (2, 2);}
+step "s4i2" { INSERT INTO tbl1 (val1, val2) VALUES (4, 4); }
+step "s4c" { COMMIT; }
+step "s4i3" { INSERT INTO tbl1 (val1, val2) VALUES (5, 5); }
+
+session "s5"
+step "s5b" { BEGIN; }
+step "s5i1" { INSERT INTO tbl1 (val1, val2) VALUES (3, 3); }
+step "s5c" { COMMIT; }
I think we don't necessarily need to add sessions "s4" and "s5". Let's
reuse "s1" and "s2" instead of adding them. I've attached a patch to
change that.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
| Attachment | Content-Type | Size |
|---|---|---|
| v13_test_change.patch | application/octet-stream | 4.4 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2025-06-11 22:34:52 | Re: BUG #18944: Assertion Failure in psql with idle_session_timeout Set |
| Previous Message | Nathan Bossart | 2025-06-11 21:42:19 | Re: Invalid control file checksum with AVX-512 during initdb on a clang19 -O0 build |