Re: Error "initial slot snapshot too large" in create replication slot

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: dilipbalaut(at)gmail(dot)com
Cc: rjuju123(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Error "initial slot snapshot too large" in create replication slot
Date: 2022-01-31 06:20:11
Message-ID: 20220131.152011.148738317202375552.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Mon, 17 Jan 2022 09:27:14 +0530, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote in
> On Wed, Jan 12, 2022 at 4:09 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> > The cfbot reports that this patch doesn't compile:
> > https://cirrus-ci.com/task/5642000073490432?logs=build
> >
> > [03:01:24.477] snapbuild.c: In function ‘SnapBuildInitialSnapshot’:
> > [03:01:24.477] snapbuild.c:587:2: error: ‘newsubxcnt’ undeclared (first
> > use in this function); did you mean ‘newsubxip’?
> > [03:01:24.477] 587 | newsubxcnt = 0;
> > [03:01:24.477] | ^~~~~~~~~~
> > [03:01:24.477] | newsubxip
> > [03:01:24.477] snapbuild.c:587:2: note: each undeclared identifier is
> > reported only once for each function it appears in
> > [03:01:24.477] snapbuild.c:535:8: warning: unused variable ‘maxxidcnt’
> > [-Wunused-variable]
> > [03:01:24.477] 535 | int maxxidcnt;
> > [03:01:24.477] | ^~~~~~~~~
> >
> > Could you send a new version? In the meantime I will switch the patch to
> > Waiting on Author.
> >
>
> Thanks for notifying, I will work on this and send the update patch soon.

me> Mmm. The size of the array cannot be larger than the numbers the
me> *Count() functions return. Thus we cannot attach the oversized array
me> to ->subxip. (I don't recall clearly, but that would lead to an
me> assertion failure somewhere.)
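To make the size constraint concrete, here is a minimal sketch of the two limits. It assumes, as in procarray.c at the time of this thread, that GetMaxSnapshotXidCount() returns the number of PGPROC slots and GetMaxSnapshotSubxidCount() returns (PGPROC_MAX_CACHED_SUBXIDS + 1) times that, with PGPROC_MAX_CACHED_SUBXIDS = 64. The sketch_* names and SKETCH_MAX_CACHED_SUBXIDS are hypothetical stand-ins, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of PGPROC_MAX_CACHED_SUBXIDS (assumed to be 64). */
#define SKETCH_MAX_CACHED_SUBXIDS 64

typedef uint32_t TransactionId;

/* Stand-in for GetMaxSnapshotXidCount(): one slot per PGPROC. */
static int sketch_max_xid_count(int max_procs)
{
    assert(max_procs > 0);
    return max_procs;
}

/* Stand-in for GetMaxSnapshotSubxidCount(): room for every backend's
 * cached subxids plus its top-level XID. */
static int sketch_max_subxid_count(int max_procs)
{
    assert(max_procs > 0);
    return (SKETCH_MAX_CACHED_SUBXIDS + 1) * max_procs;
}

/* The subxip array is assumed to be allocated with exactly the *Count()
 * capacity, so an array holding more entries cannot be attached. */
static bool sketch_can_attach_subxip(int nsubxids, int max_procs)
{
    return nsubxids <= sketch_max_subxid_count(max_procs);
}
```

With 100 PGPROC slots, xip holds 100 XIDs while subxip holds 6500, which is why moving the XIDs into subxip makes the overflow much less likely.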

So I fixed the v3 build error and am posting v4.

To recap:

SnapBuildInitialSnapshot tries to store the XIDs of both top-level and
sub transactions in the snapshot->xip array, but that array easily
overflows and the CREATE_REPLICATION_SLOT command fails with an error.
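A simplified sketch of the failing behavior, assuming the builder hands over a flat list of running XIDs that mixes top-level and subtransaction XIDs (the snapshot builder cannot tell them apart here). SketchSnapshot and sketch_fill_xip are hypothetical; the -1 return marks the point where the real code raises "initial slot snapshot too large".

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical, reduced snapshot: only the fields this sketch needs. */
typedef struct SketchSnapshot
{
    TransactionId *xip;   /* assumed to have max_xid_count slots */
    int            xcnt;
} SketchSnapshot;

/* Copy every running XID (top-level and sub alike) into xip.
 * Returns 0 on success, -1 where the real code would ereport(ERROR). */
static int sketch_fill_xip(SketchSnapshot *snap,
                           const TransactionId *running, int nrunning,
                           int max_xid_count)
{
    assert(max_xid_count >= 0);
    snap->xcnt = 0;
    for (int i = 0; i < nrunning; i++)
    {
        if (snap->xcnt >= max_xid_count)
            return -1;    /* "initial slot snapshot too large" */
        snap->xip[snap->xcnt++] = running[i];
    }
    return 0;
}
```

Because subtransaction XIDs count against the same per-backend budget as top-level ones, a few backends with many subtransactions are enough to trip the limit.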

To fix this, this patch is doing the following things.

- Use the subxip array instead of the xip array, which gives us a
  larger array for XIDs. Accordingly, the snapshot is marked as
  takenDuringRecovery, which is something of an abuse but greatly
  reduces the chance of getting the "initial slot snapshot too large"
  error.
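A minimal sketch of the first step, assuming a reduced snapshot struct whose field names follow SnapshotData (xip, xcnt, subxip, subxcnt, suboverflowed, takenDuringRecovery). The struct and sketch_fill_subxip are hypothetical; the real patch works on PostgreSQL's own structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical subset of SnapshotData. */
typedef struct SketchSnapshot
{
    TransactionId *xip;
    int            xcnt;
    TransactionId *subxip;     /* assumed to have max_subxid_count slots */
    int32_t        subxcnt;
    bool           suboverflowed;
    bool           takenDuringRecovery;
} SketchSnapshot;

/* Put every running XID into subxip and leave xip empty, the layout a
 * snapshot taken during recovery uses. Returns false if even subxip is
 * too small, in which case the caller falls back to the retry path. */
static bool sketch_fill_subxip(SketchSnapshot *snap,
                               const TransactionId *xids, int nxids,
                               int max_subxid_count)
{
    assert(max_subxid_count >= 0);
    if (nxids > max_subxid_count)
        return false;
    for (int i = 0; i < nxids; i++)
        snap->subxip[i] = xids[i];
    snap->subxcnt = nxids;
    snap->xcnt = 0;                    /* xip is not used */
    snap->suboverflowed = false;
    snap->takenDuringRecovery = true;  /* readers search subxip only */
    return true;
}
```

The design trade-off is that readers now take the recovery-style lookup path, which is why the mail calls the flag "a kind of abuse".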

- If subxip still overflows, retry excluding subtransactions and then
  set suboverflowed. This causes XidInMVCCSnapshot to (eventually)
  scan the subxip array for the targeted top-level XID.
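The retry and the resulting lookup can be sketched as follows. This is an illustration under several assumptions: the is_sub flags and the SubxactParent table are hypothetical stand-ins for however the patch identifies subtransactions and for pg_subtrans (SubTransGetTopmostTransaction in the real code), XID comparisons ignore wraparound, and sketch_xid_in_snapshot only approximates the takenDuringRecovery branch of XidInMVCCSnapshot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

typedef struct SketchSnapshot
{
    TransactionId  xmin, xmax;
    TransactionId *subxip;     /* assumed capacity: max_subxid_count */
    int            subxcnt;
    bool           suboverflowed;
} SketchSnapshot;

/* Hypothetical parent map standing in for pg_subtrans. */
typedef struct SubxactParent { TransactionId xid, parent; } SubxactParent;

/* Follow parent links until reaching a top-level XID. */
static TransactionId sketch_topmost(TransactionId xid,
                                    const SubxactParent *map, int nmap)
{
    bool changed = true;
    while (changed)
    {
        changed = false;
        for (int i = 0; i < nmap; i++)
            if (map[i].xid == xid) { xid = map[i].parent; changed = true; break; }
    }
    return xid;
}

/* Try all XIDs first; if they don't fit, keep only top-level XIDs and
 * mark the snapshot suboverflowed. Returns false if even that fails. */
static bool sketch_fill_with_retry(SketchSnapshot *snap,
                                   const TransactionId *xids, const bool *is_sub,
                                   int n, int max_subxid_count)
{
    int cnt = 0;
    if (n <= max_subxid_count)
    {
        for (int i = 0; i < n; i++)
            snap->subxip[i] = xids[i];
        snap->subxcnt = n;
        snap->suboverflowed = false;
        return true;
    }
    for (int i = 0; i < n; i++)
    {
        if (is_sub[i])
            continue;                     /* excluded on retry */
        if (cnt >= max_subxid_count)
            return false;                 /* still too large */
        snap->subxip[cnt++] = xids[i];
    }
    snap->subxcnt = cnt;
    snap->suboverflowed = true;
    return true;
}

/* Approximation of the recovery-style visibility check. */
static bool sketch_xid_in_snapshot(TransactionId xid, const SketchSnapshot *snap,
                                   const SubxactParent *map, int nmap)
{
    if (xid < snap->xmin)
        return false;                     /* finished before the snapshot */
    if (xid >= snap->xmax)
        return true;                      /* started after: treat as running */
    if (snap->suboverflowed)
    {
        xid = sketch_topmost(xid, map, nmap);  /* map subxact to its top */
        if (xid < snap->xmin)
            return false;
    }
    for (int i = 0; i < snap->subxcnt; i++)
        if (snap->subxip[i] == xid)
            return true;
    return false;
}
```

For example, with capacity 2 and running XIDs 100, 101 (a subtransaction of 100) and 102, the retry stores only 100 and 102; a later check for 101 is mapped to 100 and found in subxip.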

We could take another route: make a !takenDuringRecovery snapshot by
using xip instead of subxip. That is cleaner, but it is far more likely
to need the retry.

(I renamed the patch, since the old name described only part of what
it does.)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v4-0001-Avoid-an-error-while-creating-logical-replication.patch text/x-patch 5.4 KB
