RE: logrep stuck with 'ERROR: int2vector has too many elements'

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Erik Rijkers <er(at)xs4all(dot)nl>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: logrep stuck with 'ERROR: int2vector has too many elements'
Date: 2023-01-15 14:46:42
Message-ID: OS0PR01MB5716EA3B7E6DE060773B1A5594C09@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Sunday, January 15, 2023 5:35 PM Erik Rijkers <er(at)xs4all(dot)nl> wrote:
>
> I can't find the exact circumstances that cause it but it has something to do with
> many columns (or adding many columns) in combination with perhaps
> generated columns.
>
> This replication test, in a slightly different form, used to work. This is also
> suggested by the fact that the attached runs without errors in REL_15_STABLE but
> gets stuck in HEAD.
>
> What it does: it initdbs and runs two instances, primary and replica. In the
> primary 'pgbench -is1' is done, and many columns, including 1 generated column,
> are added to all 4 pgbench tables. This is then pg_dump/pg_restored to the
> replica, and a short pgbench is run. The result tables on primary and replica are
> compared for the final result.
> (To run it will need some tweaks to directory and connection parms)
>
> I ran it on both v15 and v16 for 25 runs: with the parameters as given
> 15 has no problem while 16 always got stuck with the int2vector error.
> (15 can actually be pushed up to the max of 1600 columns per table without
> errors)
>
> Both REL_15_STABLE and 16devel built from recent master on Debian 10, gcc
> 12.2.0.
>
> I hope someone understands what's going wrong.

Thanks for reporting.

I think the basic problem is that we try to fetch the column list as an int2vector
when doing table sync, and if the number of columns is larger than 100, we
get an ERROR like the one in $subject.
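
To make the limit concrete, here is a rough toy model in Python. It is not
PostgreSQL code: the function names are made up for illustration, and the
limit of 100 is simply taken from the ">100 columns" observation above.

```python
# Toy model only: this is not PostgreSQL source code.
# MAX_ELEMS = 100 is an assumption taken from the ">100 columns" observation.
MAX_ELEMS = 100


def parse_int2vector(text, max_elems=MAX_ELEMS):
    """Parse space-separated attribute numbers, rejecting overlong input."""
    values = [int(tok) for tok in text.split()]
    if len(values) > max_elems:
        # Same message text as the error in the subject line.
        raise ValueError("int2vector has too many elements")
    return values


def column_list_text(ncols):
    """Text form of a column list covering attribute numbers 1..ncols."""
    return " ".join(str(i) for i in range(1, ncols + 1))
```

In this model a 100-column list parses fine, while a 101-column list raises
the error.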

We can also hit this ERROR by manually specifying a long (>100) column
list in the publication, like:

create publication pub for table test(a1,a2,a3... a200);
create subscription xxx.
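
Typing 200 column names by hand is tedious, so a small helper script can
generate the statements. This is only a sketch: the names test, pub and
a1..a200 come from the example above, while the int column type and the
helper name build_repro_sql are assumptions for illustration. The
subscription is left out because it needs a site-specific connection string.

```python
def build_repro_sql(ncols=200, table="test", pub="pub"):
    """Return CREATE TABLE / CREATE PUBLICATION statements with ncols columns."""
    cols = [f"a{i}" for i in range(1, ncols + 1)]
    # The column type is an assumption; any replicable type should do.
    col_defs = ", ".join(f"{c} int" for c in cols)
    col_list = ", ".join(cols)
    return (
        f"CREATE TABLE {table} ({col_defs});\n"
        f"CREATE PUBLICATION {pub} FOR TABLE {table} ({col_list});\n"
    )


if __name__ == "__main__":
    # Pipe the output into psql on the publisher side.
    print(build_repro_sql())
```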

The script didn't reproduce this in PG15 because we didn't filter out
generated columns when fetching the column list, so the code assumed all columns
were replicated and returned NULL for the column list (int2vector) value. But in
PG16 (b7ae039), we started to filter out generated columns (because generated
columns are not replicated in logical replication), so we get a valid int2vector
and hit the ERROR.
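
The difference between the two versions can be sketched as a toy model in
Python. This is not the actual tablesync code: the function name, the
(attnum, is_generated) representation and the filter_generated flag are all
made up for illustration.

```python
def fetched_column_list(columns, filter_generated):
    """Model of the column list fetched during table sync.

    columns: list of (attnum, is_generated) tuples for the table.
    filter_generated: False models PG15, True models PG16 after b7ae039.
    Returns None (standing in for SQL NULL) when every column is replicated,
    otherwise the list of replicated attribute numbers.
    """
    published = [
        attnum
        for attnum, is_generated in columns
        if not (filter_generated and is_generated)
    ]
    if len(published) == len(columns):
        return None  # all columns replicated: no int2vector to parse
    return published


# A table with 150 columns, the last of which is a generated column.
table_columns = [(i, i == 150) for i in range(1, 151)]
pg15_list = fetched_column_list(table_columns, filter_generated=False)
pg16_list = fetched_column_list(table_columns, filter_generated=True)
```

In this model pg15_list is None, so nothing has to be parsed, while pg16_list
holds 149 attribute numbers, which is over the limit of 100.
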
I will think about this and work on a fix.

Best regards,
Hou zj
