RE: Allow logical replication to copy tables in binary format

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Melih Mutlu' <m(dot)melihmutlu(at)gmail(dot)com>
Cc: Euler Taveira <euler(at)eulerto(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Allow logical replication to copy tables in binary format
Date: 2022-10-12 01:36:04
Message-ID: TYCPR01MB8373B593010467315C2BA8EBED229@TYCPR01MB8373.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Monday, October 3, 2022 8:50 PM Melih Mutlu <m(dot)melihmutlu(at)gmail(dot)com> wrote:
> (4) Error message of the binary format copy
>
> I've gotten below message from data type contradiction (between
> integer and bigint).
> Probably, this is unclear for the users to understand the direct cause
> and needs to be adjusted ?
> This might be a similar comment Euler mentioned in [1].
>
> 2022-09-16 11:54:54.835 UTC [4570] ERROR: insufficient data left in
> message
> 2022-09-16 11:54:54.835 UTC [4570] CONTEXT: COPY tab, line 1,
> column id
>
> It's already unclear for users to understand what's the issue if they're copying
> data between different column types via the COPY command.
> This issue comes from COPY, and logical replication just utilizes COPY.
> I don't think it would make sense to adjust an error message from a functionality
> which logical replication only uses and has no direct impact on.
> It might be better to do this in a separate patch. What do you think?
Yes, makes sense. It should be a separate patch.

> I agree with the direction to support binary copy for v16 and later.
>
> IIUC, the binary format replication with different data types fails even
> during apply phase on HEAD.
> I thought that means, the upgrade concern only applies to a scenario
> that the user executes
> only initial table synchronizations between the publisher and subscriber
> and doesn't replicate any data at apply phase after that. I would say
> this isn't a valid scenario and your proposal makes sense.
>
> No, logical replication in binary does not fail on apply phase if data types are
> different.
With HEAD, I observe in some case we fail at apply phase because of different data types like
integer vs. bigint as written scenario in [1]. In short, I think we can slightly
adjust your documentation and make it more general so that the description applies to
both table sync phase and apply phase.

I'll suggest a below change for your sentence of logical-replication.sgml.
FROM:
In binary case, it is not allowed to replicate data between different types due to restrictions inherited from COPY.
TO:
Binary format is type specific and does not allow to replicate data between different types according to its
restrictions.

If my idea above is correct, then I feel we can remove all the fixes for create_subscription.sgml.
I'm not sure if I should pursue this perspective of the document improvement
any further after this email, since this isn't essentially because of this patch.

> The concern with upgrade (if data types are not the same) would be not being
> able to create a new subscription with binary enabled or replicate new tables
> added into publication.
> Replication of tables from existing subscriptions would not be affected by this
> change since they will already be in the apply phase, not tablesync.
> Do you think this would still be an issue?
Okay, thanks for explaining this. I understand that
the upgrade concern applies to the table sync that is executed
between text format (before the patch) and binary format (after the patch).

[1] - binary format test that we fail for different types on apply phase on HEAD

<publisher>
create table tab (id integer);
insert into tab values (100);
create publication mypub for table tab;

<subscriber>
create table tab (id bigint);
create subscription mysub connection '...' publication mypub with (copy_data = false, binary = true, disable_on_error = true);

-- wait for several seconds

<subscriber>
select srsubid, srrelid, srrelid::regclass, srsubstate, srsublsn from pg_subscription_rel; -- check the status as 'r' for the relation
select * from tab; -- confirm we don't copy the initial data on the pub

<publisher>
insert into tab values (1), (2);

-- wait for several seconds

<subscriber>
select subname, subenabled from pg_subscription; -- shows 'f' for the 2nd column because of an error
select * from tab -- no records

This error doesn't happen when we adopt 'integer' on the subscriber aligned with the publisher
and we can see the two records on the subscriber.

Best Regards,
Takamichi Osumi

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-10-12 01:39:49 Re: shadow variables - pg15 edition
Previous Message Michael Paquier 2022-10-12 01:18:50 Re: Allow tests to pass in OpenSSL FIPS mode