Re: Logical Replication and Character encoding

From: "Shinoda, Noriyoshi" <noriyoshi(dot)shinoda(at)hpe(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: Logical Replication and Character encoding
Date: 2017-02-01 08:39:41
Message-ID: AT5PR84MB0084A18D3BF1D93B862E95E4EE4D0@AT5PR84MB0084.NAMPRD84.PROD.OUTLOOK.COM
Lists: pgsql-hackers

Thank you for creating the patches.
I strongly hope that your patch will be merged into the next version.
Since not all databases are based on UTF-8 yet, I think character encoding conversion is still necessary.

-----Original Message-----
From: Kyotaro HORIGUCHI [mailto:horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp]
Sent: Wednesday, February 01, 2017 3:31 PM
To: Shinoda, Noriyoshi <noriyoshi(dot)shinoda(at)hpe(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Logical Replication and Character encoding

At Wed, 01 Feb 2017 12:13:04 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170201(dot)121304(dot)267734380(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > > I tried a committed Logical Replication environment. I found
> > > that replication between databases of different encodings did not
> > > convert encodings in character type columns. Is this behavior
> > > correct?
> >
> > The output plugin for subscriptions is pgoutput. It currently
> > doesn't consider encoding, but conversion could easily be added
> > if the desired encoding were communicated to it.
> >
> > The easiest (though it seems somewhat fragile) way I can think of is:
> >
> > - The subscriber connects with a client_encoding specification and
> > the output plugin pgoutput decides whether it accepts the encoding
> > or not. If the subscriber doesn't, pgoutput sends the data without
> > conversion.
> >
> > The attached small patch does this and works with the following
> > CREATE SUBSCRIPTION.
>
> Oops. It forgot to handle conversion failure. That is fixed in the
> attached patch.
>
> > CREATE SUBSCRIPTION sub1 CONNECTION 'host=/tmp port=5432
> > dbname=postgres client_encoding=EUC_JP' PUBLICATION pub1;
> >
> >
> > We could also have explicit negotiation in, for example,
> > CREATE_REPLICATION_SLOT.
> >
> > 'CREATE_REPLICATION_SLOT sub1 LOGICAL pgoutput ENCODING EUC_JP'
> >
> > Or the output plugin could take options.
> >
> > 'CREATE_REPLICATION_SLOT sub1 LOGICAL pgoutput OPTIONS(encoding EUC_JP)'
> >
> >
> > Any opinions?

This patch stalls replication when the publisher finds an unconvertible character in a tuple to be sent. In that case, the subscription must be dropped and recreated before replication can proceed.
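
The failure mode can be illustrated outside the server with a rough sketch. This is not the patch itself: the patch does the conversion in C on the publisher, and the function name convert_for_subscriber is hypothetical. Python's codecs merely stand in for the server's conversion routines, assuming a UTF-8 publisher and an EUC_JP subscriber.

```python
# Illustration only: Python codecs stand in for the server-side conversion.
# convert_for_subscriber is a hypothetical name, not a PostgreSQL function.

def convert_for_subscriber(data: bytes, server_encoding: str,
                           client_encoding: str) -> bytes:
    """Re-encode a column value from the publisher's encoding to the subscriber's."""
    if server_encoding == client_encoding:
        return data  # same encoding on both sides, nothing to convert
    text = data.decode(server_encoding)
    # Raises UnicodeEncodeError if a character has no mapping in the target.
    return text.encode(client_encoding)


# A value that exists in both encodings converts cleanly.
ok = convert_for_subscriber("日本語".encode("utf-8"), "utf-8", "euc_jp")

# A character with no EUC_JP mapping (here an emoji) makes the conversion fail.
try:
    convert_for_subscriber("\U0001F600".encode("utf-8"), "utf-8", "euc_jp")
    failed = False
except UnicodeEncodeError:
    failed = True
```

In the sketch the error is simply caught; with the patch, the equivalent error on the publisher is what stops the tuple from being sent.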

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
