Re: Logical Replication and Character encoding

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: craig(at)2ndquadrant(dot)com
Cc: noriyoshi(dot)shinoda(at)hpe(dot)com, pgsql-hackers(at)postgresql(dot)org, euler(at)timbira(dot)com(dot)br
Subject: Re: Logical Replication and Character encoding
Date: 2017-02-03 08:34:48
Message-ID: 20170203.173448.96400166.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

At Fri, 3 Feb 2017 13:47:54 +0800, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote in <CAMsr+YFqNLAvdjmgWOMWM9X=FfzcFOL4pLxBeaqtjySbe_UeTg(at)mail(dot)gmail(dot)com>
> On 3 Feb. 2017 15:47, "Kyotaro HORIGUCHI" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> wrote:
>
> Hello,
>
> At Fri, 3 Feb 2017 09:16:47 +0800, Craig Ringer <craig(at)2ndquadrant(dot)com>
> wrote in <CAMsr+YGqn2PjJBCY+RjWWTJ4BZ=FhG0REXq1E1nd8_kjaDGoEQ(at)mail(dot)gmail(dot)com>
> >
> > On 2 February 2017 at 11:45, Euler Taveira <euler(at)timbira(dot)com(dot)br> wrote:
> >
> > > I don't think storage without conversion is an acceptable approach. We
> > > should provide options to users, such as ignoring the tuple or storing
> > > NULL for column(s) with a conversion problem. I wouldn't consider storing
> > > data without conversion because it silently shows incorrect data and we
> > > historically aren't flexible with conversion routines.
>
> It is technically possible. But changing the behavior of a
> subscription and/or publication requires a change of SQL syntax. It
> seems a bit too late to propose such a new feature.
>
> IMHO unintentional silent data loss must not happen, so the
> default behavior on conversion failure cannot be anything other
> than stopping replication.
>
>
> Agree. Which is why we should default to disallowing mismatched upstream
> and downstream encodings. At least to start with.
>
> > pglogical and BDR both require identical encoding; they test for this
> > during startup and refuse to replicate if the encoding differs.
> >
> > For the first pass at core I suggest a variant on that policy: require
> > source and destination encoding to be the same. This should probably
> > be the first change, since it definitively prevents the issue from
> > arising.
>
> If the check is performed by BDR, pglogical itself seems to be
> allowed to convert strings when both ends have different
> encodings in a non-BDR environment. Is this right?
>
>
> Hm. Maybe it's changed since I last looked. We started off disallowing
> mismatched encodings anyway.
>
> Note that I'm referring to pglogical, the tool, not "in core logical
> replication for postgres"

Ouch! I'm sorry for my mistake. What I thought I had mentioned
was not pglogical but pgoutput, the default output plugin. But
what the patch modifies is logical/proto.c. The question I meant
to ask was the following.

| Does pglogical require the core to reject connections with a
| server that has a different encoding? Or does it check the
| encodings by itself and disconnect by itself?

> > If time permits we could also allow destination encoding to be UTF-8
> > (including codepage 65001) with any source encoding. This requires
> > encoding conversion to be performed, of course.
>
> Does this mean that BDR might work in a heterogeneous encoding
> environment?
>
> No. Here the "we" was meant to be PG core for V10 or later.
>
> But anyway some encodings (like SJIS) have
> characters with the same destination in their mappings, so BDR
> doesn't seem to work with such conversions. So each encoding
> should perhaps have a property indicating its usability in a BDR
> environment, but.
>
>
> PG doesn't allow SJIS as a db encoding. So it doesn't matter here.

Oops. I suppose EUC_JP also has such characters but I'm not sure
now.
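
To make the concern about "characters with the same destination"
concrete, here is a minimal sketch. It uses Python's cp932 codec
(Microsoft's Shift_JIS variant) as a stand-in, which is an assumption
made for illustration: it is not PostgreSQL's conversion table, and it
says nothing about EUC_JP. The two byte sequences are the NEC-extension
and JIS X 0208 positions of the same "approximately equal" symbol.

```python
# Illustration only: Python's cp932 codec, not PostgreSQL's own tables.
NEC_ROW13 = b"\x87\x90"   # symbol in the NEC extension row
JIS_X0208 = b"\x81\xe0"   # same symbol at its JIS X 0208 position


def roundtrip(raw: bytes, encoding: str = "cp932") -> bytes:
    """Decode to Unicode and encode back, as a round trip via UTF-8 would."""
    return raw.decode(encoding).encode(encoding)


# Both source byte sequences decode to one and the same Unicode character.
same_target = NEC_ROW13.decode("cp932") == JIS_X0208.decode("cp932")

# Byte sequences that do not survive the round trip unchanged.
lost = [raw for raw in (NEC_ROW13, JIS_X0208) if roundtrip(raw) != raw]

print(same_target, lost)
```

Since both inputs decode to the same character, encoding back can
reproduce at most one of them, so a round trip through another
encoding cannot preserve the original bytes for both.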

> > The downside is that this will impact users who use a common subset of
> > two encodings. This is most common for Windows-1252 <-> ISO-8859-15
> > (or -1 if you're old-school) but also arises anywhere the common 7 bit
> > subset is used. Until we can define an encoding exception policy
> > though, I think we should defer supporting those and make them a
> > "later" problem.
>
> If the conversion is rejected for now, we should check the
> encoding identity instead.

OK, I'll go in this direction for the next patch.
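
As a rough illustration of what "check the encoding identity" could
mean, here is a minimal sketch. It is not the patch: the function
name, the exception type and the case-insensitive name comparison are
all assumptions made for the example, and the real check would live
in the C code on the replication path.

```python
class EncodingMismatch(Exception):
    """Raised when the two ends of a replication link differ in encoding."""


def check_encoding_identity(publisher_encoding: str,
                            subscriber_encoding: str) -> None:
    """Refuse to proceed unless both ends report the same encoding name.

    Hypothetical helper: names are compared case-insensitively and no
    alias resolution or conversion is attempted.
    """
    pub = publisher_encoding.strip().upper()
    sub = subscriber_encoding.strip().upper()
    if pub != sub:
        raise EncodingMismatch(
            f"publisher encoding {pub} differs from subscriber encoding {sub}"
        )


# Identical encodings pass silently.
check_encoding_identity("UTF8", "utf8")

# Mismatched encodings are rejected before any data is applied.
try:
    check_encoding_identity("UTF8", "EUC_JP")
except EncodingMismatch as exc:
    print("refused:", exc)
```

Note that such a check also rejects pairs whose data happens to stay
within a common subset, which is the downside mentioned in the quoted
text above.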

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
