Re: Logical Replication and Character encoding

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: noriyoshi(dot)shinoda(at)hpe(dot)com, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, euler(at)timbira(dot)com(dot)br
Subject: Re: Logical Replication and Character encoding
Date: 2017-02-03 05:47:54
Message-ID: CAMsr+YFqNLAvdjmgWOMWM9X=FfzcFOL4pLxBeaqtjySbe_UeTg@mail.gmail.com
Lists: pgsql-hackers

On 3 Feb. 2017 15:47, "Kyotaro HORIGUCHI" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
wrote:

Hello,

At Fri, 3 Feb 2017 09:16:47 +0800, Craig Ringer <craig(at)2ndquadrant(dot)com>
wrote in <CAMsr+YGqn2PjJBCY+RjWWTJ4BZ=FhG0REXq1E1nd8_kjaDGoEQ(at)mail(dot)gmail(dot)com>
> On 2 February 2017 at 11:45, Euler Taveira <euler(at)timbira(dot)com(dot)br> wrote:
>
> > I don't think storage without conversion is an acceptable approach. We
> > should provide options to users such as ignore tuple or NULL for
> > column(s) with conversion problem. I wouldn't consider storage data
> > without conversion because it silently show incorrect data and we
> > historically aren't flexible with conversion routines.

It is technically possible. But changing the behavior of a
subscription and/or publication requires a change to the SQL syntax.
It seems a bit too late to propose such a new feature.

IMHO unintentional silent data loss must not happen, so the
default behavior on conversion failure can only be to stop
replication.

Agree. Which is why we should default to disallowing mismatched upstream
and downstream encodings. At least to start with.

> pglogical and BDR both require identical encoding; they test for this
> during startup and refuse to replicate if the encoding differs.
>
> For the first pass at core I suggest a variant on that policy: require
> source and destination encoding to be the same. This should probably
> be the first change, since it definitively prevents the issue from
> arising.
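
A minimal sketch of the same-encoding policy described above, in Python for illustration only. The function and exception names are hypothetical, and how each node's encoding name is obtained (for example via `SHOW server_encoding`) is left out; this is not how pglogical, BDR, or core actually implement the check.

```python
class EncodingMismatchError(Exception):
    """Raised when upstream and downstream database encodings differ."""


def check_encodings_match(upstream: str, downstream: str) -> None:
    """Refuse to start replication unless both encodings are identical.

    Both arguments are encoding names as a server would report them,
    for example "UTF8" or "LATIN1". (Hypothetical helper.)
    """
    # Compare case-insensitively; this sketch does no alias resolution.
    if upstream.strip().upper() != downstream.strip().upper():
        raise EncodingMismatchError(
            f"upstream encoding {upstream!r} does not match "
            f"downstream encoding {downstream!r}; refusing to replicate"
        )
```

Failing before any data is sent means no row can ever be stored with the wrong encoding, which is the property the policy is after.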

If the check is performed by BDR, pglogical itself seems to be
allowed to convert strings when the two ends have different
encodings in a non-BDR environment. Is this right?

Hm. Maybe it's changed since I last looked. We started off disallowing
mismatched encodings anyway.

Note that I'm referring to pglogical, the tool, not "in core logical
replication for postgres".

> If time permits we could also allow destination encoding to be UTF-8
> (including codepage 65001) with any source encoding. This requires
> encoding conversion to be performed, of course.
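
As a rough illustration of the "any source encoding to UTF-8" idea, the sketch below converts a value with Python's built-in codecs and fails loudly on any byte it cannot convert, in line with the "stop replication rather than lose data" default discussed earlier. The mapping table and the function name `to_utf8` are assumptions made for this example; PostgreSQL uses its own conversion routines, not Python's.

```python
# Hypothetical mapping from PostgreSQL encoding names to Python codec names.
# Only a few entries are listed; a real table would have to cover every
# supported server encoding.
PG_TO_PYTHON_CODEC = {
    "LATIN1": "iso8859-1",
    "LATIN9": "iso8859-15",
    "WIN1252": "cp1252",
    "EUC_JP": "euc_jp",
    "UTF8": "utf-8",
}


def to_utf8(value: bytes, source_encoding: str) -> bytes:
    """Convert value from source_encoding to UTF-8, or raise on failure."""
    codec = PG_TO_PYTHON_CODEC[source_encoding.upper()]
    # errors="strict" raises UnicodeDecodeError on any invalid or unmapped
    # byte, so nothing is silently replaced or dropped.
    text = value.decode(codec, errors="strict")
    return text.encode("utf-8")
```

A caller would treat the raised exception as a reason to halt apply on the downstream side instead of storing a guessed value.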

Does this mean that BDR might work in a heterogeneous-encoding
environment?

No. Here the "we" was meant to be PG core for V10 or later.

But anyway, some encodings (like SJIS) have
characters with the same destination in their mapping, so BDR
doesn't seem to work with such conversions. So each encoding
might need a property indicating its usability in a BDR
environment.
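
The "same destination" problem can be sketched as follows. This uses Python's `cp932` codec (Microsoft's Shift_JIS variant) as a stand-in, on the assumption that the two byte sequences in the usage note below both decode to U+2252; `find_collisions` is a hypothetical helper, not part of any PostgreSQL tool.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_collisions(sequences: Iterable[bytes], codec: str) -> Dict[str, List[bytes]]:
    """Group byte sequences by decoded text and keep only the duplicates.

    Any character reached from more than one byte sequence cannot be
    converted back to its original bytes unambiguously.
    """
    by_char = defaultdict(list)
    for seq in sequences:
        by_char[seq.decode(codec)].append(seq)
    return {ch: seqs for ch, seqs in by_char.items() if len(seqs) > 1}
```

Calling `find_collisions([b"\x81\xe0", b"\x87\x90"], "cp932")` is expected to report a single character with two source sequences. Converting such data to another encoding and back can therefore change the stored bytes, which matters for any scheme that replicates in both directions.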

PG doesn't allow SJIS as a db encoding. So it doesn't matter here.

> The downside is that this will impact users who use a common subset of
> two encodings. This is most common for Windows-1252 <-> ISO-8859-15
> (or -1 if you're old-school) but also arises anywhere the common 7 bit
> subset is used. Until we can define an encoding exception policy
> though, I think we should defer supporting those and make them a
> "later" problem.
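
To make the "common subset" concrete, the sketch below computes which single-byte values mean the same character in two encodings, using Python's codecs as an approximation of the server-side tables. The helper names are hypothetical. Data made up only of bytes in that set would be safe to copy without conversion; anything else is where an exception policy would be needed.

```python
from typing import Set


def common_bytes(codec_a: str, codec_b: str) -> Set[int]:
    """Return the byte values that decode to the same character in both codecs."""
    shared = set()
    for value in range(256):
        raw = bytes([value])
        try:
            if raw.decode(codec_a) == raw.decode(codec_b):
                shared.add(value)
        except UnicodeDecodeError:
            # Undefined in at least one codec, so not part of the common subset.
            continue
    return shared


def is_safe_without_conversion(data: bytes, shared: Set[int]) -> bool:
    """True if every byte of data falls inside the shared subset."""
    return all(b in shared for b in data)
```

For Windows-1252 and ISO-8859-15, calling `common_bytes("cp1252", "iso8859-15")` keeps the whole 7-bit range and most accented Latin letters, but excludes positions such as 0xA4, which is the euro sign in ISO-8859-15 and the generic currency sign in Windows-1252.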

If the conversion is rejected for now, we should check the
encoding identity instead.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
