Re: Logical Replication and Character encoding

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: craig(at)2ndquadrant(dot)com
Cc: euler(at)timbira(dot)com(dot)br, noriyoshi(dot)shinoda(at)hpe(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Logical Replication and Character encoding
Date: 2017-02-03 02:47:16
Message-ID: 20170203.114716.219596402.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

At Fri, 3 Feb 2017 09:16:47 +0800, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote in <CAMsr+YGqn2PjJBCY+RjWWTJ4BZ=FhG0REXq1E1nd8_kjaDGoEQ(at)mail(dot)gmail(dot)com>
> On 2 February 2017 at 11:45, Euler Taveira <euler(at)timbira(dot)com(dot)br> wrote:
>
> > I don't think storage without conversion is an acceptable approach. We
> > should provide options to users such as ignore tuple or NULL for
> > column(s) with conversion problem. I wouldn't consider storage data
> > without conversion because it silently show incorrect data and we
> > historically aren't flexible with conversion routines.

It is technically possible. But changing the behavior of a
subscription and/or publication requires a change to the SQL
syntax. It seems a bit too late to propose such a new feature.

IMHO unintentional silent data loss must not happen, so the
default behavior on conversion failure cannot be anything other
than stopping replication.
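
To make the options under discussion concrete, here is a minimal
sketch in Python (not PostgreSQL code). The names convert_column,
OnConversionError and ReplicationStopped are hypothetical, and
Python's codecs stand in for the server's conversion routines. It
only illustrates "stop by default, NULL as an explicit opt-in".

```python
from enum import Enum
from typing import Optional


class OnConversionError(Enum):
    """Hypothetical per-subscription policy; not an existing option."""
    STOP = "stop"   # default: refuse to continue
    NULL = "null"   # opt-in: store NULL for the offending column


class ReplicationStopped(Exception):
    """Raised when a value cannot be converted under the STOP policy."""


def convert_column(raw: bytes, src: str, dst: str,
                   policy: OnConversionError = OnConversionError.STOP
                   ) -> Optional[bytes]:
    """Convert one column value from encoding src to encoding dst."""
    try:
        return raw.decode(src).encode(dst)
    except UnicodeError as exc:
        if policy is OnConversionError.NULL:
            # The user explicitly accepted losing this value.
            return None
        # Never drop data silently: stop and report instead.
        raise ReplicationStopped(
            f"cannot convert value from {src} to {dst}") from exc
```

With the default policy, a value that has no representation in the
destination encoding raises ReplicationStopped; None is returned
only when the caller passes OnConversionError.NULL.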

> pglogical and BDR both require identical encoding; they test for this
> during startup and refuse to replicate if the encoding differs.
>
> For the first pass at core I suggest a variant on that policy: require
> source and destination encoding to be the same. This should probably
> be the first change, since it definitively prevents the issue from
> arising.

If the check is performed by BDR, pglogical itself seems to be
allowed to convert strings when the two ends have different
encodings in a non-BDR environment. Is this right?

> If time permits we could also allow destination encoding to be UTF-8
> (including codepage 65001) with any source encoding. This requires
> encoding conversion to be performed, of course.

Does this mean that BDR might work in a heterogeneous encoding
environment? But anyway, some encodings (like SJIS) have
characters that share the same destination in their mapping, so
BDR doesn't seem to work with such conversions. So each encoding
might need a property indicating its usability in a BDR
environment, but...
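
The mapping problem can be illustrated with a rough sketch. It
assumes Python's cp932 codec as a stand-in for an SJIS variant; it
is not PostgreSQL's conversion table, and the function names are
hypothetical. It scans double-byte sequences, groups them by the
character they decode to, and reports the sequences that do not
come back unchanged after a round trip.

```python
from collections import defaultdict
from typing import Dict, List


def find_colliding_sequences(encoding: str = "cp932") -> Dict[str, List[bytes]]:
    """Return characters that more than one double-byte sequence decodes to."""
    by_char: Dict[str, List[bytes]] = defaultdict(list)
    for lead in range(0x81, 0xFD):
        for trail in range(0x40, 0xFD):
            seq = bytes([lead, trail])
            try:
                text = seq.decode(encoding)
            except UnicodeDecodeError:
                continue  # not a valid sequence in this encoding
            if len(text) == 1:  # skip pairs of single-byte characters
                by_char[text].append(seq)
    return {ch: seqs for ch, seqs in by_char.items() if len(seqs) > 1}


def lossy_sequences(encoding: str = "cp932") -> List[bytes]:
    """Return sequences that change when converted out and back again."""
    lossy = []
    for ch, seqs in find_colliding_sequences(encoding).items():
        try:
            canonical = ch.encode(encoding)
        except UnicodeEncodeError:
            canonical = None  # no way back at all
        lossy.extend(seq for seq in seqs if seq != canonical)
    return lossy


if __name__ == "__main__":
    collisions = find_colliding_sequences()
    print(len(collisions), "characters have more than one source sequence")
    print(len(lossy_sequences()), "sequences do not survive a round trip")
```

Any sequence returned by lossy_sequences would come back from the
other node as different bytes, which is the reason a bidirectional
setup cannot rely on such a conversion.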

On the other hand, no problem is seen with encoding conversions
in non-BDR environments (except for the behavior on failure).

> The downside is that this will impact users who use a common subset of
> two encodings. This is most common for Windows-1252 <-> ISO-8859-15
> (or -1 if you're old-school) but also arises anywhere the common 7 bit
> subset is used. Until we can define an encoding exception policy
> though, I think we should defer supporting those and make them a
> "later" problem.

If conversion is rejected for now, we should instead check that
the encodings on both ends are identical.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
