Re: Unaccent extension python script Issue in Windows

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: hugh(at)whtc(dot)ca
Cc: raam(dot)soft(at)gmail(dot)com, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com
Subject: Re: Unaccent extension python script Issue in Windows
Date: 2019-03-18 06:27:55
Message-ID: 20190318.152755.73288474.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello.

At Mon, 18 Mar 2019 14:13:34 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20190318(dot)141334(dot)186469242(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> Hello.
>
> At Sun, 17 Mar 2019 20:23:05 -0400, Hugh Ranalli <hugh(at)whtc(dot)ca> wrote in <CAAhbUMNoBLu7jAbyK5MK0LXEyt03PzNQt_Apkg0z9bsAjcLV4g(at)mail(dot)gmail(dot)com>
> > Hi Ram,
> > Thanks for doing this; I've been overestimating my ability to get to things
> > over the last couple of weeks.
> >
> > I've looked at the patch and have made one minor change. I had moved all
> > the imports up to the top, to keep them in one place (and I think some had
> > originally been used only by the Python 2 code). You added them there, but
> > didn't remove them from their original positions. So I've incorporated that
> > into your patch, attached as v2. I've tested this under Python 2 and 3 on
> > Linux, not Windows.
>
> Though I'm not sure it is necessary to run the script on
> Windows, the problem is not specific to Windows. It is a general
> one that just happened not to be hit in non-Windows environments.
>
> On CentOS7:
> > export LANG="ja_JP.EUCJP"
> > python <..snipped..>
> ..
> > UnicodeEncodeError: 'euc_jp' codec can't encode character '\xab' in position 0: illegal multibyte sequence
>
> So this is not an issue with Windows but with python3.
>
> The script generates identical files with both versions of
> Python with the patch on Linux and Windows 7. Python 3 on Windows
> emits CRLF as the newline, but that doesn't seem to do any harm.
> (I haven't confirmed that yet because the build is extremely slow
> right now, for reasons I haven't identified.)

I confirmed that CRLF actually does no harm and unaccent works
correctly. (t_isspace() treats them as white space, so they are
excluded.)
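
To illustrate the point, here is a minimal Python sketch, not taken
from the patch or from the PostgreSQL sources. It writes the same
rule line once with CRLF (forced via io.open's newline argument to
mimic Python 3 text mode on Windows) and once with LF, then shows
that a reader which skips white space sees the same rule either way.
The file names and the sample rule are made up for the example, and
str.split() only stands in for the white-space handling that
t_isspace() provides in the real parser.

```python
import io
import os
import tempfile


def write_rule(path, newline):
    # newline="\r\n" mimics what Python 3 text mode does on Windows;
    # newline="\n" forces LF on every platform.
    with io.open(path, "w", encoding="utf-8", newline=newline) as f:
        f.write(u"\u00e9\te\n")  # hypothetical rule: e-acute -> e


def raw_bytes(path):
    # Read back without any newline translation.
    with open(path, "rb") as f:
        return f.read()


tmpdir = tempfile.mkdtemp()
crlf_path = os.path.join(tmpdir, "crlf.rules")  # hypothetical file names
lf_path = os.path.join(tmpdir, "lf.rules")

write_rule(crlf_path, "\r\n")
write_rule(lf_path, "\n")

crlf_bytes = raw_bytes(crlf_path)
lf_bytes = raw_bytes(lf_path)

# The raw files differ only in the line terminator ...
print(repr(crlf_bytes))
print(repr(lf_bytes))

# ... and a reader that treats CR and LF as white space gets the
# same two fields from both.
crlf_fields = crlf_bytes.decode("utf-8").split()
lf_fields = lf_bytes.decode("utf-8").split()
print(crlf_fields == lf_fields)
```

If LF output were wanted on every platform, passing newline="\n"
when opening the output would be one way to get it, but as noted
above it is not needed for unaccent to work.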

> This patch contains irrelevant changes. The minimal required
> change would be the attached. If you want to refactor the
> UnicodeData reader or rearrange the imports, those should be
> separate patches.
>
> It would be better to use IOBase for Python 3, especially for the
> stdout replacement, but I didn't since it *is* working.
>
> > Everything else looks correct. I apologise for not having replied to your
> > question in the original bug report. I had intended to, but as I said,
> > there's been an increase in the things I need to juggle at the moment.
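
For reference, the UnicodeEncodeError quoted above can be reproduced
without the script or a particular locale. The following is a
minimal sketch, not the attached patch: it shows that U+00AB cannot
be encoded as EUC-JP but can as UTF-8, and that writing through an
explicit UTF-8 writer avoids depending on the locale's encoding. The
in-memory buffer stands in for the real output stream and
can_encode() is a helper invented for the example.

```python
import codecs
import io

# The character from the traceback: LEFT-POINTING DOUBLE ANGLE
# QUOTATION MARK (U+00AB).
ch = u"\u00ab"


def can_encode(text, encoding):
    # Hypothetical helper: True if text is representable in encoding.
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Under LANG=ja_JP.EUCJP, Python 3 encodes stdout as EUC-JP, which
# has no representation for this character.
euc_ok = can_encode(ch, "euc_jp")
utf8_ok = can_encode(ch, "utf-8")
print(euc_ok, utf8_ok)

# Writing through an explicit UTF-8 writer does not consult the
# locale at all. BytesIO stands in for the real byte stream here.
buf = io.BytesIO()
out = codecs.getwriter("utf-8")(buf)
out.write(ch + u"\t<<\n")
encoded = buf.getvalue()
print(repr(encoded))
```

In Python 3 the same idea applies to the byte stream underneath
sys.stdout (sys.stdout.buffer), and io.TextIOWrapper is another way
to wrap it with a fixed encoding.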

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
