Re: [SPAM] [MessageLimit][lowlimit] Re: pl/perl and utf-8 in sql_ascii databases

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, badalex <badalex(at)gmail(dot)com>, cb <cb(at)df7cb(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [SPAM] [MessageLimit][lowlimit] Re: pl/perl and utf-8 in sql_ascii databases
Date: 2012-07-11 19:42:45
Message-ID: 1342035182-sup-2265@alvh.no-ip.org
Lists: pgsql-hackers


Excerpts from Alvaro Herrera's message of Tue Jul 10 16:23:57 -0400 2012:
> Excerpts from Kyotaro HORIGUCHI's message of Tue Jul 03 04:59:38 -0400 2012:
> > Hello. Here are regression test runs on PostgreSQL builds that were
> > also made with cygwin-gcc and VC++.
> >
> > The attached patches are as follows:
> >
> > - plperl_sql_ascii-4.patch : fix for pl/perl utf8 vs sql_ascii
> > - plperl_sql_ascii_regress-1.patch : regression test for this patch.
> > I added some tests on encoding to this.
> >
> > I will mark this patch as 'ready for committer' after this.
>
> I have pushed these changes to HEAD, 9.2 and 9.1. Instead of the games
> with plperl_lc_*.out being copied around, I just used the ASCII version
> as plperl_lc_1.out and the UTF8 one as plperl_lc.out.

... and this story hasn't ended yet, because one of the new tests is
failing. See here:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=magpie&dt=2012-07-11%2010%3A00%3A04

The interesting part of the diff is:

***************
*** 34,41 ****
return ($str ne $match ? $code."DIFFER" : $code."ab\x{5ddd}cd");
$$ LANGUAGE plperl;
SELECT encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea, 'escape')
! encode
! --------------------------
! NotUTF8:ab\345\267\235cd
! (1 row)
!
--- 34,38 ----
return ($str ne $match ? $code."DIFFER" : $code."ab\x{5ddd}cd");
$$ LANGUAGE plperl;
SELECT encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea, 'escape')
! ERROR: character with byte sequence 0xe5 0xb7 0x9d in encoding "UTF8" has no equivalent in encoding "LATIN1"
! CONTEXT: PL/Perl function "perl_utf_inout"

I am not sure what we can do here other than remove this function and
query from the test.
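
For reference, the byte sequence in the error can be checked outside
PostgreSQL. The sketch below is only an illustration in Python, not part of
the patch or the regression test; the helper name fits_in_latin1 is
hypothetical. It assumes the failing database uses LATIN1, as the error
message suggests, and shows that U+5DDD (the character the test function
returns between "ab" and "cd") encodes to 0xe5 0xb7 0x9d in UTF-8 and has no
LATIN1 representation.

```python
# Illustration only: not PostgreSQL code, just the same encoding arithmetic.

# The character written as \x{5ddd} in the PL/Perl test function.
ch = "\u5ddd"

# UTF-8 bytes of that character, formatted like the bytes in the error message.
utf8_hex = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))
print(utf8_hex)


def fits_in_latin1(s):
    """Hypothetical helper: True if every character of s exists in LATIN1."""
    try:
        s.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# The function's return value cannot be converted to LATIN1,
# while a plain ASCII string can.
print(fits_in_latin1("ab" + ch + "cd"))
print(fits_in_latin1("abcd"))
```

Any server encoding that lacks this character would presumably hit the same
conversion error, so the two expected-output files (UTF8 and SQL_ASCII) may
not cover every buildfarm configuration.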

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
