Re: pl/perl and utf-8 in sql_ascii databases

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: alvherre(at)commandprompt(dot)com
Cc: badalex(at)gmail(dot)com, cb(at)df7cb(dot)de, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pl/perl and utf-8 in sql_ascii databases
Date: 2012-06-21 11:22:43
Message-ID: 20120621.202243.116439861.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

> > Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr
> > and SvPVUTF8() when turning a perl string into a cstring.
>
> Right.

I spent a bit longer time catching on pl/perl and now understand
what is the problem...

> So I played a bit with this patch, and touched it a bit mainly just to
> add some more comments; and while at it I noticed that some of the
> functions in Util.xs might leak some memory, so I made an attempt to
> plug them, as in the attached patch (which supersedes yours).

Ok, Is it ok to look into the newer patch including fix of leaks
at first?

-- Coding and styles.

This also seems to have polished the previous one on some codes,
styles and comments which generally look reasonable. And patch
style was corrected into unified.

-- Functions
I seems to work properly on the database the encodings of which
are SQL_ASCII and UTF8 (and EUC-JP) as below,

=================
=> create or replace function foo(text) returns text language plperlu as $$ $a = shift; return "BOO!" if ($a != "a\x80cあ"); return $a; $$;
SQL_ASCII=> select foo(E'a\200cあ') = E'a\200cあ';
?column?
----------
t
UTF8=> select foo(E'a\200cあ');
ERROR: invalid byte sequence for encoding "UTF8": 0x80
UTF8=> select foo(E'a\302\200cあ') = E'a\u0080cあ';
?column?
----------
t
=================

This looks quite valid according to the definition of the
encodings and perl's nature as far as I see.

-- The others

Variable naming in util_quote_*() seems a bit confusing,

> text *arg = sv2text(sv);
> text *ret = DatumGetTextP(..., PointerGetDatum(arg)));
> char *str;
> pfree(arg);
> str = text_to_cstring(ret);
> RETVAL = cstr2sv(str);
> pfree(str);

Renaming ret to quoted and str to ret as the patch attached might
make it easily readable.

> Now, with my version of the patch applied and using a SQL_ASCII database
> to test the problem in the original report, I notice that we now have a
> regression failure:
snip.
> I'm not really sure what to do here -- maybe have a second expected file
> for that test is a good enough answer? Or should I just take the test
> out? Opinions please.

The attached ugly patch does it. We seem should put NO_LOCALE=1
on the 'make check' command line for the encodings not compatible
with the environmental locale, although it looks work.

# UtfToLocal() seems to have a bug that always report illegal
# encoding was "UTF8" regardless of the real encoding. But
# plper_lc_*.(sql|out) increases if the bug is fixed.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.

Attachment Content-Type Size
plperl_sql_ascii-3.patch text/x-patch 5.7 KB
plperl_sql_ascii_regress.patch text/x-patch 4.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2012-06-21 11:41:25 Catalog/Metadata consistency during changeset extraction from wal
Previous Message Etsuro Fujita 2012-06-21 11:20:56 Re: not null validation option in contrib/file_fdw