From: | Alex Hunsaker <badalex(at)gmail(dot)com> |
---|---|
To: | "David E(dot) Wheeler" <david(at)kineticode(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Careful PL/Perl Release Not Required |
Date: | 2011-02-11 20:57:26 |
Message-ID: | AANLkTi=+EpO9XBwhP++WuBgTvQ4jE4ywSM=p5xvE1QH1@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Feb 11, 2011 at 11:07, David E. Wheeler <david(at)kineticode(dot)com> wrote:
> I don't understand where the bug is. If a string is encoded in utf-8 Perl will not treat it as such unless the utf-8 flag is set.
OK, so I think we agreed this is right:
$ perl -E 'use URI::Escape; my $str = uri_unescape("%C3%A9"); say
sprintf("chr: %s hex: %s, len: %s", $str, unpack("H*", $str), length
$str)'
chr: é hex: c3a9, len: 2
The key part here is len: 2. Perl sees 2 characters, because the two UTF-8 octets have not been decoded.
Let's try that in a Postgres 9.0 UTF8 database:
=> create or replace function uri_decode(txt text, in_decode int,
out_decode int) returns text as $$
use URI::Escape;
my $str = shift;
utf8::decode($str) if(shift);
$str = uri_unescape($str);
utf8::decode($str) if(shift);
return $str;
$$ language plperlu;
-- For ease we are just going to look at the length, as most terminals
-- will have utf8 and latin1 mapped.
=> SELECT length(uri_decode('%c3%a9', 0, 0));
length
--------
2
(1 row)
Looks right.
What happens if we decode after uri_unescape? We should get 1 character, no?
-- decode after uri_unescape
=> SELECT length(uri_decode('%c3%a9', 0, 1));
length
--------
1
OK, that's right.
What happens if we decode before? Nothing should, right? After all,
'%c3%a9' is all ASCII. We should still get 2 characters.
=> SELECT length(uri_decode('%c3%a9', 1, 0));
length
--------
1
Whoa! 1? Does vanilla Perl do that?
perl <<'perl'
use URI::Escape;
my $str = '%c3%a9';
utf8::decode($str);
$str = uri_unescape($str);
print sprintf("chr: %s hex: %s, len: %s\n", $str, unpack("H*", $str), length
$str);
perl
chr: é hex: c3a9, len: 2
Nope, so Postgres gets it wrong here. That's the problem.
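As a cross-check that does not depend on URI::Escape being installed, here is a minimal sketch of the three orderings in plain Perl. The unescape_percent helper is a hypothetical stand-in for uri_unescape (it is not part of URI::Escape), and the variable names are only for illustration. It assumes a stock Perl with no PL/Perl involved.

```perl
use strict;
use warnings;

# Hypothetical stand-in for URI::Escape::uri_unescape, so the sketch
# needs no CPAN modules: each %XX becomes the character with that code.
sub unescape_percent {
    my ($s) = @_;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# No decode at all: two undecoded octets, 0xC3 0xA9.
my $raw = unescape_percent('%c3%a9');
my $len_none = length $raw;

# Decode before unescaping: the input is pure ASCII, so there is
# nothing for utf8::decode to change.
my $early = '%c3%a9';
utf8::decode($early);
$early = unescape_percent($early);
my $len_before = length $early;

# Decode after unescaping: the two octets become one character, U+00E9.
my $late = unescape_percent('%c3%a9');
utf8::decode($late);
my $len_after = length $late;

printf "no decode: %d, decode before: %d, decode after: %d\n",
    $len_none, $len_before, $len_after;
```

On a stock Perl this should report lengths of 2, 2 and 1, which is the behaviour the 9.0 function fails to match in the "decode before" case.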
In 9.1 it does "the right thing":
=> SELECT length(uri_decode('%c3%a9', 0, 0));
length
--------
2
Yay! 2!
=> SELECT length(uri_decode('%c3%a9', 1, 0));
CONTEXT: PL/Perl function "uri_decode"
length
--------
2
Yay! also 2!
=> SELECT length(uri_decode('%c3%a9', 0, 1));
length
--------
1
Yay! 1