From: | John Naylor <john(dot)naylor(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Unicode escapes with any backend encoding |
Date: | 2020-03-04 05:32:59 |
Message-ID: | CACPNZCs+aSspM1R7NwOoryjGJGNiDKMU1_gL5SAcbH58gwqwhg@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Feb 25, 2020 at 1:49 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I wrote:
> > [ unicode-escapes-with-other-server-encodings-2.patch ]
>
> I see this patch got sideswiped by the recent refactoring of JSON
> lexing. Here's an attempt at fixing it up. Since the frontend
> code isn't going to have access to encoding conversion facilities,
> this creates a difference between frontend and backend handling
> of JSON Unicode escapes, which is mildly annoying but probably
> isn't going to bother anyone in the real world. Outside of
> jsonapi.c, there are no changes from v2.
With v3, I successfully converted escapes using a database with EUC-KR
encoding, from strings, json, and jsonpath expressions.
Then I ran a raw parsing microbenchmark with ASCII Unicode escapes in
UTF-8 to verify no significant regression. I also tried the same with
EUC-KR, even though that's not really apples-to-apples since it
doesn't work on HEAD. It seems to give the same numbers. (Each figure
is the median of 3 runs, done 3 times with a postmaster restart in
between.)
master, UTF-8 ascii:     1.390s  1.405s  1.406s
v3, UTF-8 ascii:         1.396s  1.388s  1.390s
v3, EUC-KR non-ascii:    1.382s  1.401s  1.394s
Not this patch's job perhaps, but now that check_unicode_value() only
depends on the input, maybe it can be put into pg_wchar.h with other
static inline helper functions? That test is duplicated in
addunicode() and pg_unicode_to_server(). Maybe:
static inline bool
codepoint_is_valid(pg_wchar c)
{
    return (c > 0 && c <= 0x10FFFF);
}
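For anyone who wants to poke at the boundary cases outside the tree, here is a minimal standalone sketch of that helper. It assumes a local typedef standing in for pg_wchar (an unsigned int in mb/pg_wchar.h), and the self-check function is hypothetical, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: local stand-in for PostgreSQL's pg_wchar typedef. */
typedef unsigned int pg_wchar;

/* True if c is nonzero and no larger than the highest Unicode code point. */
static inline bool
codepoint_is_valid(pg_wchar c)
{
    return (c > 0 && c <= 0x10FFFF);
}

/* Hypothetical self-check exercising both ends of the range. */
static inline void
codepoint_is_valid_selfcheck(void)
{
    assert(!codepoint_is_valid(0));         /* U+0000 is rejected */
    assert(codepoint_is_valid(0x41));       /* 'A' */
    assert(codepoint_is_valid(0x10FFFF));   /* last valid code point */
    assert(!codepoint_is_valid(0x110000));  /* one past the end */
}
```

Note that, as written, the range test also accepts surrogate values (0xD800 to 0xDFFF); pairing those up would remain the caller's job.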
Maybe Chapman has a use case in mind he can test with? Barring that,
the patch seems ready for commit.
--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services