Re: Unicode escapes with any backend encoding

From: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Unicode escapes with any backend encoding
Date: 2020-03-04 05:32:59
Message-ID: CACPNZCs+aSspM1R7NwOoryjGJGNiDKMU1_gL5SAcbH58gwqwhg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 25, 2020 at 1:49 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I wrote:
> > [ unicode-escapes-with-other-server-encodings-2.patch ]
>
> I see this patch got sideswiped by the recent refactoring of JSON
> lexing. Here's an attempt at fixing it up. Since the frontend
> code isn't going to have access to encoding conversion facilities,
> this creates a difference between frontend and backend handling
> of JSON Unicode escapes, which is mildly annoying but probably
> isn't going to bother anyone in the real world. Outside of
> jsonapi.c, there are no changes from v2.

With v3, I successfully converted escapes using a database with EUC-KR
encoding, from strings, json, and jsonpath expressions.

Then I ran a raw parsing microbenchmark with ASCII Unicode escapes in
UTF-8 to verify that there is no significant regression. I also tried the
same with EUC-KR, even though that's not really apples-to-apples since it
doesn't work on HEAD. It seems to give the same numbers. (Each figure is
the median of 3 runs; this was done 3 times with a postmaster restart in
between.)

master, UTF-8 ascii
1.390s
1.405s
1.406s

v3, UTF-8 ascii
1.396s
1.388s
1.390s

v3, EUC-KR non-ascii
1.382s
1.401s
1.394s

Not this patch's job perhaps, but now that check_unicode_value() only
depends on the input, maybe it can be put into pg_wchar.h with the other
static inline helper functions? That test is duplicated in
addunicode() and pg_unicode_to_server(). Maybe:

static inline bool
codepoint_is_valid(pg_wchar c)
{
    return (c > 0 && c <= 0x10FFFF);
}
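
To try the suggested range test outside the PostgreSQL tree, a minimal standalone sketch could look like the following. The typedef is a stand-in for the one in pg_wchar.h (assumed here to be unsigned int), and codepoint_is_valid_selfcheck() is a hypothetical name used only for illustration; neither is part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for PostgreSQL's pg_wchar typedef (assumption: unsigned int). */
typedef unsigned int pg_wchar;

/*
 * Range test only: the code point must be nonzero and no greater than
 * U+10FFFF, the highest Unicode code point.
 */
static inline bool
codepoint_is_valid(pg_wchar c)
{
    return (c > 0 && c <= 0x10FFFF);
}

/* Hypothetical self-check for illustration; not part of the patch. */
void
codepoint_is_valid_selfcheck(void)
{
    assert(!codepoint_is_valid(0));        /* zero is rejected */
    assert(codepoint_is_valid(0x41));      /* 'A' */
    assert(codepoint_is_valid(0x10FFFF));  /* upper bound is inclusive */
    assert(!codepoint_is_valid(0x110000)); /* just past the Unicode range */
    assert(codepoint_is_valid(0xD800));    /* surrogates pass this range test */
}
```

Note that this is purely a range check: surrogate code points such as 0xD800 pass it, so any surrogate-pair handling has to happen separately in the caller.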

Maybe Chapman has a use case in mind he can test with? Barring that,
the patch seems ready for commit.

--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
