Re: Radix tree for character conversion

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, michael(dot)paquier(at)gmail(dot)com, daniel(at)yesql(dot)se, peter(dot)eisentraut(at)2ndquadrant(dot)com, robertmhaas(at)gmail(dot)com, tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com, ishii(at)sraoss(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Radix tree for character conversion
Date: 2017-03-17 11:03:35
Message-ID: 01efd334-b839-0450-1b63-f2dea9326a7e@iki.fi
Lists: pgsql-hackers

On 03/17/2017 07:19 AM, Kyotaro HORIGUCHI wrote:
> At Mon, 13 Mar 2017 21:07:39 +0200, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote in <d5b70078-9f57-0f63-3462-1e564a57739f(at)iki(dot)fi>
>> Hmm. A somewhat different approach might be more suitable for testing
>> across versions, though. We could modify the perl scripts slightly to
>> print out SQL statements that exercise every mapping. For every
>> supported conversion, the SQL script could:
>>
>> 1. create a database in the source encoding.
>> 2. set client_encoding='<target encoding>'
>> 3. SELECT a string that contains every character in the source
>> encoding.
>
> There are many encodings that can be client-encoding but cannot
> be database-encoding.

Good point.

> I would like to use convert() function. It can be a large
> PL/PgSQL function or a series of "SELECT convert(...)"s. The
> latter is doable on-the-fly (by not generating/storing the whole
> script).
>
> | -- Test for SJIS->UTF-8 conversion
> | ...
> | SELECT convert('\0000', 'SJIS', 'UTF-8'); -- results in error
> | ...
> | SELECT convert('\897e', 'SJIS', 'UTF-8');

Makes sense.
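
A minimal sketch of how such a series of statements could be generated on the fly, assuming the list of source codes has already been read from a mapping file. The function name and the two sample codes are hypothetical, and the bytea hex literal ('\x...') is one way to pass raw bytes to convert(); it is not necessarily what the perl scripts would emit.

```python
def convert_statements(source_codes, src_encoding, dest_encoding):
    """Yield one SELECT convert(...) statement per source character code."""
    yield f"-- Test for {src_encoding}->{dest_encoding} conversion"
    for code in source_codes:
        hex_digits = f"{code:x}"
        # bytea hex input needs an even number of hex digits.
        if len(hex_digits) % 2:
            hex_digits = "0" + hex_digits
        yield (f"SELECT convert('\\x{hex_digits}'::bytea, "
               f"'{src_encoding}', '{dest_encoding}');")


# Hypothetical sample; a real run would iterate over every code in the map.
sample_codes = [0x41, 0x897e]
for statement in convert_statements(sample_codes, "SJIS", "UTF8"):
    print(statement)
```

Because the statements are yielded one at a time, they can be piped straight into psql without storing the whole script.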

>> You could then run those SQL statements against old and new server
>> version, and verify that you get the same results.
>
> Including the result files in the repository will make this easy
> but unacceptably bloats. Put mb/Unicode/README.sanity_check?

Yeah, a README with instructions on how to do that sounds good. No need to
include the results in the repository; you can run the script against an
older version when you need something to compare with.
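
As a rough illustration of the comparison step, assuming the output of each run has been captured as a list of text lines (one per statement): the helper name and the placeholder rows below are made up for the example and are not real conversion results.

```python
from itertools import zip_longest


def differing_results(old_lines, new_lines):
    """Return (line_number, old, new) for every line that differs.

    A missing line on either side is reported with None in its place.
    """
    diffs = []
    pairs = zip_longest(old_lines, new_lines)
    for line_number, (old, new) in enumerate(pairs, start=1):
        if old != new:
            diffs.append((line_number, old, new))
    return diffs


# Placeholder rows standing in for captured output from two server versions.
old_output = ["row-1", "row-2", "row-3"]
new_output = ["row-1", "row-X", "row-3"]
for line_number, old, new in differing_results(old_output, new_output):
    print(f"line {line_number}: old={old!r} new={new!r}")
```

An empty result means both versions produced the same output for every statement, including the ones that are expected to fail.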

- Heikki
