Quick Links

Re: Radix tree for character conversion

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	hlinnaka(at)iki(dot)fi
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, michael(dot)paquier(at)gmail(dot)com, daniel(at)yesql(dot)se, peter(dot)eisentraut(at)2ndquadrant(dot)com, robertmhaas(at)gmail(dot)com, tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com, ishii(at)sraoss(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Radix tree for character conversion
Date:	2017-03-17 05:19:31
Message-ID:	20170317.141931.188239181.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Thank you for committing this.

At Mon, 13 Mar 2017 21:07:39 +0200, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote in <d5b70078-9f57-0f63-3462-1e564a57739f(at)iki(dot)fi>
> On 03/13/2017 08:53 PM, Tom Lane wrote:
> > Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
> >> It would be nice to run the map_checker tool one more time, though, to
> >> verify that the mappings match those from PostgreSQL 9.6.
> >
> > +1
> >
> >> Just to be sure, and after that the map checker can go to the dustbin.
> >
> > Hm, maybe we should keep it around for the next time somebody has a
> > bright
> > idea in this area?
>
> The map checker compares old-style maps with the new radix maps. The
> next time 'round, we'll need something that compares the radix maps
> with the next great thing. Not sure how easy it would be to adapt.
>
> Hmm. A somewhat different approach might be more suitable for testing
> across versions, though. We could modify the perl scripts slightly to
> print out SQL statements that exercise every mapping. For every
> supported conversion, the SQL script could:
>
> 1. create a database in the source encoding.
> 2. set client_encoding='<target encoding>'
> 3. SELECT a string that contains every character in the source
> encoding.

There are many encodings that can be client-encoding but cannot
be database-encoding. And some encodings such as UTF-8 has
several one-way conversion. If we do something like this, it
would be as the following.

1. Encoding test
1-1. create a database in UTF-8
1-2. set client_encoding='<source encoding>'
1-3. INSERT all characters defined in the source encoding.
1-4. set client_encoding='UTF-8'
1-5. SELECT a string that contains every character in UTF-8.
2. Decoding test

.... sucks!

I would like to use convert() function. It can be a large
PL/PgSQL function or a series of "SELECT convert(...)"s. The
latter is doable on-the-fly (by not generating/storing the whole
script).

> You could then run those SQL statements against old and new server
> version, and verify that you get the same results.

Including the result files in the repository will make this easy
but unacceptably bloats. Put mb/Unicode/README.sanity_check?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Re: Radix tree for character conversion at 2017-03-13 19:07:39 from Heikki Linnakangas

Responses

Re: Radix tree for character conversion at 2017-03-17 11:03:35 from Heikki Linnakangas

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Kyotaro HORIGUCHI	2017-03-17 05:23:13	Re: Protect syscache from bloating with negative cache entries
Previous Message	Michael Paquier	2017-03-17 05:14:50	Re: Speedup twophase transactions