Re: Radix tree for character conversion

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, robertmhaas(at)gmail(dot)com, tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, ishii(at)sraoss(dot)co(dot)jp, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Radix tree for character conversion
Date: 2016-11-07 11:32:55
Message-ID: EE8775B6-BE30-459D-9DDB-F3D0B3FF573D@yesql.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 04 Nov 2016, at 08:34, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> Thank you for looling this.

And thank you for taking the time to read my patches!

> At Mon, 31 Oct 2016 17:11:17 +0100, Daniel Gustafsson <daniel(at)yesql(dot)se> wrote in <3FC648B5-2B7F-4585-9615-207A44B730A9(at)yesql(dot)se>
>>> On 27 Oct 2016, at 09:23, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> Perl scripts are to be messy, I believe. Anyway the duplicate
>>> check as been built into the sub print_radix_trees. Maybe the
>>> same check is needed by some plain map files but it would be just
>>> duplication for the maps having radix tree.
>>
>> I took a small stab at doing some cleaning of the Perl scripts, mainly around
>> using the more modern (well, modern as in +15 years old) form for open(..),
>> avoiding global filehandles for passing scalar references and enforcing use
>> strict. Some smaller typos and fixes were also included. It seems my Perl has
>> become a bit rusty so I hope the changes make sense. The produced files are
>> identical with these patches applied, they are merely doing cleaning as opposed
>> to bugfixing.
>>
>> The attached patches are against the 0001-0006 patches from Heikki and you in
>> this series of emails, the separation is intended to make them easier to read.
>
> I'm not sure how the discussion about this goes, these patches
> makes me think about coding style of Perl.

Some of this can absolutely be considered style and more or less down to
personal preference. I haven’t seen any coding conventions for Perl so I
assume it’s down to consensus among the committers. My rationale for these
patches in the first place was that I perceived this thread to partly want to
clean up the code and make it more modern Perl.

> The distinction between executable script and library is by
> intention with an obscure basis. Existing scripts don't get less
> modification, but library uses more restricted scopes to get rid
> of the troubles caused by using global scopes. But I don't have a
> clear preference on that. The TAP test scripts takes OO notations
> but I'm not sure convutils.pl also be better to take the same
> notation. It would be rarely edited hereafter and won't gets
> grown any more.

I think the current convutils module is fine and converting it to OO would be
overkill.

> As far as I see the obvious bug fixes in the patchset are the
> following,

Agreed, with some comments:

> - 0007: load_maptable fogets to close input file.

An interesting note on this is that it’s not even a bug =) Since $in is a
scalar reference, there is no need to explicitly close() the filehandle since
the reference counter will close it on leaving scope, but there’s no harm in
doing it ourselves and it also makes for less confusion for anyone not familiar
with Perl internals.

> - 0010: commment for load_maptables is wrong.

There is also a fix for a typo in make_mapchecker.pl

> - 0011: hash reference is incorrectly dereferenced
>
> All other fixes other than the above three seem to be styling or
> syntax-generation issues and I don't know whether any
> recommendation exists…

I think there are some more fixes that arent styling/syntax remaining. I’ll go
through the patches one by one:

0007 - While this might be considered styling/syntax, my $0.02 is that it’s not
and instead a worthwhile change. I’ll illustrate with an example from the
patch in question:

Using a bareword global variable in open() for the filehandle was replaced with
the three-part form in 5.6 and is now even actively discouraged from in the
Perl documentation (and has been so since the 5.20 docs). The problem is that
they are global and can thus easily clash, so easily that the 0007 patch
actually fixes one such occurrence:

print_radix_map() opens the file in the global filehandle OUT and passes it to
print_radix_table() with the typeglob *OUT; print_radit_table() in turn passes
the filehandle to print_segmented_table() which writes to the file using the
parameter $hd, except in one case where it uses the global OUT variable without
knowing it will be the right file. This is where the hunk below in 0007 comes
in:

- print OUT "$line\n";
+ print { $$hd } "$line\n";

In this case OUT references the right file and it produces the right result,
but it illustrates how easy it is to get wrong (which can cause very subtle
bugs). So, when poking at this code we might as well, IMHO, use what is today
in Perl considered the right way to deal with filehandle references.

Using implicit filemodes can also introduce bugs when opening filenames passed
in from the outside as we do in UCS_to_most.pl. Considering the use case of
these scripts it’s obviously quite low on the list of risks but still.

0008 - I don’t think there are any recommendations whether or not to use use
strict; in the codebase, there certainly are lots of scripts not doing it.
Personally I think it’s good hygiene to always use strict but here it might
just be janitorial nitpicking (which I too am guilty of liking.. =)).

0009 - local $var; is to provide a temporary value of $var, where $var exists,
for the current scope (and was mostly used back in Perl 4). Since we are
passing by value to ucs2utf(), and creating $utf inside it, using my to create
the variable is the right thing even though the end result is the same.

0010 and 0011 are already dealt with above.

So to summarize, I think there are a few more (while not all) hunks that are of
interest which aren’t just syntax/style which can serve to make the code easer
to read/work with down line should we need to.

cheers ./daniel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shay Rojansky 2016-11-07 11:40:50 Re: macaddr 64 bit (EUI-64) datatype support
Previous Message Masahiko Sawada 2016-11-07 11:17:29 Re: Specifying the log file name of pgbench -l option