Re: Built-in CTYPE provider

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>, Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeremy Schneider <schneider(at)ardentperf(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Built-in CTYPE provider
Date: 2024-02-16 00:13:19
Message-ID: ef9266bfb44816ae3d3f7edee96cdcb417358173.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2024-02-07 at 10:53 +0100, Peter Eisentraut wrote:
> Review of the v16 patch set:
>
> (Btw., I suppose you started this patch series with 0002 because some
> 0001 was committed earlier.  But I have found this rather confusing. 
> I
> think it's ok to renumber from 0001 for each new version.)

Fixed.

> Various comments are updated to include the term "character class". 
> I
> don't recognize that as an official Unicode term.  There are
> categories
> and properties.  Let's check this.

Changed to "properties" or "compatibility properties", except for a
couple places in the test. The test compares against ICU, which does
use the term "character classes" when discussing regexes:

https://unicode-org.github.io/icu-docs/apidoc/dev/icu4c/uchar_8h.html#details

> Some files need heavy pgindent and perltidy.

Done.

> This patch series adds some new post-update-unicode tests.  Should we
> have a separate target for each or just one common "unicode test"
> target?  Not sure.

I didn't make a change here. I suspect anyone updating unicode would
want to run them all, but I don't have a strong opinion.

> - .../generate-unicode_category_table.pl
>
> The trailing commas handling ($firsttime etc.) is not necessary with
> C99.  The code can be simplified.

Thank you, fixed.

> For this kind of code:
>
> +print $OT <<"HEADER";

Done. I used the <<"EOS" style which is more friendly to emacs, but I'm
not sure if that's right for the project style.

> Is it potentially confusing that only some pg_u_prop_* have a posix
> variant?  Would it be better for a consistent interface to have a
> "posix" argument for each and just ignore it if not used?  Not sure.

I don't have a strong opinion here, so I didn't make a change. I can if
you think it's cleaner.

> Let's use size_t instead of Size for new code.

Done.

> * v16-0003-Add-unicode-case-mapping-tables-and-functions.patch
>
> Several of the above points apply here analogously.

Fixed, I think.

> * v16-0004-Catalog-changes-preparing-for-builtin-collation-.patch
>
> This is mostly a straightforward renaming patch, but there are some
> changes in initdb and pg_dump that pre-assume the changes in the next
> patch, like which locale columns apply for which providers.  I think
> it
> would be better for the historical record to make this a straight
> renaming patch and move those semantic changes to the next patch (or
> a
> separate intermediate patch, if you prefer).

Agreed, put those non-renaming changes in the next patch.

> - src/bin/psql/describe.c
> - src/test/regress/expected/psql.out
>
> This would be a good opportunity to improve the output columns for
> collations.  The updated view is now:
>
> + Schema | Name | Provider | Collate | Ctype | Locale | ICU Rules |
> Deterministic?
> +--------+------+----------+---------+-------+--------+-----------+--
> --------------
>
> This is historically grown but suboptimal.  Why is Locale after
> Collate
> and Ctype, and why does it show both?  I think we could have just the
> Locale column, and if the libc provider is used with different
> collate/ctype (very rare!), we somehow write that into the single
> locale
> column.
>
> (A change like this would be a separate patch.)

I didn't do this, yet.

> * v16-0005-Introduce-collation-provider-builtin-for-C-and-C.patch
>
> About this initdb --builtin-locale option and analogous options
> elsewhere:  Maybe we should flip this around and provide a --libc-
> locale
> option, and have all the other providers just use the --locale
> option.
> This would be more consistent with the fact that it's libc that is
> special in this context.

I agree that libc is the odd one out. I'm not quite sure how we should
express that, though, because there are also the other environment
variables to worry about (e.g. LC_MESSAGES). Probably best as a
separate patch.

> Some changes in this patch appear to be just straight renamings, like
> in
> src/backend/utils/init/postinit.c and
> src/bin/pg_upgrade/t/002_pg_upgrade.pl.  Maybe those should be put
> into
> the previous patch instead.

Moved renamings to the previous patch.

Regards,
Jeff Davis

Attachment Content-Type Size
v17-0001-Add-Unicode-property-tables.patch text/x-patch 92.8 KB
v17-0002-Add-unicode-case-mapping-tables-and-functions.patch text/x-patch 143.6 KB
v17-0003-Catalog-changes-preparing-for-builtin-collation-.patch text/x-patch 48.5 KB
v17-0004-Introduce-collation-provider-builtin-for-C-and-C.patch text/x-patch 80.3 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2024-02-16 00:32:45 RE: Synchronizing slots from primary to standby
Previous Message jian he 2024-02-16 00:00:00 Re: POC, WIP: OR-clause support for indexes