Re: EBCDIC sorting as a use case for ICU rules

From: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>, pgsql-hackers(at)postgresql(dot)org
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Subject: Re: EBCDIC sorting as a use case for ICU rules
Date: 2023-07-06 09:32:32
Message-ID: 1f20d0d7-6b15-d10f-94f5-77b2e82112b1@enterprisedb.com
Lists: pgsql-hackers

On 21.06.23 15:28, Daniel Verite wrote:
> A collation like the following seems to work (the rule simply enumerates
> US-ASCII letters in the EBCDIC alphabet order, with adequate quoting)
>
> CREATE COLLATION ebcdic (provider='icu', locale='und',
> rules=$$&' '<'.'<'<'<'('<'+'<\|<'&'<'!'<'$'<'*'<')'<';'<'-'<'/'<','<'%'<'_'<'>'<'?'<'`'<':'<'#'<'@'<\'<'='<'"'<a<b<c<d<e<f<g<h<i<j<k<l<m<n<o<p<q<r<'~'<s<t<u<v<w<x<y<z<'['<'^'<']'<'{'<A<B<C<D<E<F<G<H<I<'}'<J<K<L<M<N<O<P<Q<R<'\'<S<T<U<V<W<X<Y<Z<0<1<2<3<4<5<6<7<8<9$$);
>
> This can be useful for people who migrate from mainframes to Postgres
> and need their migration tests to produce the same sorted results as the
> original system.
> Since rules can be defined at the database level with the icu_rules option,
> they don't even need to tweak their queries to add COLLATE clauses,
> which surely is appreciable in that kind of project.
>
> US-ASCII when sorted in EBCDIC order comes out like this:
>
> .<(+|&!$*);-/,%_>?`:#@'="abcdefghijklmnopqr~stuvwxyz[^]{ABCDEFGHI}JKLMNOPQR\STUVWXYZ0123456789
>
> Maybe this example could be added to the documentation except for
> the problem that the rule is very long and dollar-quoting cannot be split
> into several lines. Literals enclosed by single quotes can be split that
> way, but would require escaping the single quotes in the rule, which
> would lead to scary-looking over-quoted contents.

You can use whitespace in the rules. For example,

CREATE COLLATION ebcdic (provider='icu', locale='und',
rules=$$
& ' ' < '.' < '<' < '(' < '+' < \|
< '&' < '!' < '$' < '*' < ')' < ';'
< '-' < '/' < ',' < '%' < '_' < '>' < '?'
< '`' < ':' < '#' < '@' < \' < '=' < '"'
< a < b < c < d < e < f < g < h < i
< j < k < l < m < n < o < p < q < r
< '~' < s < t < u < v < w < x < y < z
< '[' < '^' < ']'
< '{' < A < B < C < D < E < F < G < H < I
< '}' < J < K < L < M < N < O < P < Q < R
< '\' < S < T < U < V < W < X < Y < Z
< 0 < 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9
$$);

(This particular layout is meant to match the rows in
https://en.wikipedia.org/wiki/EBCDIC#Code_page_layout.)
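
To check a collation like this, one could sort a few sample strings
with it. This is only a sketch: it assumes the ebcdic collation above
has already been created (which needs a server built with ICU support
and the rules option, so PostgreSQL 16 or later), and the sample
values are made up for illustration.

```sql
-- Assumes the "ebcdic" collation defined above exists.
-- The sample strings are hypothetical; they only need to start with
-- characters from different parts of the rule.
SELECT s
FROM (VALUES ('abc'), ('ABC'), ('123'), ('~x'), ('.x')) AS t(s)
ORDER BY s COLLATE ebcdic;
```

Going by the order in the rule, the rows should come back as .x, abc,
~x, ABC, 123: '.' precedes the letters, '~' sits between r and s,
lowercase sorts before uppercase, and the digits come last.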
