Re: BUG #16207: localization functions upper() and lower() does not work for text returned by convert_from()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: im-scooter(at)yandex(dot)ru
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16207: localization functions upper() and lower() does not work for text returned by convert_from()
Date: 2020-01-15 16:30:54
Message-ID: 8016.1579105854@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> Localization functions upper() and lower() does not work for text returned
> by convert_from() for cyrillic symbols.
> select upper(convert_from(convert_to('абв', 'utf8'), 'utf8')) =
> upper('абв'); -- return false

That happens because upper() depends on the collation that the parser
determines for its input expression. When you write just upper('абв')
you get the database's default collation attached to the literal, since
it's implicitly typed as text. However the convert_from() invocation
ends up with collation "C", in which non-ASCII characters aren't
going to be changed by upper(). You can fix it by assigning another
collation explicitly, e.g.

# select upper(convert_from(convert_to('абв', 'utf8'), 'utf8'));
upper
--------
абв
(1 row)

# select upper(convert_from(convert_to('абв', 'utf8'), 'utf8') collate "default");
upper
--------
АБВ
(1 row)

> On Postgresql 10.0 it works fine.

This case sort of accidentally works in pre-v12 versions, because
the convert_from call has no collatable inputs so it gets a default
collation assignment anyway. However, the function's collation-name
argument is of type name, and in v12 that type has collation "C"
rather than no collation, so the convert_from result ends up with
collation "C" not the database's default.

I had not realized till just now that that change would have any
side-effects on the behavior of convert_from (or more accurately,
on the behavior of expressions involving convert_from as an
input to a collation-sensitive function). I'm not sure whether
to call it a bug or not --- it's not really obvious that
convert_from() ought to be regarded as producing some particular
collation rather than another one. In any case, there seems very
little that we could do about it in the v12 release series, since
those catalog entries are fixed now.

Poking around, it seems that all of these functions would
be similarly affected:

# select oid::regprocedure from pg_proc where 'name'::regtype = any(proargtypes) and prorettype = 'text'::regtype;
oid
-----------------------------
text(name)
obj_description(oid,name)
shobj_description(oid,name)
convert_from(bytea,name)
to_ascii(text,name)
(5 rows)

regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Eduardo Lúcio Amorim Costa 2020-01-15 17:31:57 SQL/PostgreSQL - Error observed in the QUERY not caught by the “EXCEPTION” block in the stored procedure
Previous Message Peter Eisentraut 2020-01-15 15:42:28 Re: libpq parameter parsing problem