plperl vs LC_COLLATE (was Re: Possible savepoint bug)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paesold <mpaesold(at)gmx(dot)at>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: plperl vs LC_COLLATE (was Re: Possible savepoint bug)
Date: 2005-12-28 15:35:19
Message-ID: 17227.1135784119@sss.pgh.pa.us
Lists: pgsql-hackers

Michael Paesold <mpaesold(at)gmx(dot)at> writes:
> This is a theory. The whole database was loaded using pg_restore, I still
> have the original dump so I will have a look at the dump now. The database
> actually contains some plperl functions.

OK, I think I have reproduced the problem. initdb in C locale, then
start postmaster with LANG=en_US.UTF-8 in its environment. Then:

z1=# create language plperl;
CREATE LANGUAGE
z1=# select 'enum.server_task_log.status.RUNNING'::varchar < 'enum.server_task_log.status.keys'::varchar;
 ?column?
----------
 t          -- correct result for C locale
(1 row)

z1=# \c z1
You are now connected to database "z1".
z1=# SET check_function_bodies = false;
SET
z1=# create or replace function perlf() returns text as $$
z1$# return 'foo';
z1$# $$ language plperl;
CREATE FUNCTION
z1=# select 'enum.server_task_log.status.RUNNING'::varchar < 'enum.server_task_log.status.keys'::varchar;
 ?column?
----------
 f          -- WRONG result for C locale
(1 row)

So the mere act of defining a plperl function, even with
check_function_bodies = false, is sufficient to send control through
that bit of libperl code that does setlocale(LC_ALL, ""). Ugh.
This is much worse than I thought.

The reason I had not seen it before is that lc_collate_is_c caches its
result, which means that if you do any text/varchar comparisons before
first invoking libperl, you won't see any misbehavior (at least not when
you started in C locale). The reconnect in the middle of the above test
sequence is essential to reproduce the failure.
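
To illustrate why the order of operations matters, here is a minimal sketch of
the caching effect in Python. It is not PostgreSQL code: the names
collate_is_c, text_lt, new_session and clobber_locale_from_environment are
hypothetical stand-ins, and the cached flag only loosely models what
lc_collate_is_c does.

```python
import locale

# Cached answer to "is the collation locale C?". None means "not asked yet".
# Hypothetical stand-in for the caching attributed to lc_collate_is_c.
_collate_is_c = None


def collate_is_c():
    """Return True if LC_COLLATE was C/POSIX the first time we were asked."""
    global _collate_is_c
    if _collate_is_c is None:
        current = locale.setlocale(locale.LC_COLLATE)  # query only, no change
        _collate_is_c = current in ("C", "POSIX")
    return _collate_is_c


def text_lt(a, b):
    """Compare two strings the way the cached flag says we should."""
    if collate_is_c():
        return a.encode("utf-8") < b.encode("utf-8")  # byte-wise, like strcmp
    return locale.strcoll(a, b) < 0  # locale-aware comparison


def new_session():
    """Forget the cached flag, loosely analogous to reconnecting."""
    global _collate_is_c
    _collate_is_c = None


def clobber_locale_from_environment():
    """Mimic a setlocale(LC_ALL, "") call; return False if it fails."""
    try:
        locale.setlocale(locale.LC_ALL, "")
        return True
    except locale.Error:
        return False
```

If text_lt is called once before clobber_locale_from_environment, the cached
flag keeps later comparisons byte-wise even though the process locale has
changed. If new_session is called first, so that the clobber happens before
the first comparison, text_lt falls through to strcoll under whatever locale
the environment supplied, and the result may differ.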

We were talking last week about forcing the LANG/LC_ environment
variables to match our desired settings within the postmaster.
I think this example raises the priority of doing that by several
notches :-(

In the meantime, Michael, I'd suggest modifying your postmaster start
script to force LANG=C, and then reindexing all indexes you have on
text/varchar/char columns. That should get you out of the immediate
problem and prevent it from recurring before we have a fix. (The
system catalogs should be OK because they use "name" which is not
locale-sensitive.)

regards, tom lane
