parallel workers and client encoding

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: parallel workers and client encoding
Date: 2016-06-07 01:45:04
Message-ID: 1739a900-30ab-f48e-aec4-2b35475ecf02@2ndquadrant.com
Lists: pgsql-hackers

There appears to be a problem with how client encoding is handled in the
communication from parallel workers. In a parallel worker, the client
encoding setting is inherited from its creating process as part of the
GUC setup. So any plain-text stuff the parallel worker sends to its
leader is actually converted to the client encoding. Since most data is
sent in binary format, the plain-text provision applies mainly to notice
and error messages. At the other end, error messages are parsed using
pq_parse_errornotice(), which internally uses routines that were meant
for communication from the client, and therefore will convert everything
back from the client encoding to the server encoding. So this whole
thing actually happens to work as long as round tripping is possible
between the involved encodings.
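
As a rough illustration of the round trip described above, here is a toy Python model; it is not PostgreSQL code. The function names (worker_send, leader_receive, round_trip) are made up for this sketch, and Python's codecs stand in for the server's conversion routines. It assumes a UTF-8 server encoding and only shows the shape of the double conversion.

```python
# Toy model of the worker -> leader text path. Not PostgreSQL code;
# all names here are hypothetical and exist only for illustration.
SERVER_ENCODING = "utf-8"


def worker_send(text, client_encoding):
    """Worker side: text is converted from server to client encoding."""
    return text.encode(client_encoding)


def leader_receive(payload, client_encoding):
    """Leader side: bytes are converted back from client to server encoding."""
    return payload.decode(client_encoding).encode(SERVER_ENCODING)


def round_trip(text, client_encoding):
    """Send text through both conversions and return what the leader ends up with."""
    wire = worker_send(text, client_encoding)
    return leader_receive(wire, client_encoding).decode(SERVER_ENCODING)


# LATIN1 can represent every character in the message, so the double
# conversion is lossless and goes unnoticed.
print(round_trip("Motörhead", "latin-1"))
```

With a client encoding that cannot represent the text, worker_send() in this model would raise instead, which corresponds to the case discussed next.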

In cases where it isn't, the difference is still hard to notice.
Depending on whether you get a parallel plan or not, the following
happens:

not parallel: conversion error happens between server and client, client
sees an error message about that

parallel: conversion error happens between worker and leader, worker
generates an error message about that, sends it to leader, leader
forwards it to client

The client sees the same error message in both cases.

To construct a case where this makes a difference, the leader has to be
set up to catch certain errors. Here is an example:

"""
create table test1 (a int, b text);
truncate test1;
insert into test1 values (1, 'a');

create or replace function test1() returns text language plpgsql
as $$
declare
  res text;
begin
  perform from test1 where a = test2();
  return res;
exception when division_by_zero then
  return 'boom';
end;
$$;

create or replace function test2() returns int language plpgsql
parallel safe
as $$
begin
  raise division_by_zero using message = 'Motörhead';
  return 1;
end
$$;

set force_parallel_mode to on;

select test1();
"""

With client_encoding = server_encoding, this will return a single row
'boom'. But with, say, database encoding UTF8 and
PGCLIENTENCODING=KOI8R, it will error:

ERROR: 22P05: character with byte sequence 0xef 0xbe 0x83 in encoding
"UTF8" has no equivalent in encoding "KOI8R"
CONTEXT: parallel worker
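
One way to read this result: the worker's division_by_zero error never reaches the leader, because converting its message fails first and a 22P05 error is reported in its place, which the handler in test1() does not catch. The following toy Python model sketches that substitution. It is not PostgreSQL code; SqlError, worker_report, and leader_call are hypothetical names, Python's codecs stand in for the server's conversion routines, and the replacement message text is simplified.

```python
# Toy model of why the exception handler stops matching. Not PostgreSQL
# code; every name below is hypothetical and for illustration only.
DIVISION_BY_ZERO = "22012"          # SQLSTATE for division_by_zero
UNTRANSLATABLE_CHARACTER = "22P05"  # SQLSTATE for untranslatable_character


class SqlError(Exception):
    def __init__(self, sqlstate, message):
        super().__init__(message)
        self.sqlstate = sqlstate
        self.message = message


def worker_report(error, client_encoding):
    """Return (sqlstate, wire_bytes) as the worker would send them to the leader."""
    try:
        return error.sqlstate, error.message.encode(client_encoding)
    except UnicodeEncodeError:
        # The conversion failed, so a conversion error is reported
        # instead of the original one.
        msg = "character has no equivalent in client encoding"
        return UNTRANSLATABLE_CHARACTER, msg.encode(client_encoding)


def leader_call(client_encoding):
    """Mimic test1(): trap division_by_zero, let anything else propagate."""
    original = SqlError(DIVISION_BY_ZERO, "Motörhead")
    sqlstate, wire = worker_report(original, client_encoding)
    if sqlstate == DIVISION_BY_ZERO:
        return "boom"
    raise SqlError(sqlstate, wire.decode(client_encoding))


print(leader_call("utf-8"))  # client encoding matches: handler fires

try:
    leader_call("koi8_r")
except SqlError as e:
    print("uncaught:", e.sqlstate)
```

In this model the outcome depends only on whether the message survives the conversion to the client encoding, which matches the behavior of the example above.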

(Note that changing force_parallel_mode does not force replanning in
plpgsql, so if you run test1() first before setting force_parallel_mode,
then you won't get the error.)

Attached is a patch that illustrates how this could be fixed. There might
be similar issues elsewhere. The notification propagation in particular
could be affected.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: parallel-client-encoding.patch (text/plain, 1.3 KB)
