Re: pg_dump seems to be broken in regards to the "--exclude-table-data" option on Windows.

From: Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: "tutiluren(at)tutanota(dot)com" <tutiluren(at)tutanota(dot)com>, Pgsql Bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_dump seems to be broken in regards to the "--exclude-table-data" option on Windows.
Date: 2020-07-24 10:30:35
Message-ID: CAC+AXB2KFQKc9M8m=O8=vTWu8D+FmHGGXQ0vpMCMGYED0yLwXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Jul 24, 2020 at 8:14 AM David G. Johnston <
david(dot)g(dot)johnston(at)gmail(dot)com> wrote:

> On Thursday, July 23, 2020, <tutiluren(at)tutanota(dot)com> wrote:
>
>> After taking a longer break from my problem, I have now made a fresh,
>> clean, scientifically conducted experiment in order to truly get to the
>> bottom of this annoying problem once and for all.
>>
>> First, I set cmd.exe to use Unicode, just to be sure:
>>
>> C:\pg_dump_test>chcp 65001
>> Active code page: 65001
>>
>> pg_dump: creating CONSTRAINT "Test schäma.Test täble Test täble_pkey"
>>
>> = WORKS. The dump was successful. (In spite of weird output chars.)
>>
>> C:\pg_dump_test>pg_dump --format plain --verbose --file "testdump.txt"
>> --exclude-table-data="Test schäma.Test täble" --host="localhost"
>> --port="5432" --username="postgres" --dbname="test"
>> pg_dump: last built-in OID is 16383
>> pg_dump: [archiver (db)] query failed: ERROR: invalid byte sequence for
>> encoding "UTF8": 0xe4 0x62 0x6c
>> pg_dump: [archiver (db)] query was: SELECT c.oid
>> FROM pg_catalog.pg_class c
>> LEFT JOIN pg_catalog.pg_namespace n
>> ON n.oid OPERATOR(pg_catalog.=) c.relnamespace
>> WHERE c.relkind OPERATOR(pg_catalog.=) ANY
>> (array['r', 'S', 'v', 'm', 'f', 'p'])
>> AND c.relname OPERATOR(pg_catalog.~) '^(test täble)$'
>> AND n.nspname OPERATOR(pg_catalog.~) '^(test schäma)$'
>>
>> = FAILED. The dump was aborted with these nonsensical errors.
>>
>> Finally, I tried the same command again like this:
>>
>> I looked everywhere for some kind of "client-encoding" option in the
>> pg_dump manual, but there is no such thing. The only thing I can think of
>> is that the client's encoding (that is, pg_dump) is for some reason not set
>> to "UTF8" even though that's the encoding of the "test" database which I'm
>> connecting to.
>>
>
> The main thing here is this isn’t properly considered a bug in pg_dump but
> rather with most likely any client command run in the cmd.exe shell.
>

The problem is coming from CMD and its UTF8 support, but I guess I am not
making a great job in convincing you. I will try to explain it through an
example:

1. Start a CMD using "CMD /U", and set the code page to UTF8:

C:\postgres>chcp 65001
Active code page: 65001

2. Create a database named 'test', an schema named "Ö" and table named "Ä":

C:\postgres>psql -c "create database test" postgres
CREATE DATABASE

C:\postgres>psql -c "\l test" postgres
List of databases
Name | Owner | Encoding | Collate |
Ctype | Access privileges
------+--------------+----------+----------------------------+----------------------------+-------------------
test | postgres | UTF8 | English_United States.1252 |
English_United States.1252 |
(1 row)

C:\postgres>psql -c "create schema "Ö"" test
CREATE SCHEMA
C:\postgres>psql -c " create table "Ö"."Ä" (c1 char(1))" test
CREATE TABLE

3. Test the table and check your client encoding:

C:\postgres>psql -c "select * from "Ö"."Ä"" test
c1
----
(0 rows)

C:\postgres>psql -c "show client_encoding" test
client_encoding
-----------------
WIN1252
(1 row)

4. Change the client encoding to UTF8 and repeat the test:

C:\postgres>set PGCLIENTENCODING=UTF8

C:\postgres>psql -c "show client_encoding" test
client_encoding
-----------------
UTF8
(1 row)

C:\postgres>psql -c "select * from "Ö"."Ä"" test
2020-07-24 11:25:01.710 CEST [1452] ERROR: invalid byte sequence for
encoding "UTF8": 0xd6 0x2e
ERROR: invalid byte sequence for encoding "UTF8": 0xd6 0x2e

The reason why this happens is because the limited CMD's UTF8 support,
'chcp' only changes the code page of the displayed characters, shell
commands, piping, redirection and most commands are still ANSI only.

Does explicitly setting 'PGCLIENTENCODING=WIN1252' on your shell solves the
issue?

Regards,

Juan José Santamaría Flecha

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Etsuro Fujita 2020-07-24 14:45:52 Re: BUG #16500: SQL Abend. select multi_key_columns_range_partition_table
Previous Message David G. Johnston 2020-07-24 06:14:08 Re: pg_dump seems to be broken in regards to the "--exclude-table-data" option on Windows.