Re: pg_dump seems to be broken in regards to the "--exclude-table-data" option on Windows.

From: Juan José Santamaría Flecha <juanjo(dot)santamaria(at)gmail(dot)com>
To: tutiluren(at)tutanota(dot)com
Cc: Pgsql Bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_dump seems to be broken in regards to the "--exclude-table-data" option on Windows.
Date: 2020-07-26 14:18:16
Message-ID: CAC+AXB1XEkkWRn-3it=nY3n4kyOGMiqVwPeTiRVTT6CebJapvg@mail.gmail.com
Lists: pgsql-bugs

On Sat, Jul 25, 2020 at 4:21 AM <tutiluren(at)tutanota(dot)com> wrote:

> Alright. Sigh. I noticed a huge difference in the file sizes between my
> normal backups and my latest one. I started getting suspicious, so I made a
> new test WITHOUT using "--exclude-table-data" at all:
>
> I dumped the same database:
>
> 1. With setting PGCLIENTENCODING=WIN1252, the weird "workaround".
> 2. Without setting that. (That is, UTF8.)
>
> The first one is *MUCH* smaller. Opening it up in a visual diff viewer, I
> can see that HUGE amounts of my data has simply not been copied in the
> first case. Which is a nightmare scenario and thank God that I noticed this
> instead of just assuming that it was working now... my backups would've
> been worthless.
>
> In other words: setting PGCLIENTENCODING=WIN1252 when the database is UTF8
> makes pg_dump ignore massive amounts of the data in the database. For this
> reason, I cannot possibly use this as a "workaround" for my
> "--exclude-table-data" problem.
>

You are comparing a file that uses a single-byte encoding (WIN1252) with
another file that uses a multibyte encoding (UTF8), so a size difference is
to be expected: every non-ASCII character takes one byte in the first file
and two or more bytes in the second.
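
To illustrate the size argument, here is a minimal Python sketch. The sample
string is made up for the illustration and does not come from the reporter's
database; it only shows that the same characters occupy a different number of
bytes in each encoding.

```python
# Same text, two encodings: the byte counts differ, the characters do not.
# The sample text is hypothetical; any data rich in accented letters behaves alike.
text = "Ä Ö é ñ" * 1000

as_win1252 = text.encode("cp1252")  # one byte per character
as_utf8 = text.encode("utf-8")      # two bytes for each accented letter here

print(len(as_win1252), len(as_utf8))

# Decoding each byte string with its own encoding gives back identical text.
assert as_win1252.decode("cp1252") == as_utf8.decode("utf-8")
```

How large the difference is for a real dump depends on how much non-ASCII
data the tables hold.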

Also, diffing two files with mismatched encodings is not going to work as
expected. What you can do is change the display code page of the CMD window
to match each PGCLIENTENCODING (chcp 1252 for WIN1252, chcp 65001 for UTF8)
and use the "type" command to print on screen the content of the dump file
generated with that encoding. If you find a mismatch, please share it.
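
As an alternative to eyeballing the "type" output, the two dumps can be
decoded with their respective encodings and compared as text. The following
is only a sketch: the function names and the file names in the usage comment
are hypothetical, and it assumes each file was written byte-for-byte by
pg_dump (for example with --file or a CMD redirect).

```python
import difflib


def decoded_lines(path, encoding):
    """Read a dump file and decode it with the encoding it was written in."""
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\r\n") for line in f]


def diff_dumps(path_a, enc_a, path_b, enc_b):
    """Return unified-diff lines between two dumps after decoding each one."""
    a = decoded_lines(path_a, enc_a)
    b = decoded_lines(path_b, enc_b)
    return list(difflib.unified_diff(a, b, path_a, path_b, lineterm="", n=0))


# Hypothetical usage:
#   for line in diff_dumps("dump_win1252.sql", "cp1252", "dump_utf8.sql", "utf-8"):
#       print(line)
```

The "SET client_encoding" line near the top of each dump is expected to
differ; any other difference would be worth reporting.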

> Yes, I have very carefully tried this with "cmd.exe /U" as well as
> setting the Unicode codepage; it makes *no difference*. Nothing seems to
> make a difference; pg_dump doesn't seem to *want* to work. It's "all or
> nothing". I can't exclude any part of the database. The
> "PGCLIENTENCODING=WIN1252" workaround is sadly insanely dangerous and
> unusable.
>

Your OS code page is WIN1252, and that has a heavy impact on the whole
system. In fact, your client is natively WIN1252, and explicitly setting
PGCLIENTENCODING is not a weird hack but a regular configuration parameter.
Changing the CMD code page only changes the encoding of the displayed text.
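
Since PGCLIENTENCODING is an ordinary environment variable, it can also be
set for a single pg_dump run from a script. The sketch below only builds the
command line and environment; the helper name, database name, output file
and table pattern are hypothetical placeholders.

```python
import os


def pg_dump_invocation(dbname, outfile, client_encoding="WIN1252", exclude_data=()):
    """Build argv and environment for one pg_dump run with a chosen client encoding."""
    # Copy the current environment and override only the client encoding.
    env = dict(os.environ, PGCLIENTENCODING=client_encoding)
    argv = ["pg_dump", f"--file={outfile}"]
    # One --exclude-table-data switch per table pattern.
    argv += [f"--exclude-table-data={pattern}" for pattern in exclude_data]
    argv.append(dbname)
    return argv, env


# Hypothetical usage (needs pg_dump on PATH and a reachable server):
#   import subprocess
#   argv, env = pg_dump_invocation("mydb", "mydb.sql", exclude_data=['"Ö"."Ä"'])
#   subprocess.run(argv, env=env, check=True)
```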

> Is it entirely unthinkable that this is a pg_dump bug?
>

What you are describing does not look like a bug to me, but a client
encoding problem.

If PGCLIENTENCODING=WIN1252 were failing for pg_dump, it would not fail
silently. You would see something like:

pg_dump: error: Dumping the contents of table "Ä" failed: PQgetResult()
failed.
pg_dump: error: Error message from server: ERROR: character with byte
sequence 0xe5 0x82 0x89 in encoding "UTF8" has no equivalent in encoding
"WIN1252"
pg_dump: error: The command was: COPY "Ö"."Ä" (c1) TO stdout;

If you see this error, then PGCLIENTENCODING=WIN1252 will not be a viable
workaround for you, and you will have to resort to one of the solutions
that have already been suggested: activate the beta UTF8 support in the
Windows Regional settings, or access your database from a system with true
UTF8 terminal support.
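
The byte sequence in the sample error above can be checked offline. This
Python sketch decodes 0xe5 0x82 0x89 as UTF8 and shows that the resulting
character cannot be encoded as WIN1252. It assumes Python's "cp1252" codec is
a close enough stand-in for the server's conversion table, and the helper
names are made up for the illustration.

```python
# The three bytes quoted in the sample error message.
raw = bytes([0xE5, 0x82, 0x89])
ch = raw.decode("utf-8")  # a single CJK character, U+5089


def fits_win1252(value):
    """Return True if every character of value can be encoded as WIN1252."""
    try:
        value.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


def unconvertible(values):
    """Return the strings that a WIN1252 client could not receive."""
    return [v for v in values if not fits_win1252(v)]


print(hex(ord(ch)), fits_win1252(ch), fits_win1252("Ä"))
```

A similar check over text exported with a UTF8 client would point at the
rows that trigger the error.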

Regards,

Juan José Santamaría Flecha
