Re: BUG: pg_dump generates corrupted gzip file in Windows

From: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG: pg_dump generates corrupted gzip file in Windows
Date: 2017-03-24 08:47:50
Message-ID: CAGz5QCJ_Vgn+mBE_ZW31kOa6oT2JME9K8634qhEzPJmU2jX=0A@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 24, 2017 at 12:35 PM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> On 24 March 2017 at 14:07, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com> wrote:
>> On Fri, Mar 24, 2017 at 11:28 AM, Kuntal Ghosh
>> <kuntalghosh(dot)2007(at)gmail(dot)com> wrote:
>>> Hello,
>>> In Windows, if one needs to take a dump in plain text format (this is
>>> the default option, or can be specified using -Fp) with some level of
>>> compression (-Z[0-9]), an output file has to
>>> be specified. Otherwise, if the output is redirected to stdout, it'll
>>> create a corrupted dump (cmd is set to ASCII mode, so it'll put
>>> carriage returns in the file).
>> To reproduce the issue, please use either of the following commands in Windows cmd:
>>
>> pg_dump -Z 9 test > E:\test_xu.backup
>> pg_dump -Fp -Z 9 test > E:\test_xu.backup
>
> This is a known problem. It is not specific to PostgreSQL, it affects
> any software that attempts to use stdin/stdout on Windows via cmd,
> where it is not 8-bit clean.
>
> We don't just refuse to run with stdout as a destination because it's
> perfectly sensible if you're not using cmd.exe. pg_dump cannot, as far
> as I know, tell whether it's being invoked by cmd or something else.
AFAICU, if we use binary mode, the output is written byte for byte. In
text mode, CR/LF conversion is applied to the output on its way out. So,
whenever we want compression on a plain-text dump, we can set the
stdout mode to O_BINARY. Is that the wrong approach?
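
To make the idea concrete, here is a minimal sketch, not a patch against
pg_dump. The helper names needs_binary_stdout() and make_stdout_binary()
are made up for illustration, and the sketch assumes that a compression
level of 0 means "no compression". The only Windows-specific calls are
_setmode() and _fileno() from the Microsoft C runtime; on other platforms
the switch is a no-op.

```c
#include <stdio.h>

#ifdef _WIN32
#include <fcntl.h>   /* _O_BINARY */
#include <io.h>      /* _setmode */
#endif

/*
 * Hypothetical helper: stdout only has to be 8-bit clean when the dump
 * goes to stdout AND compression is requested (assumed: level > 0).
 */
static int
needs_binary_stdout(int compress_level, int writing_to_stdout)
{
    return writing_to_stdout && compress_level > 0;
}

/*
 * Hypothetical helper: switch stdout to binary mode on Windows.
 * Returns 0 on success (or when there is nothing to do), -1 on failure.
 */
static int
make_stdout_binary(void)
{
#ifdef _WIN32
    /* Flush first so already-buffered text is not affected by the switch. */
    fflush(stdout);
    if (_setmode(_fileno(stdout), _O_BINARY) == -1)
        return -1;
#endif
    return 0;
}
```

A caller would check needs_binary_stdout() once, before the first byte of
compressed output is written, and call make_stdout_binary() only in that
case, so that uncompressed plain-text dumps keep their current behaviour.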

> If you have concrete ideas on how to improve this they'd be welcomed.
> Is there anywhere you expected to find info in the docs? Do you know
> of a way to detect in Windows if the output stream is not 8-bit clean
> from within the application program? ... other?
Actually, I'm not that familiar with the Windows environment. But I
couldn't find any note to users in the pg_dump documentation regarding
this issue. In cmd, if someone needs a plain-text dump in compressed
format, they should specify an output file; otherwise they may run
into the above problem. However, if a dump is corrupted due to this
issue, a fix is provided in [1]. Should we include this in the
documentation?

[1] http://www.gzip.org/
Use fixgz.c to remove the extra CR (carriage return) bytes.
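
For readers who want to see what such a repair amounts to, here is a
rough sketch of the idea. It is not the fixgz.c source, and the function
name strip_cr_before_lf() is made up for illustration. It drops every CR
that is immediately followed by LF. Recovery is not guaranteed: if the
compressed stream contained a genuine CR LF byte pair before the
conversion happened, that CR is removed as well and the result will
still be damaged.

```c
#include <stddef.h>

/*
 * Copy in[0..n) to out, dropping each CR that is immediately followed
 * by LF. A lone CR, or a CR at the very end, is kept. The out buffer
 * must have room for n bytes. Returns the number of bytes written.
 */
static size_t
strip_cr_before_lf(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i;
    size_t j = 0;

    for (i = 0; i < n; i++)
    {
        /* Skip a CR only when the next byte exists and is LF. */
        if (in[i] == '\r' && i + 1 < n && in[i + 1] == '\n')
            continue;
        out[j++] = in[i];
    }
    return j;
}
```

Reading the damaged file in binary mode, passing it through this
function, and writing the result in binary mode gives a candidate file
that can then be checked with gzip -t.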

--
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com
