Re: Large object corruption during 'piped' pg_restore

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bosco Rama <postgres(at)boscorama(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Large object corruption during 'piped' pg_restore
Date: 2011-01-20 23:14:57
Message-ID: 23205.1295565297@sss.pgh.pa.us
Lists: pgsql-general pgsql-hackers

Bosco Rama <postgres(at)boscorama(dot)com> writes:
>>> If 'standard_conforming_strings = on' is set in our DB (which is required for
>>> our app) then the piped restore method (e.g. pg_restore -O backup.dat | psql)
>>> results in the large objects being corrupted.

> All servers and client tools involved are PG 8.4.6 on Ubuntu Server 10.04.1 LTS
> with all current updates applied.

I've been able to replicate this in 8.4; it doesn't happen in 9.0
(but probably does in all 8.x versions).

The problem is that pg_dump (or in this case really pg_restore) is
relying on libpq's PQescapeBytea() to format the bytea literal that
will be given as argument to lowrite() during the restore. When
pg_dump is producing SQL directly, or when pg_restore is connected
to a database, PQescapeBytea() mooches the standard_conforming_strings
value from the active libpq connection and gets the right answer.
In the single case where pg_restore is producing SQL without ever
opening a database connection, PQescapeBytea doesn't know what to do
and defaults to the old non-standard-strings behavior. Unfortunately,
pg_restore has already set standard_conforming_strings = on earlier in
the script (if it was set in the original source database), so you get
the wrong thing.
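
To make that concrete, here is a rough sketch of what the broken
script looks like against an 8.4 server (not the exact commands
pg_restore emits, and the data bytes are made up):

    -- script preamble, because the source database had it on
    SET standard_conforming_strings = on;

    -- ... later, while restoring one large object; the literal was
    -- produced by PQescapeBytea as though standard strings were off,
    -- i.e. with every backslash doubled
    SELECT pg_catalog.lowrite(0, '\\261\\007\\351');

Since the script has turned standard strings on, the string parser
keeps both backslashes, and the bytea input routine then stores a
literal backslash followed by the digits "261" and so on, instead of
the bytes 0xB1 0x07 0xE9 that were dumped.  There is no error, just
silently corrupted data.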

The bottom line is that pg_dump can't depend on libpq's PQescapeBytea,
but needs its own copy. We have in fact done that as of 9.0, which is
what I was vaguely remembering:

Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Branch: master Release: REL9_0_BR [b1732111f] 2009-08-04 21:56:09 +0000

Fix pg_dump to do the right thing when escaping the contents of large objects.

The previous implementation got it right in most cases but failed in one:
if you pg_dump into an archive with standard_conforming_strings enabled, then
pg_restore to a script file (not directly to a database), the script will set
standard_conforming_strings = on but then emit large object data as
nonstandardly-escaped strings.

At the moment the code is made to emit hex-format bytea strings when dumping
to a script file. We might want to change to old-style escaping for backwards
compatibility, but that would be slower and bulkier. If we do, it's just a
matter of reimplementing appendByteaLiteral().

This has been broken for a long time, but given the lack of field complaints
I'm not going to worry about back-patching.
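
For concreteness, the hex-format output in a script that has set
standard_conforming_strings = on looks roughly like this (made-up
bytes; the form is only accepted by servers that understand hex bytea
input, i.e. 9.0 and later):

    SELECT pg_catalog.lowrite(0, '\xb107e9');

With standard strings on, the backslash is not special to the string
parser, so the bytea input routine sees the hex string exactly as
written.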

I'm not sure whether this new complaint is enough reason to reconsider
back-patching. We cannot just backport the 9.0 patch, since it assumes
it can do bytea hex output --- we'd need to emit old style escaped
output instead. So it's a bit of work, and more to the point would
involve pushing poorly-tested code into stable branches. I doubt it
would go wrong, but in the worst-case scenario we might create failures
for blob-restore cases that work now.
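
Concretely, a back-patched version would have to emit something like
the following instead (made-up bytes again):

    -- when the script sets standard_conforming_strings = on
    SELECT pg_catalog.lowrite(0, '\261\007\351');

    -- when the script sets it off
    SELECT pg_catalog.lowrite(0, '\\261\\007\\351');

choosing the backslash doubling to match whatever setting the script
itself emits, since pre-9.0 servers don't accept hex bytea input.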

So I'm not sure whether to fix it, or leave it as a known failure case
in old branches. Comments?

regards, tom lane
