Re: Large object dumps vs older pg_restore

From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Subject: Re: Large object dumps vs older pg_restore
Date: 2010-02-18 01:58:14
Message-ID: 4B7C9EB6.5070503@ak.jp.nec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The attached patch is modified to be applied to the latest tree,
and it increments the version number of the archive file.

kaigai(at)saba pg_dump]$ ./pg_dump -Ft postgres > ~/hoge.tar

[kaigai(at)saba pg_dump]$ /usr/bin/pg_restore --version
pg_restore (PostgreSQL) 8.4.2
[kaigai(at)saba pg_dump]$ /usr/bin/pg_restore -d test ~/hoge.tar
pg_restore: [archiver] unsupported version (1.12) in file header
--> unable to restore using older pg_restore

[kaigai(at)saba pg_dump]$ ./pg_restore --version
pg_restore (PostgreSQL) 9.0devel
[kaigai(at)saba pg_dump]$ ./pg_restore -d test ~/hoge.tar

[kaigai(at)saba pg_dump]$ psql test
psql (9.0devel)
Type "help" for help.

test=# \lo_list
Large objects
ID | Owner | Description
-------+--------+--------------------------------
16384 | kaigai |
16385 | ymj |
16386 | tak | this is a small large object
16387 | kaigai | this is a small large object 2
(4 rows)

test=# select oid, * from pg_largeobject_metadata;
oid | lomowner | lomacl
-------+----------+---------------------------------
16387 | 10 |
16384 | 10 | {kaigai=rw/kaigai,ymj=r/kaigai}
16385 | 16388 | {ymj=rw/ymj,tak=r/ymj}
16386 | 16389 | {tak=rw/tak,ymj=r/tak}
(4 rows)

Thanks,

(2010/02/18 9:12), KaiGai Kohei wrote:
> (2010/02/18 4:54), Tom Lane wrote:
>> I've been looking at the proposed patch for pg_dump with large objects,
>> and I'm afraid it breaks things even worse than they're broken in HEAD.
>>
>> Currently, when pg_restore is processing a BLOBS item, it emits
>> lo_create() followed by lo_write() calls, and conditionally precedes
>> that by lo_unlink() if --clean was specified. It pretty much has to
>> drive all that off the one archive entry since there are no others
>> (excepting BLOB COMMENTS which is unhelpful since it appears later).
>>
>> The patch wants to emit a separate BLOB item for each blob, upon which
>> we can hang ownership, ACL, and comment information in the same ways
>> that we deal with these things for other sorts of database objects.
>> That's good since it means that switches like --no-owner will work
>> properly. The trouble is that this means the responsibility to do
>> lo_unlink and lo_create has to be taken out from processing of
>> the BLOBS item.
>>
>> The way the patch does that is that it renames BLOBS to BLOB DATA,
>> and drives the do-we-create decision off the name of the archive
>> entry. This breaks compatibility of the archive to prior pg_restore
>> versions. An 8.4 pg_restore will have no idea that it doesn't know
>> how to process the archive, but nonetheless will emit wrong results
>> --- in particular, you get a lo_create() from the BLOB item and then
>> another from BLOB DATA, so the restore fails completely. There are
>> other bad effects too because 8.4 doesn't really know quite what
>> BLOB DATA is, but it needs to because there are various places that
>> have to treat that specially.
>>
>> Probably the only way we can make this design work is to bump the
>> archive version number so that older pg_restores will fail. (Whereupon
>> there is no need to rename the entry type BTW.) This is slightly
>> annoying but it's not like we've not done it multiple times before.
>
> IIRC, Itagaki-san also suggested to abort when we try to restore
> the dumped database image to the older pgsql. So, the checks in
> the DropBlobIfExists() are modified.
> This idea allows to abort restoring at the head of pg_restore.
> It seems to me fair enough.
>
>> If we wanted to keep backwards compatibility, we'd have to leave
>> the lo_create responsibility with the BLOBS item, and have the
>> BLOB metadata items be things that just add ACLs/ownership/comments
>> without doing the actual create, and have to be processed after
>> BLOBS instead of before it. This is probably workable but it
>> doesn't seem to me that it's accomplishing the goal of making blobs
>> work like normal objects.
>
> It was also an idea that I tried to implement in this approach at first.
> But it is difficult to treat various kind of pg_restore options correctly.
>
> Thanks,

--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

Attachment Content-Type Size
pgsql-fix-pg_dump-blob-privs.8.patch application/octect-stream 26.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2010-02-18 02:27:17 Re: CommitFest Status Summary - 2010-02-14
Previous Message Fujii Masao 2010-02-18 01:40:31 Re: Streaming replication on win32, still broken