Re: Large object dumps vs older pg_restore

From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Large object dumps vs older pg_restore
Date: 2010-02-18 00:12:36
Message-ID: 4B7C85F4.9060408@ak.jp.nec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(2010/02/18 4:54), Tom Lane wrote:
> I've been looking at the proposed patch for pg_dump with large objects,
> and I'm afraid it breaks things even worse than they're broken in HEAD.
>
> Currently, when pg_restore is processing a BLOBS item, it emits
> lo_create() followed by lo_write() calls, and conditionally precedes
> that by lo_unlink() if --clean was specified. It pretty much has to
> drive all that off the one archive entry since there are no others
> (excepting BLOB COMMENTS which is unhelpful since it appears later).
>
> The patch wants to emit a separate BLOB item for each blob, upon which
> we can hang ownership, ACL, and comment information in the same ways
> that we deal with these things for other sorts of database objects.
> That's good since it means that switches like --no-owner will work
> properly. The trouble is that this means the responsibility to do
> lo_unlink and lo_create has to be taken out from processing of
> the BLOBS item.
>
> The way the patch does that is that it renames BLOBS to BLOB DATA,
> and drives the do-we-create decision off the name of the archive
> entry. This breaks compatibility of the archive to prior pg_restore
> versions. An 8.4 pg_restore will have no idea that it doesn't know
> how to process the archive, but nonetheless will emit wrong results
> --- in particular, you get a lo_create() from the BLOB item and then
> another from BLOB DATA, so the restore fails completely. There are
> other bad effects too because 8.4 doesn't really know quite what
> BLOB DATA is, but it needs to because there are various places that
> have to treat that specially.
>
> Probably the only way we can make this design work is to bump the
> archive version number so that older pg_restores will fail. (Whereupon
> there is no need to rename the entry type BTW.) This is slightly
> annoying but it's not like we've not done it multiple times before.

IIRC, Itagaki-san also suggested to abort when we try to restore
the dumped database image to the older pgsql. So, the checks in
the DropBlobIfExists() are modified.
This idea allows to abort restoring at the head of pg_restore.
It seems to me fair enough.

> If we wanted to keep backwards compatibility, we'd have to leave
> the lo_create responsibility with the BLOBS item, and have the
> BLOB metadata items be things that just add ACLs/ownership/comments
> without doing the actual create, and have to be processed after
> BLOBS instead of before it. This is probably workable but it
> doesn't seem to me that it's accomplishing the goal of making blobs
> work like normal objects.

It was also an idea that I tried to implement in this approach at first.
But it is difficult to treat various kind of pg_restore options correctly.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2010-02-18 00:43:39 Re: NOTIFY/LISTEN on read-only slave?
Previous Message Rob Newton 2010-02-18 00:08:03 Re: [GENERAL] libecpg versions and libecpg_compat