Re: Largeobject Access Controls (r2460)

From: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Largeobject Access Controls (r2460)
Date: 2010-02-01 05:19:16
Message-ID: 20100201141915.BA5F.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:

> The attached patch uses one TOC entry for each blob objects.

This patch does not only fix the existing bugs, but also refactor
the dump format of large objects in pg_dump. The new format are
more similar to the format of tables:

Section <Tables> <New LO> <Old LO>
-----------------------------------------------------
Schema "TABLE" "BLOB ITEM" "BLOBS"
Data "TABLE DATA" "BLOB DATA" "BLOBS"
Comments "COMMENT" "COMMENT" "BLOB COMMENTS"

We will allocate BlobInfo in memory for each large object. It might
consume much more memory compared with former versions if we have
many large objects, but we discussed it is an acceptable change.

As far as I read, the patch is almost ready to commit
except the following issue about backward compatibility:

> * "BLOB DATA"
> This section is same as existing "BLOBS" section, except for _LoadBlobs()
> does not create a new large object before opening it with INV_WRITE, and
> lo_truncate() will be used instead of lo_unlink() when --clean is given.

> The legacy sections ("BLOBS" and "BLOB COMMENTS") are available to read
> for compatibility, but newer pg_dump never create these sections.

I wonder we need to support older versions in pg_restore. You add a check
"PQserverVersion >= 80500" in CleanupBlobIfExists(), but out documentation
says we cannot use pg_restore 9.0 for 8.4 or older servers:

http://developer.postgresql.org/pgdocs/postgres/app-pgdump.html
| it is not guaranteed that pg_dump's output can be loaded into
| a server of an older major version

Can we remove such path and raise an error instead?
Also, even if we support the older servers in the routine,
the new bytea format will be another problem anyway.

> One remained issue is the way to decide whether blobs to be dumped, or not.
> Right now, --schema-only eliminate all the blob dumps.
> However, I think it should follow the manner in any other object classes.
>
> -a, --data-only ... only "BLOB DATA" sections, not "BLOB ITEM"
> -s, --schema-only ... only "BLOB ITEM" sections, not "BLOB DATA"
> -b, --blobs ... both of "BLOB ITEM" and "BLOB DATA" independently
> from --data-only and --schema-only?

I cannot image situations that require data-only dumps -- for example,
restoring database has a filled pg_largeobject_metadata and an empty
or broken pg_largeobject -- but it seems to be unnatural.

I'd prefer to keep the existing behavior:
* default or data-only : dump all attributes and data of blobs
* schema-only : don't dump any blobs
and have independent options to control blob dumps:
* -b, --blobs : dump all blobs even if schema-only
* -B, --no-blobs : [NEW] don't dump any blobs even if default or data-only

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message KaiGai Kohei 2010-02-01 07:10:16 Re: Largeobject Access Controls (r2460)
Previous Message Pavel Stehule 2010-02-01 05:07:52 Re: Review: listagg aggregate