Re: Largeobject Access Controls (r2460)

From: KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Largeobject Access Controls (r2460)
Date: 2010-01-23 07:39:17
Message-ID: 4B5AA7A5.5020206@kaigai.gr.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(2010/01/23 5:12), Tom Lane wrote:
> KaiGai Kohei<kaigai(at)ak(dot)jp(dot)nec(dot)com> writes:
>> The attached patch is a revised version.
>
> I'm inclined to wonder whether this patch doesn't prove that we've
> reached the end of the line for the current representation of blobs
> in pg_dump archives. The alternative that I'm thinking about is to
> treat each blob as an independent object (hence, with its own TOC
> entry). If we did that, then the standard pg_dump mechanisms for
> ownership, ACLs, and comments would apply, and we could get rid of
> the messy hacks that this patch is just adding to. That would also
> open the door to future improvements like being able to selectively
> restore blobs. (Actually you could do it immediately if you didn't
> mind editing a -l file...) And it would for instance allow loading
> of blobs to be parallelized.

I also think it is better approach than the current blob representation.

> Now the argument against that is that it won't scale terribly well
> to situations with very large numbers of blobs. However, I'm not
> convinced that the current approach of cramming them all into one
> TOC entry scales so well either. If your large objects are actually
> large, there's not going to be an enormous number of them. We've
> heard of people with many tens of thousands of tables, and pg_dump
> speed didn't seem to be a huge bottleneck for them (at least not
> in recent versions). So I'm feeling we should not dismiss the
> idea of one TOC entry per blob.

Even if the database contains massive number of large objects, all the
pg_dump has to manege on RAM is its metadata, not data contents.
If we have one TOC entry per blob, the amount of total i/o size between
server and pg_dump is not different from the current approach.

If we assume one TOC entry consume 64 bytes of RAM, it needs 450MB of
RAM for 7 million BLOBs.
In the recent computers, is it unacceptable pain?
If you try to dump TB class database, I'd like to assume the machine
where pg_dump runs has adequate storage and RAM.

Thanks,
--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hitoshi Harada 2010-01-23 07:43:27 Re: review: More frame options in window functions
Previous Message KaiGai Kohei 2010-01-23 07:17:19 Re: restructuring "alter table" privilege checks (was: remove redundant ownership checks)