Skip site navigation (1) Skip section navigation (2)

Re: Largeobject Access Controls (r2460)

From: KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Largeobject Access Controls (r2460)
Date: 2010-01-23 07:39:17
Message-ID: 4B5AA7A5.5020206@kaigai.gr.jp (view raw or flat)
Thread:
Lists: pgsql-hackers
(2010/01/23 5:12), Tom Lane wrote:
> KaiGai Kohei<kaigai(at)ak(dot)jp(dot)nec(dot)com>  writes:
>> The attached patch is a revised version.
> 
> I'm inclined to wonder whether this patch doesn't prove that we've
> reached the end of the line for the current representation of blobs
> in pg_dump archives.  The alternative that I'm thinking about is to
> treat each blob as an independent object (hence, with its own TOC
> entry).  If we did that, then the standard pg_dump mechanisms for
> ownership, ACLs, and comments would apply, and we could get rid of
> the messy hacks that this patch is just adding to.  That would also
> open the door to future improvements like being able to selectively
> restore blobs.  (Actually you could do it immediately if you didn't
> mind editing a -l file...)  And it would for instance allow loading
> of blobs to be parallelized.

I also think it is better approach than the current blob representation.

> Now the argument against that is that it won't scale terribly well
> to situations with very large numbers of blobs.  However, I'm not
> convinced that the current approach of cramming them all into one
> TOC entry scales so well either.  If your large objects are actually
> large, there's not going to be an enormous number of them.  We've
> heard of people with many tens of thousands of tables, and pg_dump
> speed didn't seem to be a huge bottleneck for them (at least not
> in recent versions).  So I'm feeling we should not dismiss the
> idea of one TOC entry per blob.

Even if the database contains massive number of large objects, all the
pg_dump has to manege on RAM is its metadata, not data contents.
If we have one TOC entry per blob, the amount of total i/o size between
server and pg_dump is not different from the current approach.

If we assume one TOC entry consume 64 bytes of RAM, it needs 450MB of
RAM for 7 million BLOBs.
In the recent computers, is it unacceptable pain?
If you try to dump TB class database, I'd like to assume the machine
where pg_dump runs has adequate storage and RAM.

Thanks,
-- 
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Responses

pgsql-hackers by date

Next:From: Hitoshi HaradaDate: 2010-01-23 07:43:27
Subject: Re: review: More frame options in window functions
Previous:From: KaiGai KoheiDate: 2010-01-23 07:17:19
Subject: Re: restructuring "alter table" privilege checks (was: remove redundant ownership checks)

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group