Re: pg_dump: largeobject behavior issues (possible bug)

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump: largeobject behavior issues (possible bug)
Date: 2015-04-25 14:57:30
Message-ID: 553BAB5A.9010707@dunslane.net
Lists: pgsql-hackers


On 04/24/2015 06:41 PM, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 04/23/2015 04:04 PM, Andrew Gierth wrote:
>>> The relevant code is getBlobs in pg_dump.c, which queries the whole of
>>> pg_largeobject_metadata without using a cursor (so the PGresult is
>>> already huge thanks to having >100 million rows), and then mallocs a
>>> BlobInfo array and populates it from the PGresult, also using pg_strdup
>>> for the oid string, owner name, and ACL if any.
>> I'm surprised this hasn't come up before. I have a client that I
>> persuaded to convert all their LOs to bytea fields because of problems
>> with pg_dump handling millions of LOs, and kept them on an older
>> postgres version until they made that change.
> Yeah, this was brought up when we added per-large-object metadata; it was
> obvious that that patch would cause pg_dump to choke on large numbers of
> large objects. The (perhaps rather lame) argument was that you wouldn't
> have that many of them.
>
> Given that large objects don't have any individual dependencies,
> one could envision fixing this by replacing the individual large-object
> DumpableObjects by a single placeholder to participate in the sort phase,
> and then when it's time to dump that, scan the large objects using a
> cursor and create/print/delete the information separately for each one.
> This would likely involve some rather painful refactoring in pg_dump
> however.
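
To flesh that out a bit, here's a rough standalone sketch of what such a
cursor-based scan could look like (plain libpq rather than pg_dump's
archive machinery; the cursor name and batch size are made up, and error
checks on BEGIN/DECLARE are elided for brevity):

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("");     /* connection info from env */
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* cursors only live inside a transaction */
    res = PQexec(conn, "BEGIN");
    PQclear(res);

    /* same catalog getBlobs() reads, but fetched incrementally */
    res = PQexec(conn,
                 "DECLARE bloboid CURSOR FOR "
                 "SELECT oid, pg_get_userbyid(lomowner), lomacl "
                 "FROM pg_largeobject_metadata");
    PQclear(res);

    for (;;)
    {
        int         i,
                    ntups;

        res = PQexec(conn, "FETCH 1000 FROM bloboid");
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "FETCH failed: %s", PQerrorMessage(conn));
            PQclear(res);
            break;
        }
        ntups = PQntuples(res);
        if (ntups == 0)
        {
            PQclear(res);
            break;              /* all rows consumed */
        }
        for (i = 0; i < ntups; i++)
        {
            /*
             * Here pg_dump would emit the TOC entry / DDL for this one
             * LO and then forget it, instead of holding >100 million
             * BlobInfo structs (plus the whole PGresult) in memory.
             */
            printf("LO %s owner %s\n",
                   PQgetvalue(res, i, 0), PQgetvalue(res, i, 1));
        }
        PQclear(res);
    }

    res = PQexec(conn, "COMMIT");
    PQclear(res);
    PQfinish(conn);
    return 0;
}

The point being that peak memory stays bounded by the batch size rather
than by the number of large objects.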

I think we need to think about this some more. TBH, I'm not convinced
that the changes made back in 9.0 were well conceived. Having separate
TOC entries for each LO seems wrong in principle, although I understand
why it was done. For now, my advice would be to avoid using
pg_dump/pg_restore if you have large numbers of LOs. The good news is
that these days there are alternative methods of doing backup / restore,
albeit not 100% equivalent to pg_dump / pg_restore.

One useful thing might be to provide pg_dump with
--no-blobs/--blobs-only switches so you could at least easily segregate
the blobs into their own dump file. That would be in addition to dealing
with the memory problems pg_dump has with millions of LOs, of course.
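
Hypothetically (these switch names are just the proposal above, nothing
that exists today), usage might look something like:

    pg_dump --no-blobs -Fc -f schema_and_data.dump mydb
    pg_dump --blobs-only -Fc -f blobs.dump mydb
    pg_restore -d newdb schema_and_data.dump
    pg_restore -d newdb blobs.dump

so the problematic LO dump could at least be run (and rerun) separately
from everything else.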

cheers

andrew
