Re: problems with large objects dump

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Sergio Gabriel Rodriguez <sgrodriguez(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: problems with large objects dump
Date: 2012-10-13 01:31:54
Message-ID: 27767.1350091914@sss.pgh.pa.us
Lists: pgsql-performance

I wrote:
> Sergio Gabriel Rodriguez <sgrodriguez(at)gmail(dot)com> writes:
>> I never use oprofile, but for a few hours into the process, I could take
>> this report:
>> 1202449 56.5535 sortDumpableObjects

> Hm. I suspect a lot of that has to do with the large objects; and it's
> really overkill to treat them as full-fledged objects since they never
> have unique dependencies. This wasn't a problem when commit
> c0d5be5d6a736d2ee8141e920bc3de8e001bf6d9 went in, but I think now it
> might be because of the additional constraints added in commit
> a1ef01fe163b304760088e3e30eb22036910a495. I wonder if it's time to try
> to optimize pg_dump's handling of blobs a bit better. But still, any
> such fix probably wouldn't make a huge difference for you. Most of the
> time is going into pushing the blob data around, I think.

For fun, I tried adding 5 million empty blobs to the standard regression
database, and then did a pg_dump. It took a bit under 9 minutes on my
workstation. oprofile showed about 32% of pg_dump's runtime going into
sortDumpableObjects, which might make you think that's worth optimizing
... until you look at the bigger picture system-wide:

  samples|       %|
--------------------
   727394  59.4098  kernel
   264874  21.6336  postgres
   136734  11.1677  /lib64/libc-2.14.90.so
    39878   3.2570  pg_dump
    37025   3.0240  libpq.so.5.6
    17964   1.4672  /usr/bin/wc
      354   0.0289  /usr/bin/oprofiled

So actually sortDumpableObjects took only about 1% of the CPU cycles.
And remember this is with empty objects. If we'd been shoving 200GB of
data through the dump, the data pipeline would surely have swamped all
else.
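
To spell out the arithmetic behind that 1% figure, here is a minimal
sketch in Python. It is only an illustration: the function and constant
names are made up for this example, the 32% is the rounded figure quoted
above, and the 3.2570% is pg_dump's line in the system-wide report.

```python
# Share of system-wide samples attributed to the pg_dump binary
# (the "pg_dump" line in the system-wide oprofile report).
PG_DUMP_SHARE_OF_SYSTEM = 3.2570 / 100

# Approximate share of pg_dump's own samples spent in
# sortDumpableObjects ("about 32%", a rounded figure).
SORT_SHARE_OF_PG_DUMP = 0.32


def system_wide_share(share_within_process, process_share_of_system):
    """Scale a per-process profile share to a system-wide share."""
    return share_within_process * process_share_of_system


sort_share = system_wide_share(SORT_SHARE_OF_PG_DUMP, PG_DUMP_SHARE_OF_SYSTEM)
print(f"sortDumpableObjects, system-wide: {sort_share:.2%}")
```

A function that looks dominant inside one process's profile can still be
a rounding error once the kernel and the backend are counted.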

So I think the original assumption that we didn't need to optimize
pg_dump's object management infrastructure for blobs still holds good.
If there's anything that is worth fixing here, it's the number of server
roundtrips being used ...

regards, tom lane
