Re: bytea vs. pg_dump

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bernd Helmle <mailings(at)oopsware(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: bytea vs. pg_dump
Date: 2009-05-06 23:04:21
Message-ID: 116.1241651061@sss.pgh.pa.us
Lists: pgsql-hackers

Bernd Helmle <mailings(at)oopsware(dot)de> writes:
> --On Tuesday, May 05, 2009 10:00:37 -0400 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> wrote:
>> Seems like the right response might be some micro-optimization effort on
>> byteaout.

> Hmm, looking into the profiler statistics seems to confirm your suspicion:

> Normal COPY shows:

>      % cumulative     self                 self    total
>   time    seconds  seconds       calls   s/call   s/call  name
>  31.29      81.38    81.38      134487     0.00     0.00  CopyOneRowTo
>  22.88     140.89    59.51      134487     0.00     0.00  byteaout
>  13.44     175.84    34.95  3052797224     0.00     0.00  appendBinaryStringInfo
>  12.10     207.32    31.48  3052990837     0.00     0.00  CopySendChar
>   8.45     229.31    21.99  3052797226     0.00     0.00  enlargeStringInfo
>   3.90     239.45    10.14       55500     0.00     0.00  pglz_decompress
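
Dividing the call counts in this profile by the 134487 rows (one CopyOneRowTo call per row) gives per-row figures. The sketch below only redoes that arithmetic on the numbers quoted above; `PROFILE`, `calls_per_row`, and `self_time_share` are illustrative names, not anything from PostgreSQL, and the time share is relative to the listed rows only (the full profile's total is larger).

```python
# Rows retyped from the gprof flat profile quoted above:
# (function name, self seconds, calls)
PROFILE = [
    ("CopyOneRowTo", 81.38, 134_487),
    ("byteaout", 59.51, 134_487),
    ("appendBinaryStringInfo", 34.95, 3_052_797_224),
    ("CopySendChar", 31.48, 3_052_990_837),
    ("enlargeStringInfo", 21.99, 3_052_797_226),
    ("pglz_decompress", 10.14, 55_500),
]

ROWS = 134_487  # one CopyOneRowTo call per row written


def calls_per_row(name: str) -> float:
    """Average number of calls to `name` per row written by COPY."""
    for fn, _self_s, calls in PROFILE:
        if fn == name:
            return calls / ROWS
    raise KeyError(name)


def self_time_share(names: list[str]) -> float:
    """Fraction of the listed rows' self time spent in the given functions."""
    total = sum(self_s for _fn, self_s, _calls in PROFILE)
    part = sum(self_s for fn, self_s, _calls in PROFILE if fn in names)
    return part / total


if __name__ == "__main__":
    for fn in ("appendBinaryStringInfo", "CopySendChar"):
        print(f"{fn}: {calls_per_row(fn):,.0f} calls per row")
    helpers = ["appendBinaryStringInfo", "CopySendChar", "enlargeStringInfo"]
    print(f"per-character helpers: {self_time_share(helpers):.0%} of listed self time")
```

That works out to roughly 22,700 buffer-append calls per row, and about 37% of the listed self time going to the three small per-call helpers.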

I hadn't looked closely at these numbers before, but now that I do,
what I think they are telling us is that the high proportion of
backslashes in standard bytea output is a real killer for COPY
performance. With no backslashes, CopySendChar wouldn't be in the
picture at all here, and appendBinaryStringInfo/enlargeStringInfo
would be called far fewer times (roughly 134487 rather than 3052797224)
with proportionately more characters processed per call. The inner
loop of CopyOneRowTo (I assume CopyAttributeOutText has been inlined
into that function) is relatively cheap for ordinary characters and
much less so for backslashes, so I bet that number would go down too.
And as already noted, byteaout itself works pretty hard to produce
the current representation.
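
To make the backslash density concrete, here is a minimal sketch of the traditional escape output format (a backslash byte becomes `\\`, a byte outside 0x20..0x7e becomes `\` plus three octal digits) next to a hex encoding, followed by the backslash doubling that COPY's text format then applies. The function names are hypothetical, and the COPY step is simplified to doubling backslashes, which is the only escaping these outputs need with the default tab delimiter. Hex is just one candidate for a "better textual representation" (PostgreSQL 9.0 did later adopt it as the default `bytea_output`).

```python
def bytea_escape_out(data: bytes) -> str:
    """Traditional 'escape' text form of a bytea value (hypothetical helper)."""
    out = []
    for b in data:
        if b == 0x5C:                    # backslash -> \\ (two characters)
            out.append("\\\\")
        elif b < 0x20 or b > 0x7E:       # non-printable -> \ooo (four characters)
            out.append("\\%03o" % b)
        else:                            # printable ASCII is copied as-is
            out.append(chr(b))
    return "".join(out)


def bytea_hex_out(data: bytes) -> str:
    """Hex text form: \\x followed by two hex digits per byte (hypothetical helper)."""
    return "\\x" + data.hex()


def copy_text_escape(s: str) -> str:
    """Simplified COPY text-format escaping: double every backslash.

    Real COPY also escapes the delimiter, newline, carriage return, etc.,
    but with the default tab delimiter none of those can occur in the two
    encodings above, since control bytes are already octal-escaped or hex-encoded.
    """
    return s.replace("\\", "\\\\")


if __name__ == "__main__":
    # Every byte value once: a rough stand-in for arbitrary binary data.
    # 161 of the 256 values fall outside 0x20..0x7e.
    data = bytes(range(256))
    for name, encode in (("escape", bytea_escape_out), ("hex", bytea_hex_out)):
        copied = copy_text_escape(encode(data))
        backslashes = copied.count("\\")
        print(f"{name:6} {len(copied):4} chars, {backslashes:3} backslashes")
```

For `bytes(range(256))` this gives 903 characters containing 326 backslashes for the escape format, versus 515 characters containing 2 backslashes for hex.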

So I'm now persuaded that a better textual representation for bytea
should indeed make things noticeably better here. It would be
useful though to cross-check this thought by profiling a case that
dumps a comparable volume of text data that contains no backslashes...
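
The suggested cross-check can be previewed with a toy model. This is a hypothetical sketch, not PostgreSQL code: `model_copy_text_field` and `CallCounts` are made-up names, and the model simply assumes the COPY text-output inner loop behaves as described above (and as the nearly equal appendBinaryStringInfo and CopySendChar counts suggest), with each backslash flushing the pending run of ordinary characters and sending one extra escaping character on its own.

```python
from dataclasses import dataclass


@dataclass
class CallCounts:
    bulk_appends: int = 0   # stands in for appendBinaryStringInfo calls
    single_chars: int = 0   # stands in for CopySendChar calls


def model_copy_text_field(s: str) -> CallCounts:
    """Hypothetical model of the COPY text-output inner loop: ordinary
    characters accumulate into a run; each backslash flushes the pending
    run with one bulk append, sends the escaping backslash on its own, and
    starts a new run that includes the original backslash."""
    counts = CallCounts()
    start = 0
    for i, ch in enumerate(s):
        if ch == "\\":
            if i > start:
                counts.bulk_appends += 1  # flush run s[start:i]
            counts.single_chars += 1      # extra escaping backslash
            start = i                     # backslash begins the next run
    if len(s) > start:
        counts.bulk_appends += 1          # flush the final run
    return counts


if __name__ == "__main__":
    fields = (
        ("escape-format bytea", "\\001" * 2500),  # 10,000 chars, 2,500 backslashes
        ("backslash-free text", "a" * 10_000),    # same volume, no backslashes
    )
    for label, field in fields:
        c = model_copy_text_field(field)
        print(f"{label}: {len(field)} chars, {c.bulk_appends} bulk appends, "
              f"{c.single_chars} single-char sends")
```

Under these assumptions the 10,000-character escape-format field costs 2500 bulk appends plus 2500 single-character sends, while the backslash-free field of the same length costs one bulk append.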

regards, tom lane
