Re: bytea vs. pg_dump

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Bernd Helmle <mailings(at)oopsware(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: bytea vs. pg_dump
Date: 2009-05-29 01:26:35
Message-ID: 200905290126.n4T1QZP01411@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Added to TODO:

|Improve bytea COPY format
* http://archives.postgresql.org/pgsql-hackers/2009-05/msg00192.php

---------------------------------------------------------------------------

Merlin Moncure wrote:
> On Sat, May 16, 2009 at 11:23 AM, Stefan Kaltenbrunner
> <stefan(at)kaltenbrunner(dot)cc> wrote:
> > Bernd Helmle wrote:
> >>
> >> --On Mittwoch, Mai 06, 2009 19:04:21 -0400 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> >> wrote:
> >>
> >>> So I'm now persuaded that a better textual representation for bytea
> >>> should indeed make things noticeably better here. ?It would be
> >>> useful though to cross-check this thought by profiling a case that
> >>> dumps a comparable volume of text data that contains no backslashes...
> >>
> >> This is a profiling result of the same data converted into a printable
> >> text format without any backslashes. The data amount is quite the same and
> >> as you already guessed, calls to appendBinaryStringInfo() and friends gives
> >> the expected numbers:
> >>
> >>
> >> time ? seconds ? seconds ? ?calls ? s/call ? s/call ?name
> >> 35.13 ? ? 24.67 ? ?24.67 ? 134488 ? ? 0.00 ? ? 0.00 ?byteaout
> >> 32.61 ? ? 47.57 ? ?22.90 ? 134488 ? ? 0.00 ? ? 0.00 ?CopyOneRowTo
> >> 28.92 ? ? 67.88 ? ?20.31 ? ?85967 ? ? 0.00 ? ? 0.00 ?pglz_decompress
> >> ?0.67 ? ? 68.35 ? ? 0.47 ?4955300 ? ? 0.00 ? ? 0.00
> >> hash_search_with_hash_value
> >> ?0.28 ? ? 68.55 ? ? 0.20 11643046 ? ? 0.00 ? ? 0.00 ?LWLockRelease
> >> ?0.28 ? ? 68.75 ? ? 0.20 ?4828896 ? ? 0.00 ? ? 0.00 ?index_getnext
> >> ?0.24 ? ? 68.92 ? ? 0.17 ?1208577 ? ? 0.00 ? ? 0.00 ?StrategyGetBuffer
> >> ?0.23 ? ? 69.08 ? ? 0.16 11643046 ? ? 0.00 ? ? 0.00 ?LWLockAcquire
> >> ...
> >> ?0.00 ? ? 70.23 ? ? 0.00 ? 134498 ? ? 0.00 ? ? 0.00 ?enlargeStringInfo
> >> ?0.00 ? ? 70.23 ? ? 0.00 ? 134497 ? ? 0.00 ? ? 0.00
> >> ?appendBinaryStringInfo
> >> ?0.00 ? ? 70.23 ? ? 0.00 ? 134490 ? ? 0.00 ? ? 0.00 ?AllocSetReset
> >> ?0.00 ? ? 70.23 ? ? 0.00 ? 134490 ? ? 0.00 ? ? 0.00 ?resetStringInfo
> >> ?0.00 ? ? 70.23 ? ? 0.00 ? 134488 ? ? 0.00 ? ? 0.00 ?CopySendChar
> >> ?0.00 ? ? 70.23 ? ? 0.00 ? 134488 ? ? 0.00 ? ? 0.00 ?CopySendEndOfRow
> >
> >
> > while doing some pg_migrator testing I noticed that dumping a database seems
> > to be much slower than IO-system is capable off. ie i get 100% CPU usage
> > with no IO-wait at all with between 15-30MB/s read rate if i say do a
> > pg_dumpall > /dev/null.
>
> Part of the problem is the decompression. Can't do much about that
> except to not compress your data.
>
> I don't have any hard statistics on hand at the moment, but a while
> back we compared 'COPY' vs a hand written SPI routine that got the
> tuple data in binary and streamed it out field by field raw to a file.
> The speed difference was enormous..I don't recall the exact
> difference but copy was at least 2x slower. This seems to suggest
> there are many potential improvements to copy (my test was mainly
> bytea as well).
>
> merlin
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2009-05-29 01:30:14 explain analyze rows=%.0f
Previous Message Andrew Dunstan 2009-05-29 01:25:46 Re: pg_migrator and an 8.3-compatible tsvector data type