Re: directory archive format for pg_dump

From: Joachim Wieland <joe(at)mcknight(dot)de>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-12-02 02:35:48
Message-ID: AANLkTinauQsCGt-yeyeN_QWuN+tao+q126rwKnO2QWLP@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 1, 2010 at 9:05 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Forgot attachment. This is also available in the above git repo.

I have quickly checked your modifications. On the one hand I like the
reduction in the number of functions. I would have said that since we
have AH around all the time, we could just allocate once, stuff it all
into ctx->cs and reuse the buffers for every object, but re-allocating
them for every (dumpable) object should be fine as well.

Regarding the function pointers that you removed: you are now putting
back in what Dimitri wanted me to take out, namely switch/case
statements for the algorithms and then #ifdefs for every algorithm.
It's not too many now that we have taken out LZF. Well, I can live
with either way.

There is one thing, however, that I am not in favor of, which is the
removal of the "sizeHint" parameter of the read functions. The reason
for this parameter is not very clear now without LZF, but I had tried
to put in a few comments to explain the situation (which you have
taken out as well :-) ).

The point is that zlib is a stream-based compression algorithm: you
just stuff data in, from time to time you get data out, and in the end
you explicitly flush the compressor. The read function can return as
many bytes as it wants and we can just hand it all over to zlib. Other
compression algorithms, however, are block-based and first write a
block header that contains the information about the next data block,
including its uncompressed and compressed sizes. With the sizeHint
parameter I used, the compressor could tell the read function that it
just wants to read the fixed-size header (6 bytes IIRC). In the header
it would look up the compressed size of the next block, then ask the
read function for exactly that amount of data, decompress it, and go
on with the next block, and so forth...
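
To illustrate the idea (purely a sketch, not code from the patch; the
header-parsing and decompression helpers here are invented):

    #include <stdlib.h>

    typedef size_t (*ReadFunc) (char *buf, size_t sizeHint, void *userdata);

    #define BLOCK_HEADER_SIZE 6         /* fixed-size per-block header */

    static void
    ReadBlocks(ReadFunc readF, void *userdata)
    {
        char        header[BLOCK_HEADER_SIZE];

        /* ask for exactly one block header; sizeHint says how much we need */
        while (readF(header, BLOCK_HEADER_SIZE, userdata) == BLOCK_HEADER_SIZE)
        {
            size_t      compLen = header_compressed_size(header);   /* invented */
            size_t      rawLen = header_uncompressed_size(header);  /* invented */
            char       *compBuf = malloc(compLen);
            char       *rawBuf = malloc(rawLen);

            /* then ask for exactly the compressed payload of this block */
            if (readF(compBuf, compLen, userdata) != compLen)
            {
                free(compBuf);
                free(rawBuf);
                break;          /* short read, real error handling goes here */
            }

            block_decompress(compBuf, compLen, rawBuf, rawLen);      /* invented */
            /* ... hand rawBuf over to whoever consumes the data ... */

            free(compBuf);
            free(rawBuf);
        }
    }

Because of the sizeHint, the second readF call asks for exactly compLen
bytes, so the compressor never has to keep leftover input around.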

Of course you could do that memory management inside the compressor,
with an extra buffer holding whatever you got in excess, but it's a
pain. If you removed that part on purpose, on the grounds that there
is no block-based compression algorithm in core and probably never
will be, then that's okay :-)
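
For what it's worth, that extra-buffer bookkeeping would look roughly
like this (again just a sketch with invented names):

    #include <string.h>

    typedef size_t (*ReadFuncNoHint) (char *buf, size_t maxLen, void *userdata);

    typedef struct
    {
        char        buf[64 * 1024];     /* input read but not yet consumed */
        size_t      len;
    } InputBuffer;

    /* keep calling the callback until at least 'need' bytes are buffered */
    static void
    fill_until(InputBuffer *in, size_t need, ReadFuncNoHint readF, void *userdata)
    {
        while (in->len < need)
        {
            size_t      got = readF(in->buf + in->len,
                                    sizeof(in->buf) - in->len, userdata);

            if (got == 0)
                break;                  /* EOF or error; caller checks in->len */
            in->len += got;
        }
    }

    /* hand out the first n buffered bytes and shift the rest down */
    static void
    consume(InputBuffer *in, char *dst, size_t n)
    {
        memcpy(dst, in->buf, n);
        memmove(in->buf, in->buf + n, in->len - n);
        in->len -= n;
    }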

Joachim
