Re: directory archive format for pg_dump

From: Joachim Wieland <joe(at)mcknight(dot)de>
To: José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: directory archive format for pg_dump
Date: 2010-11-20 04:10:37
Message-ID: AANLkTik+PR2HCsqS7PVAdETPyKSGn2hb-eTeg9QZpRew@mail.gmail.com
Lists: pgsql-hackers

Hi Jose,

2010/11/19 José Arthur Benetasso Villanova <jose(dot)arthur(at)gmail(dot)com>:
> For my database, the dir format generated 60 files of different
> sizes, and it looks very confusing. Is it possible to use the same
> trick as pigz and pbzip2, creating a concatenated file of streams?

What pigz parallelizes is the actual computation of the compressed
data. The directory archive format, however, is preparation for a
parallel pg_dump that dumps several tables (especially large tables,
of course) in parallel via multiple database connections and multiple
pg_dump frontends. The idea of multiplexing their output into one file
has been rejected on the grounds that it would probably slow down the
whole process.
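
To illustrate the one-file-per-table layout, here is a minimal Python
sketch. It is not pg_dump code: the names dump_table and dump_all, the
".dat.gz" file naming and the in-memory "tables" are all assumptions
made up for the example. Each worker writes only to its own file, so
the workers never have to coordinate access to a shared output file.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def dump_table(out_dir, table, rows):
    """Write one table's rows to its own compressed file (hypothetical layout)."""
    path = Path(out_dir) / f"{table}.dat.gz"  # assumed naming scheme
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for row in rows:
            f.write(row + "\n")
    return path


def dump_all(out_dir, tables, jobs=4):
    """Dump every table in parallel, one output file per table."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = [
            pool.submit(dump_table, out_dir, name, rows)
            for name, rows in tables.items()
        ]
        # Collect results in submission order; no worker shares a file handle.
        return [f.result() for f in futures]
```

A real parallel dump would use separate database connections rather
than in-memory lists; the sketch only shows why the output ends up as
many files of different sizes.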

Nevertheless, pigz could be implemented as an alternative compression
algorithm so that both the custom and the directory archive formats
could use it, but here as well license and patent questions might get
in the way, even though it is based on libz.
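
For reference, the "concatenated file of streams" idea from the
question can be sketched in Python as follows. This only demonstrates
that independently compressed gzip members, joined back to back, are
read back as one stream; it makes no claim about how pigz or pbzip2
work internally, and compress_chunks is a hypothetical helper name.

```python
import gzip


def compress_chunks(chunks):
    """Compress each chunk independently and concatenate the gzip members."""
    # Each gzip.compress() call could run on a different core; the results
    # are simply joined in order afterwards.
    return b"".join(gzip.compress(chunk) for chunk in chunks)


chunks = [b"first chunk\n", b"second chunk\n", b"third chunk\n"]
blob = compress_chunks(chunks)

# gzip.decompress() reads all concatenated members as one stream.
restored = gzip.decompress(blob)
```

This helps when the bottleneck is compression; it does not remove the
need to serialize writes from several dump processes into one file.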

> Reusing md5.c and kwlookup.c through a link doesn't look nice either.
> This way you need to compile twice, among other things, but I think
> that it's temporary, right?

No, it isn't. md5.c is used in the same way by e.g. libpq, and there
are other examples of links in core; check out src/bin/psql for
example.

Joachim
