Re: what would tar file FDW look like?

From: Bear Giles <bgiles(at)coyotesong(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: what would tar file FDW look like?
Date: 2015-08-17 15:36:02
Message-ID: CALBNtw4dpNi9YJksmrju6S8BZLy-vd=kpvK7K3fH4pxqM1r9aw@mail.gmail.com
Lists: pgsql-hackers

I've written readers for both from scratch. Tar isn't that bad since it's
block-structured (512-byte blocks) - you read the header, skip forward N
blocks, and continue. The hardest
part is setting up the decompression libraries if you want to support
tar.gz or tar.bz2 files.
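
As a rough illustration of that header-then-skip loop, here is a minimal
sketch in Python (chosen for brevity; an actual FDW would be written in C).
The function name list_tar_members is made up for this example. It assumes
an uncompressed, seekable ustar-style stream and ignores long-name
extensions and the base-256 size encoding that some tar writers use for
very large members.

```python
import io

BLOCK = 512  # tar headers and data are laid out in 512-byte blocks


def list_tar_members(stream):
    """Yield (name, size) for each member, skipping over the member data."""
    while True:
        header = stream.read(BLOCK)
        # A short read or an all-zero block marks the end of the archive.
        if len(header) < BLOCK or header == b"\0" * BLOCK:
            return
        # Name: bytes 0..99, NUL-terminated.
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8", "replace")
        # Size: bytes 124..135, octal ASCII, NUL- or space-terminated.
        size_field = header[124:136].strip(b"\0 ").decode("ascii")
        size = int(size_field, 8) if size_field else 0
        yield name, size
        # Member data is padded up to a whole number of blocks.
        blocks = (size + BLOCK - 1) // BLOCK
        stream.seek(blocks * BLOCK, io.SEEK_CUR)
```

Every header has to be visited to enumerate the members, so the cost of a
listing grows with the size of the archive.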

Zip files are more complex. You have (iirc) 5 control blocks - start of
archive, start of file, end of file, start of index, end of archive - and
the information in each control block is pretty limited. That's not a huge
burden since there's support for extensions for things like the unix file
metadata. One complication is that you need to support compression from the
start.
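
For comparison, a minimal Python sketch of reading the zip index from the
end of the archive. The function name list_zip_members is made up for this
example; the record signatures and field layouts follow the published zip
format. It assumes a single-disk archive with no zip64 extensions and
takes the last occurrence of the end-of-archive signature as the real one,
which a robust reader would verify.

```python
import struct

EOCD_SIG = b"PK\x05\x06"      # end-of-central-directory record
CDIR_SIG = b"PK\x01\x02"      # central-directory file header
EOCD_FMT = "<4s4H2IH"         # 22 bytes, little-endian
CDIR_FMT = "<4s6H3I5H2I"      # 46 bytes, little-endian


def list_zip_members(data):
    """Return [(name, compression_method, uncompressed_size)] from the index."""
    pos = data.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    eocd = struct.unpack_from(EOCD_FMT, data, pos)
    total_entries, cdir_offset = eocd[4], eocd[6]

    members = []
    offset = cdir_offset
    fixed = struct.calcsize(CDIR_FMT)
    for _ in range(total_entries):
        fields = struct.unpack_from(CDIR_FMT, data, offset)
        if fields[0] != CDIR_SIG:
            raise ValueError("bad central-directory signature")
        method, size = fields[4], fields[9]
        name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
        name = data[offset + fixed:offset + fixed + name_len]
        members.append((name.decode("utf-8", "replace"), method, size))
        # Variable-length name, extra field, and comment follow the fixed part.
        offset += fixed + name_len + extra_len + comment_len
    return members
```

The compression method comes back per member (0 is stored, 8 is deflate),
which is why a reader has to handle compression from the start.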

Zip files support two types of encryption. There's a really weak version
that almost nobody supports and a much stronger modern version that's
subject to license restrictions. (Some people use the weak version on
embedded systems because of legal requirements to /do something/, no matter
how lame.)

There are third-party libraries, of course, but that introduces
dependencies. Both formats are simple enough to write from scratch.

I guess my bigger question is whether there's interest in either or both
for "real" use. I'm doing this as an exercise but am willing to contribute
the code if there's general interest in it.

(BTW the more complex object I'm working on is the .p12 keystore for
digital certificates and private keys. We have everything we need in the
OpenSSL library so there are no additional third-party dependencies. I have a
minimal FDW for the digital certificate itself and am now working on a way
to access keys stored in a standard format on the filesystem instead of in
the database itself. A natural fit is a specialized archive FDW. Unlike tar
and zip it will have two payloads, the digital certificate and the
(optionally encrypted) private key. It has searchable metadata, e.g.,
finding all records with a specific subject.)

Bear

On Mon, Aug 17, 2015 at 8:29 AM, Greg Stark <stark(at)mit(dot)edu> wrote:

> On Mon, Aug 17, 2015 at 3:14 PM, Bear Giles <bgiles(at)coyotesong(dot)com> wrote:
> > I'm starting to work on a tar FDW as a proxy for a much more specific FDW.
> > (It's the 'faster to build two and toss the first away' approach - tar lets
> > me get the FDW stuff nailed down before attacking the more complex
> > container.) It could also be useful in its own right, or as the basis for a
> > zip file FDW.
>
> Hm. tar may be a bad fit where zip may be much easier. Tar has no
> index or table of contents. You have to scan the entire file to find
> all the members. IIRC Zip does have a table of contents at the end of
> the file.
>
> The most efficient way to process a tar file is to describe exactly
> what you want to happen with each member and then process it linearly
> from start to end (or until you've found the members you're looking
> for). Trying to return meta info and then go looking for individual
> members will be quite slow and have a large startup cost.
>
>
> --
> greg
>
