Re: lztext and compression ratios...

From: Hannu Krosing <hannu(at)tm(dot)ee>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jan Wieck <JanWieck(at)Yahoo(dot)com>, PostgreSQL HACKERS <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: lztext and compression ratios...
Date: 2000-07-13 07:39:36
Message-ID: 396D7238.E8C5A793@tm.ee
Lists: pgsql-general pgsql-hackers pgsql-sql

Tom Lane wrote:
>
> JanWieck(at)t-online(dot)de (Jan Wieck) writes:
> > Some quick numbers though:
> > I simply stripped down pg_lzcompress.c to call compress2()
> > and uncompress() instead of doing anything itself (what a
> > nice, small source file :-).
>
> I went at it in a different way: pulled out pg_lzcompress into a
> standalone source program that could also call zlib. These numbers
> represent the pure compression or decompression time for memory-to-
> memory processing, no other overhead at all. Each run was iterated
> 1000 times to make it long enough to time accurately (so you can
> read the times as "milliseconds per operation", though they're
> really seconds).

We could just make this part extensible as well, like the rest of
postgres, so we would have a directory tree like

/compressors
    /nullcompressor
    /lzcompress
    /zlib
    /lzo
    /bzip2
    /my_new_supercompressor
    /classic_huffman_for_uppercase_american_english

and select the desired compressor at compile time, or even better, on a
field-by-field basis at runtime, so that a field that stores mainly
tar.gz-s at compression level 9 will use nullcompressor, and others
will use what is best for them.
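
To make the idea concrete, here is a minimal sketch of what such a
pluggable layer might look like. Everything in it is an assumption for
illustration only: the Compressor struct, find_compressor() and the toy
"rle" entry are hypothetical names, not existing postgres code, and the
run-length coder merely stands in for a real compressor such as
lzcompress or zlib.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical interface: every compressor exposes the same two calls.
 * Each returns the number of bytes written, or -1 if dst is too small
 * or the input is malformed. */
typedef struct Compressor {
    const char *name;
    long (*compress)(const unsigned char *src, size_t srclen,
                     unsigned char *dst, size_t dstcap);
    long (*decompress)(const unsigned char *src, size_t srclen,
                       unsigned char *dst, size_t dstcap);
} Compressor;

/* "nullcompressor": stores the bytes unchanged, in both directions. */
static long null_copy(const unsigned char *src, size_t srclen,
                      unsigned char *dst, size_t dstcap)
{
    if (srclen > dstcap)
        return -1;
    memcpy(dst, src, srclen);
    return (long) srclen;
}

/* Toy run-length encoder standing in for a real compressor:
 * output is a sequence of (count, byte) pairs, count in 1..255. */
static long rle_compress(const unsigned char *src, size_t srclen,
                         unsigned char *dst, size_t dstcap)
{
    size_t in = 0, out = 0;

    while (in < srclen) {
        unsigned char b = src[in];
        size_t run = 1;

        while (in + run < srclen && src[in + run] == b && run < 255)
            run++;
        if (out + 2 > dstcap)
            return -1;
        dst[out++] = (unsigned char) run;
        dst[out++] = b;
        in += run;
    }
    return (long) out;
}

/* Inverse of rle_compress: expand each (count, byte) pair. */
static long rle_decompress(const unsigned char *src, size_t srclen,
                           unsigned char *dst, size_t dstcap)
{
    size_t in, out = 0;

    if (srclen % 2 != 0)
        return -1;
    for (in = 0; in < srclen; in += 2) {
        size_t run = src[in];

        if (run == 0 || out + run > dstcap)
            return -1;
        memset(dst + out, src[in + 1], run);
        out += run;
    }
    return (long) out;
}

/* The registry: one entry per subdirectory of the proposed tree. */
static const Compressor compressors[] = {
    { "nullcompressor", null_copy, null_copy },
    { "rle", rle_compress, rle_decompress },
};

/* Look up a compressor by name at runtime; returns NULL if unknown. */
const Compressor *find_compressor(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(compressors) / sizeof(compressors[0]); i++)
        if (strcmp(compressors[i].name, name) == 0)
            return &compressors[i];
    return NULL;
}
```

With something like this, a field holding already-compressed data would
simply be associated with "nullcompressor", while other fields name
whichever entry suits them; where that per-field name would be stored is
left open here.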

>
> > Fix its history allocation for huge values and have someone
> > (PgSQL Inc.?) patenting the compression algorithm, so we're
> > safe at some point in the future.
>
> That would be a really *bad* idea. What will people say if we say
> "Postgres contains patented algorithms, but we'll let you use them
> for free" ? They'll say "no thanks, I remember Unisys' repeatedly
> broken promises about the GIF patent" and stay away in droves.
> There is a *lot* of bad blood in the air about compression patents
> of any sort. We mustn't risk tainting Postgres' reputation with
> that mess.
> (In any case, one would hope you couldn't get a patent on this
> method, though I suppose it never pays to overestimate the competence
> of the USPTO...)

And AFAIK (IANAL ;) you can only patent previously _unpublished_ work;
prior publication counts even when it was by the patent applicant.

>
> > If there's a patent problem
> > in it, we are already running the risk to get sued, the PGLZ
> > code got shipped with 7.0, used in lztext.
>
> But it hasn't been documented or advertised. If we take it out
> again in 7.1, I think our exposure to potential lawsuits from it is
> negligible. Not that I think there is any big risk there anyway,
> but we ought to consider the possibility.
>
> My feeling is that going with zlib is probably the right choice.
> The technical case for using a homebrew compressor instead isn't
> very compelling,

Speed seems to be a good reason, if we can keep it up.

> and the advantages of using a standardized,
> known-patent-free library are not to be ignored.

OTOH, there are possibly patents on other parts of postgres,
like indexing, storage methods, the mere fact that something is
stored in another relation, using 'Z' as a protocol character, etc.
etc. So using a patent-free compression library does not help much.

So if PgSQL Inc. has lots of lawyers with nothing to do, they could do
some patent research and scare all developers away with their findings
;)

-------------
Hannu
