Re: Compressed binary field

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Edson Richter <edsonrichter(at)hotmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Compressed binary field
Date: 2012-09-17 03:17:54
Message-ID: CAMkU=1yENbW+cjA-L8Najitv=E-7Bqa4re1Uuamujcgd+OTpNg@mail.gmail.com
Lists: pgsql-general

On Tue, Sep 11, 2012 at 9:34 AM, Edson Richter <edsonrichter(at)hotmail(dot)com> wrote:
>
> No, there is no problem. Just trying to reduce the database size by forcing
> these fields to compress.
> Actual database size = 8 GB
> Backup size = 1.6 GB (5x smaller)
>
> It seems to me (IMHO) that there is room for improvement in database storage
> (we don't have many indexes, and the biggest tables are just the ones with
> bytea fields). That's why I've asked for expert counseling.

There are two things to keep in mind. One is that each datum is
compressed separately, so redundancy that occurs between fields of
different tuples, but not within any single tuple, is invisible to
TOAST compression, yet fully available to the compression of a dump
file.
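
You can see the per-datum effect from psql with something like the
sketch below (the table and column names are purely illustrative):

    -- Every row stores the SAME ~4 kB low-redundancy string.  pglz looks
    -- at each datum in isolation, so the duplication across rows buys
    -- nothing: pg_column_size() should stay close to octet_length().
    -- gzip over a pg_dump of this table, by contrast, would shrink it
    -- dramatically, because it sees all the rows at once.
    CREATE TEMP TABLE docs AS
    SELECT g AS id, x.payload
    FROM generate_series(1, 1000) AS g,
         (SELECT string_agg(md5(n::text), '') AS payload
          FROM generate_series(1, 128) AS n) AS x;

    -- octet_length() = logical (uncompressed) size;
    -- pg_column_size() = size as actually stored (possibly compressed).
    SELECT octet_length(payload) AS raw_bytes,
           pg_column_size(payload) AS stored_bytes
    FROM docs
    LIMIT 1;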

The other is that PG's TOAST compression (pglz) was designed to be
simple, fast, and patent-free, and often it is not all that good. It
is quite good if you have long stretches of repeats of a single
character, or exact, densely spaced repeats of a sequence of
characters ("123123123123123..."), but when the redundancy is less
simple it does a much worse job than, for example, gzip.
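
Here is a rough illustration of that difference (table and column
names are made up, and the exact sizes will vary):

    -- Simple repeats compress extremely well under pglz; less regular
    -- text (concatenated md5 hashes here) barely compresses at all.
    CREATE TEMP TABLE demo (kind text, v text);
    INSERT INTO demo VALUES
      ('simple repeats', repeat('123', 100000)),
      ('mixed text', (SELECT string_agg(md5(n::text), '')
                      FROM generate_series(1, 10000) AS n));

    SELECT kind,
           octet_length(v)   AS raw_bytes,
           pg_column_size(v) AS stored_bytes
    FROM demo;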

It is possible, though unlikely, that there is a bug somewhere; most
likely your documents just aren't very compressible with
pglz_compress.
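
If you want to check your own data, a query along these lines should
tell you (substitute your real table and bytea column for the
hypothetical my_table/doc):

    -- A stored size close to the logical size means pglz found little
    -- redundancy it could exploit.
    SELECT pg_size_pretty(sum(octet_length(doc)))   AS logical_size,
           pg_size_pretty(sum(pg_column_size(doc))) AS stored_size
    FROM my_table;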

Cheers,

Jeff
