Re: more about pg_toast growth

From: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
To: Jan Wieck <janwieck(at)yahoo(dot)com>
Cc: Postgres general mailing list <pgsql-general(at)postgresql(dot)org>
Subject: Re: more about pg_toast growth
Date: 2002-03-13 20:35:55
Message-ID: 1016051755.5255.26.camel@heat
Lists: pgsql-general

On Wed, 2002-03-13 at 12:16, Jan Wieck wrote:
> Jeffrey W. Baker wrote:
> > On Wed, 2002-03-13 at 07:22, Jan Wieck wrote:
> > > [...]
> > >
> > > Remember, TOAST doesn't only come in slices, don't you
> > > usually brown it? Meaning, the data gets compressed (with a
> > > lousy but really fast algorithm). What kind of data is
> > > resp_body? 50% compression ratio ... I guess it's html,
> > > right?
> >
> > It is gzipped and base64-encoded text. It's somewhat strange that a
> > fast LZ would deflate it very much, but I guess it must be an artifact
> > of the base64. The initial gzip tends to deflate the data by about 90%.
>
> Now THAT is very surprising to me! The SLZ algorithm used in
> TOAST will for sure not be able to squeeze anything out of a
> gzip compressed stream. The result would be bigger again.
> B64 changes the file size basically to 4/3rd, but since the
> input stream is gzipped, the resulting B64 stream shouldn't
> contain patterns that SLZ can use to reduce the size again.
>
> Are you sure you're B64-encoding the gzipped text?

I am positive:

rupert=# select substr(body, 0, 200) from resp_body where resp = (select
max(resp) from resp_body);

eJztfXt34riy799hrf4OGuZMJ1k3BL949SScRQhJmCbAAbp7z75zV5bAAjxtbI5tkjB75rvfkiwb
GxxDHt0dgvtBjC2VpFLVr6qkknMydiZ6+WRMsFo+6dV7jVqZnOE5ami2oxkjG31ALWdMLLgxIIZN
UFvHDrFPsm7Z1MmEOBiNHWeaIf87025P07X7qWYRO40Gp
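
The sample above can be sanity-checked outside the database. The following is a minimal Python sketch, not part of the original session: it decodes only the first four base64 characters of the sample and tests whether the resulting bytes form a zlib stream header as defined in RFC 1950. The helper name looks_like_zlib is hypothetical and introduced here for illustration.

```python
import base64

# First four characters of the body sample shown above.
prefix = "eJzt"

# Four base64 characters decode to exactly three bytes.
raw = base64.b64decode(prefix)


def looks_like_zlib(header: bytes) -> bool:
    """Check the two-byte zlib header described in RFC 1950.

    Hypothetical helper for illustration only.
    """
    if len(header) < 2:
        return False
    cmf, flg = header[0], header[1]
    # The low nibble of the first byte must be 8 ("deflate"), and the
    # two bytes read as a big-endian 16-bit number must be a multiple of 31.
    return (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0


print(raw[:2].hex(), looks_like_zlib(raw))  # prints "789c True"
```

The decoded bytes 78 9c are the usual header of a zlib (deflate) stream, which is consistent with the body being compressed before it was base64-encoded. A gzip file proper would start with the bytes 1f 8b instead.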

rupert=# select min(length(body)), max(length(body)), avg(length(body))
from resp_body;
 min |  max   |       avg
-----+--------+------------------
   0 | 261948 | 21529.5282897281
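
The size arithmetic in this thread can be worked through as a rough sketch in Python. It assumes that length(body) counts one byte per base64 character, that line breaks inside the encoded text are negligible, and that the 90% deflation figure quoted earlier applies to the average row. None of these assumptions is confirmed by the output above, and the function names are hypothetical.

```python
import math


def base64_encoded_len(n_bytes: int) -> int:
    """Length of padded base64 output for n_bytes of input (no line breaks)."""
    return 4 * math.ceil(n_bytes / 3)


def decoded_len_estimate(n_chars: float) -> float:
    """Approximate size of the data behind n_chars of base64 text."""
    return n_chars * 3 / 4


# Average body length reported by the query above.
avg_chars = 21529.5282897281

# Estimated size of the compressed payload before base64 encoding.
compressed = decoded_len_estimate(avg_chars)

# ASSUMPTION: the initial compression removes about 90% of the raw text.
assumed_deflation = 0.90
raw_estimate = compressed / (1 - assumed_deflation)

print(f"compressed payload: about {compressed:.0f} bytes")
print(f"raw text estimate:  about {raw_estimate:.0f} bytes")
```

Under these assumptions the average body holds roughly 16 KB of compressed data, corresponding to something on the order of 160 KB of raw text per row.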

> I mean,
> you have an average body size of 23K "gzipped", so you're
> telling that the average uncompressed body size is about
> 230K? You are storing 230 Megabytes of raw body data per
> hour? Man, who is writing all that text?

Reuters.

I have increased the free space map and will be able to restart the
postmaster today at around midnight GMT.

Thanks for your help,
Jeffrey
