Re: more about pg_toast growth

From: Jan Wieck <janwieck(at)yahoo(dot)com>
To: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
Cc: Jan Wieck <janwieck(at)yahoo(dot)com>, Postgres general mailing list <pgsql-general(at)postgresql(dot)org>
Subject: Re: more about pg_toast growth
Date: 2002-03-13 20:16:13
Message-ID: 200203132016.g2DKGDr32370@saturn.janwieck.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Jeffrey W. Baker wrote:
> On Wed, 2002-03-13 at 07:22, Jan Wieck wrote:
> > [...]
> >
> > Remember, TOAST doesn't only come in slices, don't you
> > usually brown it? Meaning, the data gets compressed (with a
> > lousy but really fast algorithm). What kind of data is
> > resp_body? 50% compression ratio ... I guess it's html,
> > right?
>
> It is gzipped and base64-encoded text. It's somewhat strange that a
> fast LZ would deflate it very much, but I guess it must be an artifact
> of the base64. The initial gzip tends to deflate the data by about 90%.

Now THAT is very surprising to me! The SLZ algorithm used in
TOAST will for sure not be able to squeeze anything out of a
gzip compressed stream. The result would be bigger again.
B64 changes the file size basically to 4/3rd, but since the
input stream is gzipped, the resulting B64 stream shouldn't
contain patterns that SLZ can use to reduce the size again.

Are you sure you're B64-encoding the gzipped text? I mean,
you have an average body size of 23K "gzipped", so you're
telling that the average uncompressed body size is about
230K? You are storing 230 Megabytes of raw body data per
hour? Man, who is writing all that text?

>
> > Anyway, I would suggest you increase the max_fsm_pages
> > parameter. Commented out parameters in the postgresql.conf
> > file means "default". You said you're doing about 1,000
> > inserts an hour and a daily bulk delete of approx. 24,000.
> > Assuming most of the toast tuples are contigous, that'd mean
> > you are freeing something like 35,000 toast pages. I would
> > suggest a freespace map size of 50,000 pages, to start with.
> > That should at least lower the growth rate. If you still see
> > growth, go ahead and increase it further.
>
> I will happily do so. What is the argument against increasing the free
> space map? Does it consume more memory? The machine has 4GB main
> memory, and I wouln't notice increased consumption of a few megabytes.

It allocates some more shared memory. It's surely in the
range of a few megabytes, so no need to worry in this case.

> What triggers storage in the toast tables? Is it because of a tuple
> longer than some threshold?

It is triggered by the attempt to store a row bigger than 1/4
of the blocksize. Blocksize defaults to 8K. The toaster tries
to shrink down the row size by first compressing attribute by
attribute, then by moving them out to the toast table. As
soon as the row fit's into the 2K it'll stop and the storage
happens.

The 1/4 blocksize is something I choose because it had the
best performance in my tests. The theory behind it is that
your key fields are likely to be the small ones that remain
uncompressed in the main-row. If more such main rows fit
into fewer blocks, you have better caching of key attributes,
thus faster scans and joins. And access to the huge values is
only done after the final result set is collected, what
usually becomes a smaller percentage of the entire data, the
larger the data is. So it all reduces I/O.

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #

_________________________________________________________
Do You Yahoo!?
Get your free @yahoo.com address at http://mail.yahoo.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bruce Momjian 2002-03-13 20:23:55 Re: Query planner problem
Previous Message Bruce Momjian 2002-03-13 20:03:15 Re: Nested Transactions