Re: Make TOAST_TUPLES_PER_PAGE configurable per table.

From: Jesper Krogh <jesper(at)krogh(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Make TOAST_TUPLES_PER_PAGE configurable per table.
Date: 2010-02-02 06:03:35
Message-ID: 4B67C037.8080405@krogh.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:
> Jesper Krogh <jesper(at)krogh(dot)cc> writes:
>> This patch enables users to set TOAST_TUPLES_PER_PAGE with
>> ALTER TABLE <table> SET (tuples_per_page = X); .. currently with 1 <= X
>> <= 32;
>
> It's not clear to me that fiddling with that is useful unless the toast
> tuple size also changes; and unfortunately changing that is much harder,
> because it's wired into the addressing of toast data. See also these
> notes:
>
> * XXX while these can be modified without initdb, some thought needs to be
> * given to needs_toast_table() in toasting.c before unleashing random
> * changes. Also see LOBLKSIZE in large_object.h, which can *not* be
> * changed without initdb.

I can see that needs_toast_table() might need some changes since it also
uses TUPLE_TOAST_THRESHOLD, and might benefit from being aware of a
toast table is triggered.

There might be more benefits with changes the toast tuple size (I dont
have enought insight to see that), but even without it I can get a
speedup of x10 on a "simple test" and permanently get the system to used
the caching for "more commonly used data" than these attributes that are
rarely used.

Ultimately I would like an infinite amount of configurabillity since I
have tables that only consists of simple values were 50% is really
rarely used and 50% is very often used. But just changing the
TOAST_TUPLE_PER_PAGE as above can easily increase my "tuple-density"
from 6/page to 40-60/page, which translates directly into:
* Less data to read when accessing the tuples.
* Less data to cache that is rarely used.

Where as on the the table with simple values I might at best be able to
double the tuple-density.

But yes it isn't a silverbullet, it requires knowledge of the access
patterns of the data.

What kind of arguments/tests/benchmarks is required to push for the
usefulness of "fiddling" with this parameter?

Realworld database in our environment has:
12M rows sitting with an average text length of ~2KB directly
"toastable" set is: 5GB which is really rarely used, but the webapp is
doing random reads for the presense/counts of these rows.
another table has ~700M rows sitting of a size of 135GB where around
120GB is of the "really rarely used type". (but takes time to compute so
it makes sense "wasting dead disk" on them).

So based on the benchmark provided in email I think that it can
significantly change the ration of cache hit/misses for the application.
(which has 64GB of dedicated memory).

Jesper
--
Jesper

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Boszormenyi Zoltan 2010-02-02 09:03:21 Re: [PATCH] Provide rowcount for utility SELECTs
Previous Message Takahiro Itagaki 2010-02-02 05:28:32 Re: New VACUUM FULL crashes on temp relations