Re: Applying TOAST to CURRENT

From: JanWieck(at)t-online(dot)de (Jan Wieck)
To: The Hermit Hacker <scrappy(at)hub(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jan Wieck <JanWieck(at)yahoo(dot)com>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, PostgreSQL HACKERS <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Applying TOAST to CURRENT
Date: 2000-05-31 01:10:20
Message-ID: 200005310110.DAA14672@hot.jw.home
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The Hermit Hacker wrote:
>
> have to third this one ... I think it should be totally transparent to the
> admin/user ... just create it when the table is created, what's the worst
> case scenario? it never gets used and you waste 16k of disk space?
>

Not exactly.

I've made some good experiences with having the toaster
trying to keep the main tuple size below 1/4 of MaxTupleSize
(BLKSIZE - block header). Remember that external stored
attributes are only fetched from the secondary relation if
really needed (when the result set is sent to the client or
if explicitly used in the query). So in a usual case, where a
relatively small amount of the entire data is retrieved and
key attributes are small, it's a win. With this config more
main tuples fit into one block, and if the attributes used in
the WHERE clause aren't stored external, the result set
(including sort and group actions) can be collected with
fewer block reads. Only those big values, that the client
really wanted, have to be fetched at send time.

If no external table exists, the toaster will try the <2K
thing by compression only. If the resulting tuple fits into
the 8K limit, it's OK. But if a secondary relation exists,
it'll store external to make the tuple <2K. Thus, a 4K or 6K
tuple, that actually fits and would be stored in the main
table, will cause the toaster to jump in if we allways create
the secondary table.

Hmmm - thinking about that it doesn't sound bad if we allways
create a secondary relation at CREATE TABLE time, but NOT the
index for it. And at VACUUM time we create the index if it
doesn't exist AND there is external stored data.

The table is prepared for external storage allways and we
avoid the risks from creating tables in possibly later
aborting transactions or due to concurrency issues. But we
don't waste the index space for really allways-small-tuple
tables.

Another benefit would be, that reloads should be faster
because with this technique, the toaster doesn't need to
insert index tuples during the load. The indices are created
later at VACUUM after reload.

The toaster needs to use sequential scans on the external
table until the next vacuum run, but index usage allways
depends on vacuum so that's not a real issue from my PoV.

At least a transparent compromise - isn't it?

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message The Hermit Hacker 2000-05-31 01:33:16 Re: CVS log problem
Previous Message Tatsuo Ishii 2000-05-31 01:03:46 Re: CVS log problem