Re: update on TOAST status

From: JanWieck(at)t-online(dot)de (Jan Wieck)
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL HACKERS <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: update on TOAST status
Date: 2000-07-07 11:30:15
Message-ID: 200007071130.NAA24645@hot.jw.home
Lists: pgsql-hackers

Tom Lane wrote:
> JanWieck(at)t-online(dot)de (Jan Wieck) writes:
> > Tom Lane wrote:
> >> One simple answer that might help for other scenarios too is to keep
> >> a small cache of the last few values that had to be untoasted. Maybe
> >> we only need it for moved-off values --- it could be that decompression
> >> is fast enough that we should just do it over rather than trying to
> >> cache.
>
> > I'm still arguing that indexing huge values is a hint for a
> > misleading schema. If this is true, propagating toasted
> > attributes into indices is a dead end street and I'd have to
> > change the heap-access<->toaster interface so that the
> > modified (stored) main tuple isn't visible to the following
> > code (that does the index inserts).
>
> But you'll notice that is *not* what I suggested. A detoasted-value

I didn't miss that on the first read, of course.

> cache could be useful in more situations than just an index lookup.
> I don't necessarily say we've got to have it in 7.1, but let's keep
> the idea in mind in case we start finding there is a bottleneck here.
>
> > What is the value of supporting index tuples >2K?
>
> If you're toasting the whole main tuple down to <2K, you might find
> yourself toasting individual fields that are a good bit less than
> that. So I don't think indexing a toasted value will be all that
> unusual.

That's exactly why I'm asking whether we wouldn't be better off
limiting index tuples to (blocksize - overhead) / 4 and always
storing plain, untoasted values in indices.
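
Just to put a rough number on that limit (illustrative values
only; the block size is the 8K default and the page overhead is
a nominal guess, not the real page-layout macros):

    /* Sketch only -- hypothetical constants, not PostgreSQL's
     * actual macros.  Shows roughly where a
     * "(blocksize - overhead) / 4" index tuple limit lands
     * with the default 8K block size.
     */
    #include <stdio.h>

    #define BLOCK_SIZE    8192  /* default BLCKSZ (assumption) */
    #define PAGE_OVERHEAD 64    /* nominal header + special space (assumption) */

    int
    main(void)
    {
        int max_index_tuple = (BLOCK_SIZE - PAGE_OVERHEAD) / 4;

        printf("max index tuple: %d bytes\n", max_index_tuple); /* ~2032 */
        return 0;
    }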

I've asked a couple of times now, "who really needs to index
huge values?"  All the responses I got so far were of the kind
"it would be nice if we supported it" or "I don't like such
restrictions".  But no one has really said "I need it".

> But this is all speculation for now. Let's get it working bulletproof
> for 7.1, and then worry about speedups after we know they are needed.

Let me speculate a little too.

My experience so far is that the time saved by needing fewer
blocks in the buffer cache outweighs the cost of decompression.
That is especially true with our algorithm, because it is byte
oriented (instead of being based on a bit stream, as Huffman
coding is), which makes it extremely fast on decompression.
And the technique of moving values off the main heap makes the
main tuples much smaller.  As long as the toasted values aren't
used in qualification or joining, only their references move
around through the various executor steps, and only those
values that are part of the final result set need to be fetched
when sending them to the client.
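
To illustrate why byte-oriented schemes decompress so quickly,
here is a minimal sketch of an LZ-style decompressor over a
made-up format (NOT the actual pglz code or its format): every
step is plain byte copying, with no bit-level decoding at all.

    #include <stddef.h>

    /* Hypothetical format: a control byte of 0 means "copy one
     * literal byte"; any other value n means "copy n bytes from
     * <offset> bytes back in the already-produced output", where
     * <offset> is the next input byte.  Returns the number of
     * output bytes produced.
     */
    static size_t
    lz_decompress(const unsigned char *src, size_t srclen,
                  unsigned char *dst, size_t dstcap)
    {
        size_t  si = 0,
                di = 0;

        while (si < srclen)
        {
            unsigned char ctrl = src[si++];

            if (ctrl == 0)
            {
                /* literal byte */
                if (si >= srclen || di >= dstcap)
                    break;
                dst[di++] = src[si++];
            }
            else
            {
                /* back-reference: ctrl bytes from "off" bytes back */
                if (si >= srclen)
                    break;
                size_t off = src[si++];

                if (off == 0 || off > di)
                    break;          /* corrupt input */
                for (size_t k = 0; k < ctrl && di < dstcap; k++, di++)
                    dst[di] = dst[di - off];
            }
        }
        return di;
    }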

Given a limited amount of total memory available for one
running postmaster, we save a lot of disk I/O and can hold more
values in their compressed format in the shared buffers.  With
that limit on total memory, the size of the buffer cache would
have to be lowered by the size of the new detoasted cache, and
that only works if we make it shared too.  Given further an
average compression ratio of 50% (which is not unlikely with
typical input like HTML pages), one cached detoasted value
would require two compressed ones to go away.
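
Spelled out with round numbers (illustrative only): a detoasted
value of 8K held in a separate cache has a 4K compressed form
at a 50% ratio, so caching it displaces about 8K / 4K = 2
compressed values from the shared buffers.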

It wouldn't really surprise me if we gained speed from it in
the average query, even if some operations might slow down
(sorting on possibly toasted fields, for example).

We need to see some results and wait for reports on this.  But
we already know that it can cause trouble with indexed fields,
because those are likely to be used for comparison during
scans.  So do we want indices to always store plain values,
with a limit on the index-tuple size, or not?

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck(at)Yahoo(dot)com #
