First what the current implementation and the yet to be done
All varlena data types (text, char, varchar, arrays) will
finally be toastable. Every table that uses such types
will have a secondary relation to move off attributes.
The toaster allways tries to keep a main tuple small
enough so that at minimum 4 tuples fit into a block. One
had complained about, and I explain later why I think
it's a good decision anyway.
This strategy already covers most possible index problems. If
the main tuple fits into 2K after toasting, any combination
of attributes out of it will too. The only thing not covered
are functional indices.
In real world scenarios, indices are usually set up on small
key values. These are very likely to be kept plain in the
main tuple by the toaster, becuase it looks at the biggest
values first. So an index (built out of the values in the
main tuple after toasting) will also contain the plain
values. Thus, index scans will not require toast fetches in
turn. Except the indexed attribute had at some point a huge
The current TOAST implementation hooks into the heap access
methods only. Automagically covering the index issues due to
the 2K approach. Fact is, that if more toast entries can get
produced during index inserts, we need to take care for them
during vacuum (the only place where index items get removed).
Alot of work just to support huge functional indices - IMHO
not worth the efford right now. Let's better get some
experience with the entire thing before going too far.
Why is it good to keep the main tuple below 2K? First because
of the above side effects for indices. Second, because in the
most likely case of small indexed attributes, more main
tuples (that must be fetched for the visibility checks
anyway) will fit into one block. That'll cause more blocks of
the relation to fit into the given shared memory buffer cache
and avoids I/O during index scans.
My latest tests load a 1.1M tree full of .html files into a
database. The result is a 140K heap plus 300K toast
relation. Without that 2K approach, the result is a 640K heap
plus 90K toastrel only. Since all compression is done on
single entries, it scales linear, so that a 1.1G tree will
result in a 140M heap plus 300M toastrel vs. a 640M heap plus
90M toastrel. No need to bechmark it - I know which strategy
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #
pgsql-hackers by date
|Next:||From: Karel Zak||Date: 2000-07-04 18:54:44|
|Subject: Re: sequential test error|
|Previous:||From: steven wu||Date: 2000-07-04 18:39:40|
|Subject: sequential test error|