A patch which implements HOT is attached with this email. This patch
applies cleanly to PostgresQL 8.2 BETA3 and passes all the regression
tests with two exceptions which are explained later. This patch has shown a
good performance gain when tested with pgbench, DBT2 and some other
benchmarking tests. This an early version of the patch and list of
known issues/bugs and outstanding items is included with this email.
We would appreciate all the review comments and experiences with the
What is HOT ?
HOT stands for Heap Overflow Tuple and this is an attempt to solve some
of the problems associated with frequently updated tables. This
design optimizies the updates when none of the index columns are
modified and length of the tuple remains the same after update. In this
particular case, the updated tuple is stored in a seperate overflow
relation and pulled-back into the main relation when the tuple in
the main relation becomes dead.
For a detail design document and related discussion please see the
pgsql-hackers mailing list.
How to Use HOT ?
A new WITH option for tables is required to take advantage of HOT:
CREATE TABLE test (a int) WITH (hot_update='true');
The default is set by an additional GUC parameter
"default_use_hot_update", which defaults to false.
This may not be supported in the final release, but it comes very handy
to test some existing application with HOT-update without changing any
of the table creation scripts.
An initdb is required after patch application.
* Correctness Testing:
- Regression tests
When tables are created with HOT-update turned OFF, all the regression
tests pass. The exceptions are the 'type_sanity' test which fails because
a new system attribute called "btid" is added and 'rules' test which fails
because of additional pg_stats columns added with the patch.
With the current behaviour, when a tuple is updated, the updated tuple
is inserted at the end of the table. So during a sequential scan, the
updated tuple is returned at the end. With HOT-updates, the updated
tuple goes into the overflow relation (and subsequently copied-back
to the root tuple). So in a sequential scan, the tuple is returned
early and in the same order before the update. This causes a difference
in the scan order for HOT-updated and normal tables. Because of the
same issue, few of the regression tests seem failing when HOT is
turned ON. These not really bugs. The only exeption is hash_index
test which fails because of a bug in the code and is described in
the section of Outstanding Bugs.
Since changes to the regression tests is a major topic of discussion, we
made a separate patch which fixes the regression tests (by adding ORDER
BY clause) for testing HOT-updates and we would post that as a seperate
- Concurrent psql tests
We implemented a concurrent version of psql to support multiple
sessions and transactions from the same psql prompt. We also wrote
several test cases to test the correctness of HOT-updates in various
serializable modes. All of these tests are passing fine. We should be
posting those tests and psql patch very soon.
* Performance Testing:
We ran performance benchmark tests using various open source
benchmarking tools such as pgbench and DBT2 and some specially developed
tests such as truckin which was posted to the mailing lists earlier.
We shall post the results of these tests to the list.
* Crash Recovery:
We have tested the crash recovery to some extent, but a thorough
testing is still required. Till now we have mostly tested the recovery
of pgbench running with large number of clients. Extensive crash
recovery testing is needed, especially with heavy concurrent
updates/selects/deletes and mix of HOT and non-HOT updates.
The following items don't work yet, but no problems are foreseen in
- Vacuuming copied-back tuple
When a tuple from the overflow relation is copied-back to the main
relation, the overflow tuple must not be vacuumed until there are
backends which might have references to the tuple. The tuple should be
vacuumed once its clear that the tuple is now unreachable. In the
current code, we set the xmax of the overflow tuple to the transaction
id of the transaction doing the copy-back operation. But this is clearly
a wrong thing to do. Need to fix this.
- Bitmap heapscan
The hash_index regression test fails because of this bug. We should
follow the overflow tuple chain while checking the visibility of the
- Tuple freezing during vacuum
This code is very recently added to PostgresQL Beta3. The HOT code
need some rework to deal with this change.
The following items have not yet been implemented, though the current
design does specify them:
- Overflow tuple header
In the current prototype tuples in the main heap also have the overflow
header, though this can be removed. At some of the places, we have used
this assumption and this needs to be fixed.
- Vacuum full disabled
Vacuum full is currently disabled on the HOT-updatable and overflow
relations. Though we did this for simplifying the early implementation, we
need to revisit this and see if we can remove this constraint. Since
vacuum full is disabled, we could reuse the HEAP_UPDATED flag for
marking copied-back tuples. So if we support vacuum full, then this
would need slight rework.
- Pagemode seqscan disabled
Again for simplifying the first version, pagemode seqscan is disabled
for HOT-updatable relations. Another reason for doing so is that root
tuples may get overwritten and we may need to reapply visibility checks
anyways. This might become a permanent feature.
- HOT-update for toasted tuples
We should be able to support HOT-updates for toasted tuples. But this
need more thinking.
- dump/restore may not work
Dump/restore may not work for HOT-updatable relation when it is
created by setting the "default_hot_update" GUC parameter to true. This
should be fixed if we decide to support the GUC parameter.
- HOT-updates for tables with expression index
For simplicity reasons, tables with expression index can not use HOT
updates. We need to think again and check if this can be relaxed.
- Alter table support
We may want to add ALTER TABLE support for HOT-update. So a table
which is not created with HOT-update ON can be altered to use HOT-update
and vice versa.
- Turning HOT-update ON/OFF
We may want to add support to turn HOT on/off while table is being
- Handling ctid references
As per discussion of the mailing list, there are postgresql clients
which might cache the ctid of a tuple and use it later. Since we replace
a dead root tuple with the oldtest live tuple in the tuple chain, there
are two copies of the live tuple. The working copy of the tuple is in
the main relation, but there could be backends still holding reference
to the overflow copy. If the working copy is updated/deleted, the
backends may not see that effect if the tuple is refetched using the
- Index creation
Assuming a relation is created with HOT-update on and the tuples are
HOT-updated, there could be tuples in the overflow chain with different
values for an attribute. If an index is now created on that attribute,
we must ensure that the semantics of having the same index value in all
the tuples in the overflow chain is maintained. That means, we should
either pull-up the chain so that there are no tuples in the chain or
break the chain and have multiple entries in the index.
pgsql-patches by date
|Next:||From: Markus Schiltknecht||Date: 2006-11-15 10:43:27|
|Subject: replication docs: split single vs. multi-master|
|Previous:||From: Andrew Dunstan||Date: 2006-11-13 17:16:16|
|Subject: Re: [PATCHES] WIP 2 interpreters for plperl|