Re: Piggybacking vacuum I/O

From: "Simon Riggs" <simon(at)enterprisedb(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Piggybacking vacuum I/O
Date: 2007-01-22 18:38:59
Message-ID: 1169491139.3776.394.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2007-01-22 at 13:41 +0000, Heikki Linnakangas wrote:
> Any thoughts before I start experimenting?

Probably only to detail the various use cases we are discussing.

My thoughts on various use cases are:

- small table with frequent update/delete, heap and indexes all/mostly
cached
e.g. Counter tables, DBT2: District/Warehouse TPC-C, pgbench:
Branches/Tellers
Current VACUUM works well in this situation, since the only I/O incurred
is the WAL written for the VACUUM. VACUUM very cheap even if not in
cache because of sequential I/O. Keeping track of whether there are hot
spots in these tables seems like a waste of cycles and could potentially
introduce contention and hence reduce performance. These need to be very
frequently VACUUMed, even when other VACUUMs are required.
My current view: just need multiple concurrent autovacuum processes.

- large table with severe hotspots
e.g. DBT2: NewOrder, larger queue-style tables
The hotspots are likely to be in cache and the not-so-hotspots might or
might not be in cache, but we don't care either way. DSM concept works
well for this case, since we are able to avoid lots of I/O by
appropriate book-keeping. Works well for removing rows after a file-scan
DELETE, as well as for DELETE or UPDATE hot spots.
My current view: DSM would be great for this

- large table with few hotspots
e.g. DBT2: Stock, pgbench: Accounts, most Customer tables
Current VACUUM works very badly in this case, since updates are sparsely
distributed across table. DSM wouldn't help either unless we
differentiate between few/many updates to a block.
My current view: Piggyback concept seems on the right track, but clearly
needs further thought.

Currently we have only one technique for garbage collection, plus one
process to perform it. We need multiple techniques executed by multiple
processes, when required, plus some way of automatically selecting which
is appropriate depending upon the use case. Yes, automatic :-)

DSM and this piggyback idea need not be thought of as competing
techniques.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

Browse pgsql-hackers by date

  From Date Subject
Next Message Gregory Stark 2007-01-22 18:41:22 Re: [HACKERS] Autovacuum Improvements
Previous Message Dawid Kuroczko 2007-01-22 18:37:52 Re: savepoint improvements