Re: vacuum, performance, and MVCC

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-22 01:36:39
Message-ID: 871wtic6oo.fsf@wolfe.cbbrowne.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Clinging to sanity, pgsql(at)mohawksoft(dot)com ("Mark Woodward") mumbled into her beard:
> We all know that PostgreSQL suffers performance problems when rows are
> updated frequently prior to a vacuum. The most serious example can be seen
> by using PostgreSQL as a session handler for a busy we site. You may have
> thousands or millions of active sessions, each being updated per page hit.
>
> Each time the record is updated, a new version is created, thus
> lengthening the "correct" version search each time row is accessed, until,
> of course, the next vacuum comes along and corrects the index to point to
> the latest version of the record.
>
> Is that a fair explanation?

No, it's not.

1. The index points to all the versions, until they get vacuumed out.

2. There may simultaneously be multiple "correct" versions. The
notion that there is one version that is The Correct One is wrong, and
you need to get rid of that thought.

> If my assertion is fundimentally true, then PostgreSQL will always suffer
> performance penalties under a heavy modification load. Of course, tables
> with many inserts are not an issue, it is mainly updates. The problem is
> that there are classes of problems where updates are the primary
> operation.

The trouble with your assertion is that it is true for *all* database
systems except for those whose only transaction mode is READ
UNCOMMITTED, where the only row visible is the "Latest" version.

> I was thinking, just as a hypothetical, what if we reversed the
> problem, and always referenced the newest version of a row and
> scanned backwards across the versions to the first that has a lower
> transacton number?

That would require an index on transaction number, which is an
additional data structure not in place now. That would presumably
worsen things.

> One possible implementation: PostgreSQL could keep an indirection array of
> index to table ref for use by all the indexes on a table. The various
> indexes return offsets into the array, not direct table refs. Because the
> table refs are separate from the index, they can be updated each time a
> transaction is commited.

You mean, this index would be "VACUUMed" as a part of each transaction
COMMIT? I can't see that turning out well...

> This way, the newest version of a row is always the first row
> found. Also, on a heavily updated site, the most used rows would
> always be at the end of the table, reducing amount of disk reads or
> cache memory required to find the correct row version for each
> query.

I can't see how it follows that most-used rows would migrate to the
end of the table. That would only be true in a database that is never
VACUUMed; as soon as a VACUUM is done, free space opens up in the
interior, so that new tuples may be placed in the "interior."
--
If this was helpful, <http://svcs.affero.net/rm.php?r=cbbrowne> rate me
http://linuxdatabases.info/info/lisp.html
"On a normal ascii line, the only safe condition to detect is a
'BREAK' - everything else having been assigned functions by Gnu
EMACS." -- Tarl Neustaedter

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2006-06-22 02:13:03 Re: Overhead for stats_command_string et al, take 2
Previous Message Andrew Dunstan 2006-06-21 21:24:42 Re: CVS HEAD busted on Windows?