Re: vacuum, performance, and MVCC

From: "Mark Woodward" <pgsql(at)mohawksoft(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Christopher Browne" <cbbrowne(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-22 17:49:36
Message-ID: 3936.216.145.49.15.1150998576.squirrel@mail.mohawksoft.com
Lists: pgsql-hackers

> Christopher Browne <cbbrowne(at)acm(dot)org> writes:
>> After a long battle with technology, pgsql(at)mohawksoft(dot)com ("Mark
>> Woodward"), an earthling, wrote:
>>> Not true. Oracle does not seem to exhibit this problem.
>
>> Oracle suffers a problem in this regard that PostgreSQL doesn't; in
>> Oracle, rollbacks are quite expensive, as "recovery" requires doing
>> extra work that PostgreSQL doesn't do.
>
> The Oracle design has got other drawbacks: if you need to access a row
> version other than the very latest, you need to go searching in the
> rollback segments for it. This is slow (no index help) and creates
> significant amounts of contention (since lots of processes are competing
> to touch the rollback segments).

But, in all probability, in most cases (the VAST majority), older
versions aren't much needed outside the concurrency of active
transactions.

> Plus there's the old bugaboo that
> long-running transactions require indefinite amounts of rollback space,
> and Oracle is apparently unable to enlarge that space on-the-fly.
> (This last seems like a surmountable problem, but maybe there is some
> non-obvious reason why it's hard.)

Yeah, Oracle has a million ways to die. And when you think you know all
one million, you find one million and one.

>
> Basically there's no free lunch: if you want the benefits of MVCC it's
> going to cost you somewhere. In the Postgres design you pay by having
> to do VACUUM pretty often for heavily-updated tables. I don't think
> that decision is fundamentally wrong --- the attractive thing about it
> is that the overhead is pushed out of the foreground query-processing
> code paths.

Under certain circumstances, it is a very poor design. Think of a
single-row table that keeps a scoreboard, or a session table with a
limited number of rows that are updated very frequently.
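
To make that concrete, here is a minimal sketch of the pattern (the
table and column names are made up for illustration). In the releases
under discussion, each UPDATE writes a whole new tuple version and
leaves the old one dead in the heap until VACUUM reclaims it, so a
single-row counter that is hammered with updates bloats steadily
between vacuums:

    -- hypothetical single-row scoreboard
    CREATE TABLE scoreboard (id int PRIMARY KEY, hits bigint NOT NULL);
    INSERT INTO scoreboard VALUES (1, 0);

    -- executed continuously by the application; every execution leaves
    -- one more dead tuple behind in the heap
    UPDATE scoreboard SET hits = hits + 1 WHERE id = 1;

    -- until this runs, scans of the "one row" table have to step over
    -- all of the accumulated dead versions
    VACUUM VERBOSE scoreboard;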

> We still have lots of work to do in making autovacuum
> smarter, avoiding vacuuming parts of relations that have not changed,
> and so on. But I have no desire to go over to an Oracle-style solution
> instead. We can't beat them by trying to be like them, and we run no
> small risk of falling foul of some of their patents if we do.

I proposed having a "key row entry" for each logical row. The key row
entry points to the latest version of the row, and from there each row
entry heads a linked list, in descending order, of previous row
versions. The vast majority of the time, the latest version will be the
first one you touch; it is only when a previously started, long-running
or concurrent transaction is involved that you ever look at previous
versions.

I'm not saying it is an easy slam dunk, as I can think of a few
difficulties off the top of my head, but it would solve the steady
degradation of performance between vacuums and, to a possibly lesser
extent, the cost of updating a row in a heavily indexed table.
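
And on the index point, a hedged example of what that cost looks like
(the schema is hypothetical). In the releases under discussion, every
UPDATE creates a new heap tuple, and every index on the table needs a
new entry pointing at that tuple, even when the indexed columns
themselves did not change:

    -- hypothetical session table with several indexes
    CREATE TABLE sessions (
        sid      text PRIMARY KEY,
        user_id  int NOT NULL,
        last_hit timestamptz NOT NULL
    );
    CREATE INDEX sessions_user_idx ON sessions (user_id);
    CREATE INDEX sessions_hit_idx  ON sessions (last_hit);

    -- only last_hit changes, but the new row version gets fresh
    -- entries in all three indexes, and the old entries linger
    -- until VACUUM
    UPDATE sessions SET last_hit = now() WHERE sid = 'abc123';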
