Re: vacuum, performance, and MVCC

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-22 12:23:48
Message-ID: 87veqtbcq3.fsf@wolfe.cbbrowne.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

After a long battle with technology, pgsql(at)mohawksoft(dot)com ("Mark Woodward"), an earthling, wrote:
>> Clinging to sanity, pgsql(at)mohawksoft(dot)com ("Mark Woodward") mumbled into
>> her beard:
>>> We all know that PostgreSQL suffers performance problems when rows are
>>> updated frequently prior to a vacuum. The most serious example can be
>>> seen
>>> by using PostgreSQL as a session handler for a busy we site. You may
>>> have
>>> thousands or millions of active sessions, each being updated per page
>>> hit.
>>>
>>> Each time the record is updated, a new version is created, thus
>>> lengthening the "correct" version search each time row is accessed,
>>> until,
>>> of course, the next vacuum comes along and corrects the index to point
>>> to
>>> the latest version of the record.
>>>
>>> Is that a fair explanation?
>>
>> No, it's not.
>>
>> 1. The index points to all the versions, until they get vacuumed out.
>
> It can't point to "all" versions, it points to the last "current" version
> as updated by vacuum, or the first version of the row.

No, it points to *all* the versions.

Suppose I take a table with two rows:

INFO: analyzing "public.test"
INFO: "test": 1 pages, 2 rows sampled, 2 estimated total rows
VACUUM

Then, over and over, I remove and insert one entry with the same PK:

sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842550 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842551 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842552 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842553 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842554 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842555 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842556 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842557 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842558 1
sample=# delete from test where id = 2;insert into test (id) values (2);
DELETE 1
INSERT 4842559 1

Now, I vacuum it.

sample=# vacuum verbose analyze test;
INFO: vacuuming "public.test"
INFO: index "test_id_key" now contains 2 row versions in 2 pages
DETAIL: 10 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "test": removed 10 row versions in 1 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "test": found 10 removable, 2 nonremovable row versions in 1 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: analyzing "public.test"
INFO: "test": 1 pages, 2 rows sampled, 2 estimated total rows
VACUUM

Notice that the index contained 10 versions of that one row.

It pointed to *ALL* the versions.

>> 2. There may simultaneously be multiple "correct" versions. The
>> notion that there is one version that is The Correct One is wrong, and
>> you need to get rid of that thought.
>
> Sorry, this is misunderstanding. By "correct version search" it was
> implied "for this transaction." Later I mention finding the first row with
> a transaction lower than the current.

Ah. Then you need for each transaction to spawn an index for each
table that excludes non-current values.

>>> If my assertion is fundimentally true, then PostgreSQL will always
>>> suffer performance penalties under a heavy modification load. Of
>>> course, tables with many inserts are not an issue, it is mainly
>>> updates. The problem is that there are classes of problems where
>>> updates are the primary operation.
>>
>> The trouble with your assertion is that it is true for *all* database
>> systems except for those whose only transaction mode is READ
>> UNCOMMITTED, where the only row visible is the "Latest" version.
>
> Not true. Oracle does not seem to exhibit this problem.

Oracle suffers a problem in this regard that PostgreSQL doesn't; in
Oracle, rollbacks are quite expensive, as "recovery" requires doing
extra work that PostgreSQL doesn't do.
--
output = ("cbbrowne" "@" "gmail.com")
http://cbbrowne.com/info/
Marriage means commitment. Of course, so does insanity.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Roussel 2006-06-22 13:03:47 Re: [HACKERS] Sun Donated a Sun Fire T2000 to the PostgreSQL
Previous Message Diogo Biazus 2006-06-22 12:01:51 xlog viewer proposal