Skip site navigation (1) Skip section navigation (2)

MVCC performance issue

From: "Kyriacos Kyriacou" <kyriacosk(at)prime-tel(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: MVCC performance issue
Date: 2010-11-12 13:47:30
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-performance
This is my first post in this mailing list and I would like to raise an
issue that in my opinion is causing performance issues of PostgreSQL
especially in a transaction processing environment. In my company we are
using PostgreSQL for the last 8 year for our in-house developed billing
system (telecom). The last few months we started considering moving to
another RDBMS just because of this issue. 

After all these years, I believe that the biggest improvement that could
be done and will boost overall performance especially for enterprise
application will be to improve Multiversion Concurrency Control (MVCC)
mechanism. In theory this seems to be improving performance for SELECT
queries but on tables with very intensive and frequent updates, even
that is not fully true because of the fragmentation of data caused by
MVCC. I saw cases were a SELECT COUNT(*) on an empty (!!!) table (used
as a buffer) took more than 40min to return a result! VACUUM is not a
solution in my opinion even though after the introduction of autovacuum
daemon situation got much better.

By definition of MVCC, when an UPDATE is performed, PostgreSQL creates a
new copy of the row in a new location. Any SELECT queries within the
same session are accessing the new version of the raw and all other
queries from other users are still accessing the old version. When
transaction is COMMIT PostgreSQL makes the a new version of the row as
the "active" row and expires the old row that remains "dead" and then is
up to VACUUM procedure to recover the "dead" rows space and make it
available to the database engine. In case that transaction is ROLLBACK
then space reserved for the new version of the row is released. The
result is to have huge fragmentation on table space, unnecessary updates
in all affected indexes, unnecessary costly I/O operations, poor
performance on SELECT that retrieves big record sets (i.e. reports etc)
and slower updates. As an example, consider updating the "live" balance
of a customer for each phone call where the entire customer record has
to be duplicated again and again upon each call just for modifying a
numeric value! 

1) When a raw UPDATE is performed, store all "new raw versions" either
in separate temporary table space 
   or in a reserved space at the end of each table (can be allocated
dynamically) etc 
2) Any SELECT queries within the same session will be again accessing
the new version of the row
3) Any SELECT queries from other users will still be accessing the old
4) When UPDATE transaction is ROLLBACK just release the space used in
new temporary location  
5) When UPDATE transaction is COMMIT then try to LOCK the old version
and overwrite it at the same physical location (NO FRAGMENTATION).
6) Similar mechanism can be applied on INSERTS and DELETES  
7) In case that transaction was COMMIT, the temporary location can be
either released or archived/cleaned on a pre-scheduled basis. This will
possibly allow the introduction of a TRANSACTION LOG backup mechanism as
a next step. 
8) After that VACUUM will have to deal only with deletions!!! 

I understand that my suggestion seems to be too simplified and also that
there are many implementation details and difficulties that I am not

I strongly believe that the outcome of the discussion regarding this
issue will be helpful. 

Best Regards, 

Kyriacos Kyriacou
Senior Developer/DBA


pgsql-performance by date

Next:From: Kenneth MarshallDate: 2010-11-12 13:52:35
Subject: Re: MVCC performance issue
Previous:From: C├ędric VillemainDate: 2010-11-12 12:40:52
Subject: Re: anti-join chosen even when slower than old plan

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group