Re: MVCC performance issue

From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Kyriacos Kyriacou <kyriacosk(at)prime-tel(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: MVCC performance issue
Date: 2010-11-12 13:52:35
Message-ID: 20101112135235.GI14016@aart.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, Nov 12, 2010 at 03:47:30PM +0200, Kyriacos Kyriacou wrote:
> This is my first post in this mailing list and I would like to raise an
> issue that in my opinion is causing performance issues of PostgreSQL
> especially in a transaction processing environment. In my company we are
> using PostgreSQL for the last 8 year for our in-house developed billing
> system (telecom). The last few months we started considering moving to
> another RDBMS just because of this issue.
>
> After all these years, I believe that the biggest improvement that could
> be done and will boost overall performance especially for enterprise
> application will be to improve Multiversion Concurrency Control (MVCC)
> mechanism. In theory this seems to be improving performance for SELECT
> queries but on tables with very intensive and frequent updates, even
> that is not fully true because of the fragmentation of data caused by
> MVCC. I saw cases were a SELECT COUNT(*) on an empty (!!!) table (used
> as a buffer) took more than 40min to return a result! VACUUM is not a
> solution in my opinion even though after the introduction of autovacuum
> daemon situation got much better.
>
> PROBLEM DECRIPTION
> ------------------
> By definition of MVCC, when an UPDATE is performed, PostgreSQL creates a
> new copy of the row in a new location. Any SELECT queries within the
> same session are accessing the new version of the raw and all other
> queries from other users are still accessing the old version. When
> transaction is COMMIT PostgreSQL makes the a new version of the row as
> the "active" row and expires the old row that remains "dead" and then is
> up to VACUUM procedure to recover the "dead" rows space and make it
> available to the database engine. In case that transaction is ROLLBACK
> then space reserved for the new version of the row is released. The
> result is to have huge fragmentation on table space, unnecessary updates
> in all affected indexes, unnecessary costly I/O operations, poor
> performance on SELECT that retrieves big record sets (i.e. reports etc)
> and slower updates. As an example, consider updating the "live" balance
> of a customer for each phone call where the entire customer record has
> to be duplicated again and again upon each call just for modifying a
> numeric value!
>
> SUGGESTION
> --------------
> 1) When a raw UPDATE is performed, store all "new raw versions" either
> in separate temporary table space
> or in a reserved space at the end of each table (can be allocated
> dynamically) etc
> 2) Any SELECT queries within the same session will be again accessing
> the new version of the row
> 3) Any SELECT queries from other users will still be accessing the old
> version
> 4) When UPDATE transaction is ROLLBACK just release the space used in
> new temporary location
> 5) When UPDATE transaction is COMMIT then try to LOCK the old version
> and overwrite it at the same physical location (NO FRAGMENTATION).
> 6) Similar mechanism can be applied on INSERTS and DELETES
> 7) In case that transaction was COMMIT, the temporary location can be
> either released or archived/cleaned on a pre-scheduled basis. This will
> possibly allow the introduction of a TRANSACTION LOG backup mechanism as
> a next step.
> 8) After that VACUUM will have to deal only with deletions!!!
>
>
> I understand that my suggestion seems to be too simplified and also that
> there are many implementation details and difficulties that I am not
> aware.
>
> I strongly believe that the outcome of the discussion regarding this
> issue will be helpful.
>
> Best Regards,
>
> Kyriacos Kyriacou
> Senior Developer/DBA
>

I cannot speak to your suggestion, but it sounds like you are not
vacuuming enough and a lot of the bloat/randomization would be helped
by making use of HOT updates in which the updates are all in the same
page and are reclaimed almost immediately.

Regards,
Ken

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Thom Brown 2010-11-12 13:54:57 Re: MVCC performance issue
Previous Message Kyriacos Kyriacou 2010-11-12 13:47:30 MVCC performance issue