Re: MVCC performance issue

From: Thom Brown <thom(at)linux(dot)com>
To: Kyriacos Kyriacou <kyriacosk(at)prime-tel(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: MVCC performance issue
Date: 2010-11-12 13:54:57
Message-ID: AANLkTinhnXGo4SxdHN1b5twy2rguV+1An-XAeDtbnJFi@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 12 November 2010 13:47, Kyriacos Kyriacou <kyriacosk(at)prime-tel(dot)com>wrote:

> This is my first post in this mailing list and I would like to raise an
> issue that in my opinion is causing performance issues of PostgreSQL
> especially in a transaction processing environment. In my company we are
> using PostgreSQL for the last 8 year for our in-house developed billing
> system (telecom). The last few months we started considering moving to
> another RDBMS just because of this issue.
>
> After all these years, I believe that the biggest improvement that could
> be done and will boost overall performance especially for enterprise
> application will be to improve Multiversion Concurrency Control (MVCC)
> mechanism. In theory this seems to be improving performance for SELECT
> queries but on tables with very intensive and frequent updates, even
> that is not fully true because of the fragmentation of data caused by
> MVCC. I saw cases were a SELECT COUNT(*) on an empty (!!!) table (used
> as a buffer) took more than 40min to return a result! VACUUM is not a
> solution in my opinion even though after the introduction of autovacuum
> daemon situation got much better.
>
> PROBLEM DECRIPTION
> ------------------
> By definition of MVCC, when an UPDATE is performed, PostgreSQL creates a
> new copy of the row in a new location. Any SELECT queries within the
> same session are accessing the new version of the raw and all other
> queries from other users are still accessing the old version. When
> transaction is COMMIT PostgreSQL makes the a new version of the row as
> the "active" row and expires the old row that remains "dead" and then is
> up to VACUUM procedure to recover the "dead" rows space and make it
> available to the database engine. In case that transaction is ROLLBACK
> then space reserved for the new version of the row is released. The
> result is to have huge fragmentation on table space, unnecessary updates
> in all affected indexes, unnecessary costly I/O operations, poor
> performance on SELECT that retrieves big record sets (i.e. reports etc)
> and slower updates. As an example, consider updating the "live" balance
> of a customer for each phone call where the entire customer record has
> to be duplicated again and again upon each call just for modifying a
> numeric value!
>
> SUGGESTION
> --------------
> 1) When a raw UPDATE is performed, store all "new raw versions" either
> in separate temporary table space
> or in a reserved space at the end of each table (can be allocated
> dynamically) etc
> 2) Any SELECT queries within the same session will be again accessing
> the new version of the row
> 3) Any SELECT queries from other users will still be accessing the old
> version
> 4) When UPDATE transaction is ROLLBACK just release the space used in
> new temporary location
> 5) When UPDATE transaction is COMMIT then try to LOCK the old version
> and overwrite it at the same physical location (NO FRAGMENTATION).
> 6) Similar mechanism can be applied on INSERTS and DELETES
> 7) In case that transaction was COMMIT, the temporary location can be
> either released or archived/cleaned on a pre-scheduled basis. This will
> possibly allow the introduction of a TRANSACTION LOG backup mechanism as
> a next step.
> 8) After that VACUUM will have to deal only with deletions!!!
>
>
> I understand that my suggestion seems to be too simplified and also that
> there are many implementation details and difficulties that I am not
> aware.
>
> I strongly believe that the outcome of the discussion regarding this
> issue will be helpful.
>
>
Which version of PostgreSQL are you basing this on?

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jon Nelson 2010-11-12 15:33:21 Re: postmaster consuming /lots/ of memory with hash aggregate. why?
Previous Message Kenneth Marshall 2010-11-12 13:52:35 Re: MVCC performance issue