Re: Scalability in postgres

From: "simon(at)2ndquadrant(dot)com" <simon(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Scalability in postgres
Date: 2009-08-16 09:10:28
Message-ID: 1721003159.17251.1250413828132.JavaMail.open-xchange@oxltgw16.schlund.de
Lists: pgsql-performance


On 14 August 2009 at 03:18 Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> This is my rough understanding.  Oracle never "takes" a snapshot; it
> computes one on the fly, if and when it is needed.  It maintains a
> structure of recently committed transactions, with the XID for when
> they committed.  If a process runs into a tuple that is neither from
> the future nor from the deep past, it consults this structure to see
> if that transaction has committed, and if so whether it did so before
> or after the current query was started.  The structure is
> partitionable, so it does not have one global lock to serialize on,
> and the lock is short, as it only gets the info it needs, not the
> entire set of global open transactions.
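The scheme described above can be sketched roughly as follows. This is a hypothetical illustration, not Oracle's actual implementation: the class name, the SCN-style commit counter, and the partition count are all assumptions made for the example.

```python
# Hypothetical sketch of a partitioned committed-transaction table, as
# described in the quoted text. Readers lock only the one partition
# holding the XID they need, not a global structure.
import threading

N_PARTITIONS = 16  # assumed partition count

class CommitTable:
    """Maps committed XIDs to the commit sequence number (SCN) at which
    they committed, partitioned to avoid a single global lock."""
    def __init__(self):
        self.parts = [dict() for _ in range(N_PARTITIONS)]
        self.locks = [threading.Lock() for _ in range(N_PARTITIONS)]

    def record_commit(self, xid, commit_scn):
        i = xid % N_PARTITIONS
        with self.locks[i]:
            self.parts[i][xid] = commit_scn

    def visible_to(self, xid, query_start_scn):
        """True if transaction xid committed before the query began.
        The lock is held only long enough for one lookup."""
        i = xid % N_PARTITIONS
        with self.locks[i]:
            scn = self.parts[i].get(xid)
        return scn is not None and scn < query_start_scn
```

Note that a reader only consults the table at all for tuples that are neither clearly old nor clearly new, so on well-partitioned data most visibility checks never touch it.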

If this is the way Oracle does it then the data structure you describe would
need to be updated on both transaction start and transaction commit, as well as
being locked while it was read. Transaction commits would need to be serialized
so that the commit order was maintained. 

The Oracle structure would be read much less often, yet updated twice as often:
once at snapshot point and once at commit. It could be partitionable, but that
would increase the contention associated with reads of the data structure.

Oracle's structure works well for an "ideal workload" such as TPC-C where the
data is physically separated and so the reads on this structure are almost nil.
It would work very badly on data that continuously conflicted, which may account
for the fact that no Oracle benchmark has been published on TPC-E. This bears
out the experience of many Oracle DBAs, including myself. I certainly wouldn't
assume Oracle have solved every problem.

The Postgres procarray structure is read often, yet only exclusively locked
during commit. As Tom said, we optimize away the lock at xid assignment and also
optimize away many xid assignments altogether. We don't have any evidence that
the size of the procarray reduces the speed of reads, but we do know that the
increased queue length you get from having many concurrent sessions increases
time to record commit.
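By contrast, the procarray approach can be sketched like this: a snapshot copies the set of in-progress XIDs once, under a lock, and all subsequent visibility checks use only that private copy. This is a simplified illustration, not PostgreSQL's actual code; the names and the single-lock model are assumptions for the example.

```python
# Hypothetical sketch of procarray-style snapshots: one read of the
# shared structure per snapshot, an exclusive lock only at commit.
import threading

class ProcArray:
    """Shared set of currently running transaction IDs."""
    def __init__(self):
        self.lock = threading.Lock()   # stands in for the procarray lock
        self.running = set()
        self.next_xid = 1

    def start_xact(self):
        with self.lock:
            xid = self.next_xid
            self.next_xid += 1
            self.running.add(xid)
        return xid

    def commit_xact(self, xid):
        # Commit is the serialization point: the queue on this lock grows
        # with the number of concurrent sessions recording commit.
        with self.lock:
            self.running.discard(xid)

    def snapshot(self):
        # Copy the in-progress set once; visibility checks afterwards do
        # not touch the shared structure again.
        with self.lock:
            return (self.next_xid, frozenset(self.running))

def xid_visible(snapshot, xid):
    """xid is visible if it is older than the snapshot's xmax and was
    not still in progress when the snapshot was taken."""
    xmax, in_progress = snapshot
    return xid < xmax and xid not in in_progress
```

The trade-off matches the paragraph above: reads of the shared structure happen once per snapshot rather than once per tuple, at the cost of serializing commits through the lock.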

We might be able to do something similar to Oracle with Postgres, but it would
require significant changes and considerable careful thought. The reason for doing so
would be to reduce the number of reads on the "MVCC structure", making mild
partitioning more palatable. The good thing about the current Postgres structure
is that it doesn't increase contention when accessing concurrently updated data.

On balance it would appear that Oracle gains a benchmark win by giving up some
real world usefulness. That's never been something anybody here has been willing
to trade. 

Further thought in this area could prove useful, but it seems a lower priority
for development simply because of the code complexity required to make this sort
of change.

Best Regards, Simon Riggs
