Re: MVCC overheads

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Pete Stevenson <etep(dot)nosnevets(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MVCC overheads
Date: 2016-07-08 18:57:57
Message-ID: CAHyXU0wc-s22-GAwBjn7L6zRDk_Q=D7_rbDALujB4JSEZJbY2A@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 7, 2016 at 11:45 AM, Pete Stevenson
<etep(dot)nosnevets(at)gmail(dot)com> wrote:
> Hi postgresql hackers -
>
> I would like to find some analysis (published work, blog posts) on the overheads affiliated with the guarantees provided by MVCC isolation. More specifically, assuming the current workload is CPU bound (as opposed to IO) what is the CPU overhead of generating the WAL, the overhead of version checking and version creation, and of garbage collecting old and unnecessary versions? For what it’s worth, I am working on a research project where it is envisioned that some of this work can be offloaded.

That's going to be hard to measure. First, what you didn't say is,
'with respect to what?'. You mention WAL for example. WAL is more of
a crash safety mechanism than anything and it's not really fair to
include it in an analysis of 'MVCC overhead', or at least not
completely. One thing that MVCC *does* objectively cause is bloat,
although you can still get bloat without MVCC if you (for example)
delete rows or rewrite rows such that they can't fit in their old
slot.

MVCC definitely incurs some runtime overhead to check visibility, but
the amount of overhead is highly dependent on the specific workload.
Postgres 'hint bits' reduce the cost to near zero for many workloads,
but in other workloads they are expensive to maintain and cause a lot
of extra write traffic, since setting a hint bit dirties the page. One
nice property of systems that don't have to worry about visibility is
that they can answer queries directly out of the index. We have a
workaround for that (the 'all visible' bit in the visibility map, which
enables index-only scans), but again the amount of benefit from that
strategy is going to be very situation specific.
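To make the hint-bit trade-off concrete, here is a toy model of a visibility check. It is not Postgres's actual code. Everything in it is a simplifying assumption: the names XactStatus, CommitLog, TupleVersion, Snapshot, is_visible and index_only_visible, the snapshot layout (an xmax plus a set of running xids), and the single per-tuple hint. The first check of a tuple consults a transaction status log and caches the final outcome on the tuple. Later checks skip the lookup. A page flagged all-visible lets an index-only lookup skip the heap entirely.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class XactStatus(Enum):
    IN_PROGRESS = 1
    COMMITTED = 2
    ABORTED = 3


class CommitLog:
    """Toy stand-in for the transaction status log (hypothetical).
    Counts lookups so the savings from hint bits are visible."""

    def __init__(self, statuses):
        self.statuses = dict(statuses)
        self.lookups = 0

    def status(self, xid):
        self.lookups += 1
        return self.statuses.get(xid, XactStatus.IN_PROGRESS)


@dataclass
class TupleVersion:
    xmin: int                      # inserting transaction id
    xmax: Optional[int] = None     # deleting/updating transaction id, if any
    hint_committed: bool = False   # cached: inserter known committed
    hint_aborted: bool = False     # cached: inserter known aborted


@dataclass(frozen=True)
class Snapshot:
    xmax: int                      # first xid not yet assigned at snapshot time
    in_progress: frozenset         # xids still running at snapshot time

    def finished_before(self, xid):
        return xid < self.xmax and xid not in self.in_progress


def inserter_committed(tup, clog):
    if tup.hint_committed:
        return True                # hint hit: no status lookup needed
    if tup.hint_aborted:
        return False
    status = clog.status(tup.xmin)
    # Cache only final outcomes. In a real system this write dirties the
    # page, which is the extra traffic hint bits can cost.
    if status is XactStatus.COMMITTED:
        tup.hint_committed = True
    elif status is XactStatus.ABORTED:
        tup.hint_aborted = True
    return status is XactStatus.COMMITTED


def is_visible(tup, snap, clog):
    # The inserter must have committed before the snapshot was taken.
    if not (snap.finished_before(tup.xmin) and inserter_committed(tup, clog)):
        return False
    if tup.xmax is None:
        return True
    # Hidden only if the deleter also committed before the snapshot.
    deleter_committed = (snap.finished_before(tup.xmax)
                         and clog.status(tup.xmax) is XactStatus.COMMITTED)
    return not deleter_committed


def index_only_visible(page_all_visible, fetch_heap_tuple, snap, clog):
    # With the page marked all-visible, skip the heap visit entirely.
    if page_all_visible:
        return True
    return is_visible(fetch_heap_tuple(), snap, clog)
```

Checking the same committed tuple twice costs one status lookup, not two. That is the "near zero" case. A workload that keeps touching freshly written tuples pays for the lookups and the page writes instead.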

Stepping back, the overhead of MVCC in postgres (and probably other
systems too) has been continually reduced over the years -- the really
nasty parts have been relegated to background cleanup processing
(VACUUM). That processing is pretty sequential, and now that the 'i/o
bottleneck' is finally getting solved by cheap storage, things are
being pushed back into the cpu space.
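The next sketch shows where bloat comes from and what background cleanup buys back. It is a toy under stated assumptions, not how VACUUM is actually implemented. There is one page, a plain set of committed xids, and no aborted inserts. The names Version, Page, mvcc_update, vacuum and oldest_xmin are all hypothetical; oldest_xmin stands in for "the oldest xid any running snapshot may still consider in progress". Each update leaves the old version behind. Cleanup later removes versions that no snapshot can see, and it marks the page all-visible once only live versions remain.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Version:
    value: str
    xmin: int                       # inserting transaction id
    xmax: Optional[int] = None      # transaction that replaced/deleted it


@dataclass
class Page:
    versions: List[Version] = field(default_factory=list)
    all_visible: bool = False


def mvcc_update(page, old, new_value, xid):
    """Mark the old version as replaced by xid and append a new one.
    The old version stays on the page until cleanup: this is the bloat."""
    old.xmax = xid
    new = Version(new_value, xmin=xid)
    page.versions.append(new)
    page.all_visible = False
    return new


def vacuum(page, committed: Set[int], oldest_xmin: int) -> int:
    """Drop versions no snapshot can still see; return how many were removed.

    oldest_xmin is a simplified horizon: every running snapshot treats xids
    below it as finished. Aborted inserts are ignored for brevity.
    """
    def dead(v):
        return v.xmax is not None and v.xmax in committed and v.xmax < oldest_xmin

    before = len(page.versions)
    page.versions = [v for v in page.versions if not dead(v)]
    # Mark the page all-visible only if every survivor is live and its
    # inserter committed before the horizon.
    page.all_visible = all(
        v.xmax is None and v.xmin in committed and v.xmin < oldest_xmin
        for v in page.versions
    )
    return before - len(page.versions)
```

After three updates to one row, the page holds four versions. Cleanup can only reclaim the versions that fall behind the oldest running snapshot. A long-lived snapshot therefore holds the bloat in place until it finishes.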

In summary, I think the future of MVCC and transactional systems is
very bright, and the data management systems that discard
transactional safety in order to get some short term performance gains
are, uh, not so bright. Transactions are essential in systems where
data integrity matters.

merlin
