Re: UNDO and in-place update

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UNDO and in-place update
Date: 2016-11-24 23:03:01
Message-ID: CA+TgmoaS-7w1oJBBCffEWyFCAFS3q_4Qysmm_S-qc-GPL9W3Dg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 24, 2016 at 11:06 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
> Fwiw, Oracle does not use the undo log for snapshot fetches. It's used
> only for transaction rollback and recovery.
>
> For snapshot isolation Oracle has yet a *third* copy of the data in a
> space called the "rollback segment(s)". When you update a row in a
> block you save the whole block in the rollback segment. When you try
> to access a block you check if the CSN -- which is basically
> equivalent to our LSN -- is newer than your snapshot and if it is you
> fetch the old version of the block from the rollback.

My understanding is that this isn't correct. I think the rollback
segments are what they call the thing that stores UNDO. See e.g.
http://ss64.com/ora/syntax-redo.html

Having said that, I'm by no means an Oracle expert.
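For what it's worth, the block-level consistent-read scheme Greg describes above can be sketched in miniature like this. This is a toy illustration of the general idea only, with invented names; it is not a claim about Oracle's actual implementation. The table holds only the newest image of each block, and older block images are saved off to the side, tagged with the CSN they were current as of:

```python
# Toy sketch of block-level consistent reads (illustrative only; not
# Oracle's actual implementation). The table keeps only the newest
# version of each block; outgoing block images go to a "rollback" store.

class BlockStore:
    def __init__(self):
        self.current = {}   # block_no -> (csn, data): newest version only
        self.rollback = {}  # block_no -> list of (csn, data), older images

    def write_block(self, block_no, csn, data):
        if block_no in self.current:
            # Save the outgoing image of the whole block, not just the row.
            self.rollback.setdefault(block_no, []).append(self.current[block_no])
        self.current[block_no] = (csn, data)

    def read_block(self, block_no, snapshot_csn):
        csn, data = self.current[block_no]
        if csn <= snapshot_csn:
            return data  # newest version is visible to this snapshot
        # Otherwise fetch the newest saved image old enough to be visible.
        for old_csn, old_data in reversed(self.rollback.get(block_no, [])):
            if old_csn <= snapshot_csn:
                return old_data
        raise LookupError("snapshot too old")  # cf. ORA-01555

store = BlockStore()
store.write_block(1, csn=10, data="v1")
store.write_block(1, csn=20, data="v2")
print(store.read_block(1, snapshot_csn=15))  # "v1": fetched from rollback
print(store.read_block(1, snapshot_csn=25))  # "v2": current block visible
```

The point of the sketch is the visibility check: compare the block's CSN against the snapshot, and fall back to an older saved image of the whole block when the current one is too new.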

> Essentially their MVCC is done on a per-block level rather than a
> per-row level and they keep only the newest version of the block in
> the table, the rest are in the rollback segment. For what it's worth
> I think our approach is cleaner and more flexible. They had a lot of
> trouble with their approach over the years and it works well only
> because they invested an enormous amount of development in it and also
> because people throw a lot of hardware at it too.

People do throw a lot of hardware at Oracle, but I don't think I'd
accept the contention that Oracle generally gets less performance out
of the same amount of hardware than PostgreSQL. There are certainly
some things we often do faster -- including inserts, deletes, and
transaction aborts -- but their implementation of updates seems to be
very fast. Obviously, a lot depends on configuration and workload, so
it's hard to make general statements, but I think it's more correct to
imagine that people throw a lot of hardware at Oracle because that's
where their big, critical databases are than to imagine that it's
because Oracle is a resource hog.

> I think the main use case we have trouble with is actually the "update
> every row in the table" type of update, which requires us to write to
> every block, plus a second copy of every block, plus full-page images
> of both copies; then we later set hint bits, dirtying the pages again
> and generating more full-page writes; then vacuum comes along later
> and requires two more writes of every block, etc. If we had a
> solution for the special case of an update that replaces every row in
> a page, I think that would complement HOT nicely and go a long way
> towards fixing our issues.

This case is certainly a problem, but I don't believe it's anywhere
close to being our only problem. My experience is that our system
works fairly well when there are only a few indexes. However, with
heavily indexed tables, all of your updates become non-HOT and then
bad things happen. So you can either leave out indexes that you need
for good query performance and then you're hosed, or you can add those
indexes and then you're hosed anyway because of the bloat and write
amplification associated with non-HOT updates.
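To make the amplification concrete, here is a rough back-of-envelope tally of the per-page write steps Greg enumerates in the quoted paragraph above. The counts are illustrative assumptions drawn from that description, not measured PostgreSQL behavior:

```python
# Back-of-envelope tally (illustrative assumptions, not measurements) of
# how many page-sized writes one logical "update every row" pass can
# incur per heap page, following the steps described in the thread.

def writes_per_updated_page(full_page_writes=True):
    """Count distinct times a page's worth of data gets written for a
    non-HOT update that replaces every row on the page."""
    writes = 0
    writes += 1  # dirty the page holding the old tuple versions
    writes += 1  # dirty the page receiving the new tuple versions
    if full_page_writes:
        writes += 2  # full-page images of both pages in WAL
    writes += 1  # later: setting hint bits dirties the page again
    if full_page_writes:
        writes += 1  # ...which can generate another full-page image
    writes += 2  # later still: vacuum's two more writes of the block
    return writes

print(writes_per_updated_page())  # 8 page-sized writes per logical update
```

Even with full_page_writes off, the tally stays at 5 writes for one logical update, which is the write amplification being complained about.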

Also, write-heavy workloads that perform OK as long as there are no
long-lived snapshots often enter a death spiral when you run reporting
queries on the same system. Finer-grained snapshot tracking would
help with that problem, but it doesn't solve it: a single long-lived
snapshot can then "only" cause the database to double in size instead
of growing without bound, which is scant consolation. Nor does it
change the fundamental calculus that once a table gets bloated, it's
very painful to de-bloat it. The appeal of a design that supports
in-place update is that you don't bloat the table in the first place.
You still have the bloat, of course, but it's off in a separate data
structure that is engineered for efficient deletion.
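As an illustration of what "a separate data structure engineered for efficient deletion" might mean, here is a hypothetical sketch, not the specific design proposed in this thread: if old row versions are appended to per-transaction undo segments, then reclaiming them is a matter of dropping whole segments once no live snapshot can need them, with no revisiting of heap pages at all.

```python
# Hypothetical sketch (invented structure, not the design proposed
# upthread) of an undo store where discarding obsolete row versions is
# cheap: pre-images append to per-transaction segments, and a whole
# segment is dropped in one step once no snapshot can still need it.

class UndoLog:
    def __init__(self):
        self.segments = {}  # txid -> list of (page_no, old_tuple) records

    def record(self, txid, page_no, old_tuple):
        # Append the pre-image; the heap page keeps only the new version.
        self.segments.setdefault(txid, []).append((page_no, old_tuple))

    def discard_before(self, oldest_needed_txid):
        # Bulk deletion: drop whole segments, never dirty heap pages.
        for txid in [t for t in self.segments if t < oldest_needed_txid]:
            del self.segments[txid]

undo = UndoLog()
undo.record(txid=100, page_no=1, old_tuple=("alice", 1))
undo.record(txid=101, page_no=1, old_tuple=("alice", 2))
undo.discard_before(101)      # transaction 100's undo vanishes in one step
print(sorted(undo.segments))  # [101]
```

The contrast with the heap-based scheme is that de-bloating here is a constant-time drop of a segment, rather than a scan that dirties every page containing dead tuples.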

I think that the whole emphasis on whether and to what degree this is
like Oracle is somewhat misplaced. I would look at it a different
way. We've talked many times over the years about how PostgreSQL is
optimized for aborts. Everybody that I've heard comment on that issue
thinks that is a bad thing. I am proposing a design that is optimized
for commits; that is, if the transaction commits, none of the pages it
modified need to be dirtied again afterwards at any point. I think
that's an extremely important property and it's one that we are very
far from having today. It necessarily implies that you cannot store
the old row versions in the heap, because if you do, then you are
going to have to dirty the pages again to get rid of them (unless you
prefer to just leak the space forever). Now there is plenty of room
for argument about whether the specific design I proposed is going to
be any good, and I think that would be quite an interesting discussion
to have. But if you contend that rewriting the same page 5 or 6 times
after a modification -- to set hint bits (a few at a time), HOT prune,
set all-visible, and freeze -- isn't really any big deal, then you
must live in a world where the write bandwidth of the I/O channel is a
lot less of a problem than it is in mine. We've been around and around
on all of that stuff, and people have come up with various ideas to
improve the situation -- some of which have been implemented -- but
many of those ideas involve unpleasant trade-offs, and so the core
problems remain. If we don't do something more fundamental, we're
still going to be up against the same basic issues ten years from now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
