Re: 'TID index'

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 'TID index'
Date: 2004-09-25 10:14:53
Message-ID: NOEFLCFHBPDAFHEIPGBOKEBFCFAA.simon@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>Jim C. Nasby
> On Wed, Sep 15, 2004 at 10:56:28PM +0100, Simon Riggs wrote:
> > There are many good ideas out there, yet it is almost impossible to find
> > somebody else to implement yours!
> >
> > The acid test is to try and write it...
> >
> > Overall, I agree VACUUM could do with some tuning - and 8.0 has
> just that.
> > It needs very careful thought to make sure both concurrency and
> > recoverability considerations are fully met in any solution you come up
> > with.
>
> Before I try and tackle this I wanted to investigate why PostgreSQL's
> MVCC depends on vacuum instead of going with an undo log ala Oracle or
> InnoDB. The only thing I could find is this thread
> (http://tinyurl.com/42opl) which doesn't say all that much.
>
> Has anyone looked at the performance impact of having to vacuum vs a
> method that doesn't recquire vacuuming? Obviously rollback with an undo
> log is slow, but it seems it's better to optimize for the normal
> situation of committing.
>
> I'm not looking for a holy war here but I'd like to know if any analysis
> has happened to determine how efficient vacuum can be made and if it's
> as efficient as just deleting an undo log once you no longer need it.

Fair questions. MVCC has been tightly locked into Postgres/SQL for the whole
of its history. There is much written on this and you should search some
more - references are in the manual.
http://citeseer.ist.psu.edu/cache/papers/cs/4130/http:zSzzSzwuarchive.wustl.
eduzSzpackageszSzpostgreszSzpaperszSzERL-M87-06.pdf/stonebraker87design.pdf

Put simply, MVCC requires vacuum - they are two halves of the overall
mechanism. An Undo log would be an alternative to MVCC and therefore offer
an alternative to vacuum. But there are other things to consider - no other
RDBMS requires a "vacuum", but all of them have SOME requirement for
off-line scanning of the database, whether its ANALYZE, or REORG, or dbcc
(or more than of those...). Removing vacuum won't remove the other
requirements, so there is less to be gained than you might think.

IMHO vacuum could be optimised further in terms of what it does and how it
does it, but many of those optimizations wouldn't apply generally - so you'd
then either be introducing further tweaking requirements for DBAs (which are
frowned upon) or giving yourself the additional task of auto-optimizing
their deployment.

In PostgreSQL 8.0, vacuum has been enhanced so that it operates in the
background, avoids locks, doesn't spoil the dbpage cache - on top of being
automated. That approach is simple and pragmatic and that's hard to beat.

I think the effort of altering the current storage system is not something
you'd have people invest their time in (round here) and would probably
regard even the debate as wasted effort (and remember that is a most closely
guarded resource on the project). PG does allow you to write your own
storage manager...

If you're keen to code, there are many TODO items that are "pre-agreed" as
being required (from earlier discussions). That's probably the best place to
start. Having said that, research into why the code is the way it is can be
interesting and very educational - overall, the code really is written
fairly optimally from a robustness and performance perspective - you need to
be a fair way in to spot opps for improvement. Don't be put off - get stuck
in.

Best Regards, Simon Riggs

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2004-09-25 11:50:03 Re: BUG: possible busy loop when connection is closed
Previous Message Mahmoud Taghizadeh 2004-09-25 10:04:25 Re: How to add locale support for each column?