Faster and more frequent VACUUM (was PostgreSQL clustering VS MySQL clustering)

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, marty(at)outputservices(dot)com, herve(at)elma(dot)fr, pgsql-performance(at)postgresql(dot)org
Subject: Faster and more frequent VACUUM (was PostgreSQL clustering VS MySQL clustering)
Date: 2005-01-24 08:41:37
Message-ID: 1106556097.31592.139.camel@localhost.localdomain
Lists: pgsql-performance

On Sun, 2005-01-23 at 15:40 -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > Changing the idea slightly might be better: if a row update would cause
> > a block split, then if there is more than one row version then we vacuum
> > the whole block first, then re-attempt the update.
>
> "Block split"? I think you are confusing tables with indexes.

Terminologically loose, as ever. :(
I meant both tables and indexes and was referring to the part of the
algorithm that is entered when we have a block-full situation.

> Chasing down prior versions of the same row is not very practical
> anyway, since there is no direct way to find them.
>
> One possibility is, if you tried to insert a row on a given page but
> there's not room, to look through the other rows on the same page to see
> if any are deletable (xmax below the GlobalXmin event horizon). This
> strikes me as a fairly expensive operation though, especially when you
> take into account the need to get rid of their index entries first.

That's what I was suggesting: vacuum the whole page, not just those rows.

Doing it immediately greatly increases the chance that the index blocks
would be in cache also.
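
To make the shape of that concrete, here is a standalone toy sketch
(illustrative types only, not the real heap page or tuple structures):
the write path pays the reclaim cost only when the page is already full,
and then retries the placement once:

/*
 * Toy model of "vacuum the page on the spot" before giving up on a full
 * page.  The types and the single free-space counter are illustrative;
 * a real page has line pointers, and the index entries pointing at a
 * dead row version would have to be removed before its space is reused.
 */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TxId;

typedef struct ToyTuple
{
    bool   used;   /* slot holds a row version */
    TxId   xmax;   /* 0 = live; otherwise the deleting transaction id */
    size_t len;    /* bytes this row version occupies */
} ToyTuple;

#define TUPLES_PER_PAGE 64

typedef struct ToyPage
{
    ToyTuple tuples[TUPLES_PER_PAGE];
    size_t   free_bytes;
} ToyPage;

/*
 * Reclaim row versions whose deleting transaction is older than the
 * global xmin horizon, i.e. versions no running transaction can see.
 */
static size_t
reclaim_dead_versions(ToyPage *page, TxId global_xmin)
{
    size_t freed = 0;

    for (int i = 0; i < TUPLES_PER_PAGE; i++)
    {
        ToyTuple *t = &page->tuples[i];

        if (t->used && t->xmax != 0 && t->xmax < global_xmin)
        {
            t->used = false;
            freed += t->len;
        }
    }
    page->free_bytes += freed;
    return freed;
}

/*
 * Try to place a new row version of 'len' bytes.  The on-the-spot
 * reclaim only runs when the page is already full, so the common case
 * costs nothing extra.
 */
static bool
place_row_version(ToyPage *page, size_t len, TxId global_xmin)
{
    if (page->free_bytes < len)
        reclaim_dead_versions(page, global_xmin);
    if (page->free_bytes < len)
        return false;       /* still full: caller tries another page */
    page->free_bytes -= len;
    return true;
}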

> Moreover, the check would often be unproductive.
> The real issue with any such scheme is that you are putting maintenance
> costs into the critical paths of foreground processes that are executing
> user queries. I think that one of the primary advantages of the
> Postgres storage design is that we keep that work outside the critical
> path and delegate it to maintenance processes that can run in the
> background. We shouldn't lightly toss away that advantage.

Completely agree... which is why I was trying to find a place for such
an operation in front of another expensive operation that is already on
the critical path. That way there might be a benefit rather than just
additional overhead.

> There was some discussion in Toronto this week about storing bitmaps
> that would tell VACUUM whether or not there was any need to visit
> individual pages of each table. Getting rid of useless scans through
> not-recently-changed areas of large tables would make for a significant
> reduction in the cost of VACUUM.

ISTM there are two issues here, which are only somewhat related:
- speed of VACUUM on large tables
- ability to run VACUUM very frequently on very frequently updated
tables

The needs-maintenance bitmap idea addresses both, whilst the on-the-spot
idea only addresses the second, even if it arguably does that better.
Gut feel says we would implement only one of the two ideas...so...

On balance that suggests the needs-maintenance bitmap is the better
idea, and one for which we already have existing code.
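
Roughly what I have in mind, as a standalone sketch (the names and the
simple malloc'd layout are illustrative, not a proposal for the actual
shared-memory representation): one bit per heap page, set by the write
path whenever a block write leaves a dead row version behind, and
scanned and cleared by VACUUM so it visits only the marked pages:

/*
 * Toy per-relation "needs maintenance" bitmap: one bit per heap page.
 * Illustrative only -- a real version would live in shared memory and
 * would need to be crash-safe, which this is not.
 */
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct NeedsVacuumMap
{
    uint32_t  npages;
    uint8_t  *bits;          /* one bit per page, rounded up to bytes */
} NeedsVacuumMap;

static NeedsVacuumMap *
nvm_create(uint32_t npages)
{
    NeedsVacuumMap *map = malloc(sizeof(NeedsVacuumMap));

    map->npages = npages;
    map->bits = calloc((npages + CHAR_BIT - 1) / CHAR_BIT, 1);
    return map;
}

/*
 * Write path: called after a block write that left a dead row version
 * on 'pageno'.  This is the call whose locking cost worries me below.
 */
static void
nvm_mark_dirty(NeedsVacuumMap *map, uint32_t pageno)
{
    map->bits[pageno / CHAR_BIT] |= (uint8_t) (1u << (pageno % CHAR_BIT));
}

/*
 * VACUUM: visit only the marked pages, clearing each bit once the page
 * has been cleaned, so not-recently-changed areas are never scanned.
 */
static void
nvm_vacuum(NeedsVacuumMap *map, void (*vacuum_page) (uint32_t pageno))
{
    for (uint32_t p = 0; p < map->npages; p++)
    {
        uint8_t     mask = (uint8_t) (1u << (p % CHAR_BIT));

        if (map->bits[p / CHAR_BIT] & mask)
        {
            vacuum_page(p);
            map->bits[p / CHAR_BIT] &= (uint8_t) ~mask;
        }
    }
}
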
A few questions...
- wouldn't we need a bitmap per relation?
- wouldn't all the extra bitmaps need to be cached in shared_buffers,
which could use up a good proportion of the buffer cache space?
- maybe we should use a smaller block size and a separate cache for them
- how would we update the bitmap without creating a new LWLock that must
be acquired for every block write, thereby reducing scalability? (One
possible answer is sketched below.)
- would this be implemented as an option for each table, so that we
could avoid the implementation overhead? (Or perhaps skip the bitmap
entirely if the table is smaller than 16 blocks?)
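
On the LWLock question, one illustrative possibility (assuming the
platform gives us an atomic fetch-or; shown with C11 <stdatomic.h>
purely for brevity) is to let each writer OR its bit in without taking
any lock at all:

/*
 * Sketch only: setting a page's bit with an atomic OR instead of a new
 * LWLock, so concurrent block writes do not serialise on the bitmap.
 * atomic_fetch_or() stands in for whatever atomic OR primitive the
 * platform actually provides.
 */
#include <limits.h>
#include <stdatomic.h>
#include <stdint.h>

typedef struct AtomicNeedsVacuumMap
{
    uint32_t          npages;
    _Atomic uint8_t  *bits;   /* one bit per page */
} AtomicNeedsVacuumMap;

static void
nvm_mark_dirty_atomic(AtomicNeedsVacuumMap *map, uint32_t pageno)
{
    /* no lock: two writers hitting the same byte just merge their bits */
    (void) atomic_fetch_or(&map->bits[pageno / CHAR_BIT],
                           (uint8_t) (1u << (pageno % CHAR_BIT)));
}

Clearing the bit from VACUUM and making the map crash-safe are separate
questions, of course, so treat this as a sketch only.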

--
Best Regards, Simon Riggs
