Re: [PERFORM] Realtime VACUUM, was: performance of insert/delete/update

From: Tommi Maekitalo <t(dot)maekitalo(at)epgmbh(dot)de>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] Realtime VACUUM, was: performance of insert/delete/update
Date: 2002-11-27 15:34:04
Message-ID: 200211271634.04404.t.maekitalo@epgmbh.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Or just reorg.

Am Mittwoch, 27. November 2002 15:02 schrieb Nicolai Tufar:
> I always wandered if VACUUM is the right name for the porcess. Now, when
> PostgreSQL
> is actively challenging in Enterprise space, it might be a good idea to
> give it a more
> enterprise-like name. Try to think how it is looking for an outside person
> to see
> us, database professionals hold lenghty discussions about the ways we
> vacuum a database. Why should you need to vacuum a database? Is it
> dirty? In my personal opinion, something like "space reclaiming daemon",
> "free-list organizer", "tuple recyle job" or "segment coalesce process"
> would
> sound more business-like .
>
> Regards,
> Nick
>
>
> ----- Original Message -----
> From: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>
> To: "Curtis Faith" <curtis(at)galtair(dot)com>
> Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>; "Ron Johnson" <ron(dot)l(dot)johnson(at)cox(dot)net>;
> "PgSQL Performance ML" <pgsql-performance(at)postgresql(dot)org>;
> <pgsql-hackers(at)postgresql(dot)org>
> Sent: Tuesday, November 26, 2002 9:09 PM
> Subject: Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of
> insert/delete/update
>
> > Good ideas. I think the master solution is to hook the statistics
> > daemon information into an automatic vacuum that could _know_ which
> > tables need attention.
> >
> > -------------------------------------------------------------------------
> >-
>
> -
>
> > Curtis Faith wrote:
> > > tom lane wrote:
> > > > Sure, it's just shuffling the housekeeping work from one place to
> > > > another. The thing that I like about Postgres' approach is that we
> > > > put the housekeeping in a background task (VACUUM) rather than in the
> > > > critical path of foreground transaction commit.
> > >
> > > Thinking with my marketing hat on, MVCC would be a much bigger win if
>
> VACUUM
>
> > > was not required (or was done automagically). The need for periodic
>
> VACUUM
>
> > > just gives ammunition to the PostgreSQL opponents who can claim we are
> > > deferring work but that it amounts to the same thing.
> > >
> > > A fully automatic background VACUUM will significantly reduce but will
>
> not
>
> > > eliminate this perceived weakness.
> > >
> > > However, it always seemed to me there should be some way to reuse the
>
> space
>
> > > more dynamically and quickly than a background VACUUM thereby reducing
>
> the
>
> > > percentage of tuples that are expired in heavy update cases. If only a
>
> very
>
> > > tiny number of tuples on the disk are expired this will reduce the
>
> aggregate
>
> > > performance/space penalty of MVCC into insignificance for the majority
>
> of
>
> > > uses.
> > >
> > > Couldn't we reuse tuple and index space as soon as there are no
>
> transactions
>
> > > that depend on the old tuple or index values. I have imagined that this
>
> was
>
> > > always part of the long-term master plan.
> > >
> > > Couldn't we keep a list of dead tuples in shared memory and look in the
>
> list
>
> > > first when deciding where to place new values for inserts or updates so
>
> we
>
> > > don't have to rely on VACUUM (even a background one)? If there are
>
> expired
>
> > > tuple slots in the list these would be used before allocating a new
> > > slot
>
> from
>
> > > the tuple heap.
> > >
> > > The only issue is determining the lowest transaction ID for in-process
> > > transactions which seems relatively easy to do (if it's not already
> > > done somewhere).
> > >
> > > In the normal shutdown and startup case, a tuple VACUUM could be
>
> performed
>
> > > automatically. This would normally be very fast since there would not
> > > be
>
> many
>
> > > tuples in the list.
> > >
> > > Index slots would be handled differently since these cannot be
>
> substituted
>
> > > one for another. However, these could be recovered as part of every
>
> index
>
> > > page update. Pages would be scanned before being written and any
> > > expired slots that had transaction ID's lower than the lowest active
> > > slot would
>
> be
>
> > > removed. This could be done for non-leaf pages as well and would result
>
> in
>
> > > only reorganizing a page that is already going to be written thereby
> > > not adding much to the overall work.
> > >
> > > I don't think that internal pages that contain pointers to values in
>
> nodes
>
> > > further down the tree that are no longer in the leaf nodes because of
>
> this
>
> > > partial expired entry elimination will cause a problem since searches
>
> and
>
> > > scans will still work fine.
> > >
> > > Does VACUUM do something that could not be handled in this realtime
>
> manner?
>
> > > - Curtis
> > >
> > >
> > >
> > > ---------------------------(end of
> > > broadcast)--------------------------- TIP 4: Don't 'kill -9' the
> > > postmaster
> >
> > --
> > Bruce Momjian | http://candle.pha.pa.us
> > pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> > + If your life is a hard drive, | 13 Roberts Road
> > + Christ can be your backup. | Newtown Square, Pennsylvania
>
> 19073
>
> > ---------------------------(end of broadcast)---------------------------
> > TIP 2: you can get off all lists at once with the unregister command
> > (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

--
Dr. Eckhardt + Partner GmbH
http://www.epgmbh.de

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Wheeler 2002-11-27 15:35:59 Re: [HACKERS] Interface update for 7.3
Previous Message David Wheeler 2002-11-27 15:33:33 Re: Interface update for 7.3

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2002-11-27 16:26:30 Re: [PERFORM] Realtime VACUUM, was: performance of insert/delete/update
Previous Message Dave Page 2002-11-27 15:09:59 Re: [HACKERS] Realtime VACUUM, was: performance of insert/delete/update