Re: [HACKERS] Realtime VACUUM, was: performance of insert/delete/update

From: "Nicolai Tufar" <ntufar(at)apb(dot)com(dot)tr>
To: <pgsql-hackers(at)postgresql(dot)org>, "PgSQL Performance ML" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Realtime VACUUM, was: performance of insert/delete/update
Date: 2002-11-27 14:02:22
Message-ID: 043701c2961d$9bbd7d20$8016a8c0@apb.com.tr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

I always wandered if VACUUM is the right name for the porcess. Now, when
PostgreSQL
is actively challenging in Enterprise space, it might be a good idea to give
it a more
enterprise-like name. Try to think how it is looking for an outside person
to see
us, database professionals hold lenghty discussions about the ways we
vacuum a database. Why should you need to vacuum a database? Is it
dirty? In my personal opinion, something like "space reclaiming daemon",
"free-list organizer", "tuple recyle job" or "segment coalesce process"
would
sound more business-like .

Regards,
Nick

----- Original Message -----
From: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: "Curtis Faith" <curtis(at)galtair(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>; "Ron Johnson" <ron(dot)l(dot)johnson(at)cox(dot)net>;
"PgSQL Performance ML" <pgsql-performance(at)postgresql(dot)org>;
<pgsql-hackers(at)postgresql(dot)org>
Sent: Tuesday, November 26, 2002 9:09 PM
Subject: Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of
insert/delete/update

>
> Good ideas. I think the master solution is to hook the statistics
> daemon information into an automatic vacuum that could _know_ which
> tables need attention.
>
> --------------------------------------------------------------------------
-
>
> Curtis Faith wrote:
> > tom lane wrote:
> > > Sure, it's just shuffling the housekeeping work from one place to
> > > another. The thing that I like about Postgres' approach is that we
> > > put the housekeeping in a background task (VACUUM) rather than in the
> > > critical path of foreground transaction commit.
> >
> > Thinking with my marketing hat on, MVCC would be a much bigger win if
VACUUM
> > was not required (or was done automagically). The need for periodic
VACUUM
> > just gives ammunition to the PostgreSQL opponents who can claim we are
> > deferring work but that it amounts to the same thing.
> >
> > A fully automatic background VACUUM will significantly reduce but will
not
> > eliminate this perceived weakness.
> >
> > However, it always seemed to me there should be some way to reuse the
space
> > more dynamically and quickly than a background VACUUM thereby reducing
the
> > percentage of tuples that are expired in heavy update cases. If only a
very
> > tiny number of tuples on the disk are expired this will reduce the
aggregate
> > performance/space penalty of MVCC into insignificance for the majority
of
> > uses.
> >
> > Couldn't we reuse tuple and index space as soon as there are no
transactions
> > that depend on the old tuple or index values. I have imagined that this
was
> > always part of the long-term master plan.
> >
> > Couldn't we keep a list of dead tuples in shared memory and look in the
list
> > first when deciding where to place new values for inserts or updates so
we
> > don't have to rely on VACUUM (even a background one)? If there are
expired
> > tuple slots in the list these would be used before allocating a new slot
from
> > the tuple heap.
> >
> > The only issue is determining the lowest transaction ID for in-process
> > transactions which seems relatively easy to do (if it's not already done
> > somewhere).
> >
> > In the normal shutdown and startup case, a tuple VACUUM could be
performed
> > automatically. This would normally be very fast since there would not be
many
> > tuples in the list.
> >
> > Index slots would be handled differently since these cannot be
substituted
> > one for another. However, these could be recovered as part of every
index
> > page update. Pages would be scanned before being written and any expired
> > slots that had transaction ID's lower than the lowest active slot would
be
> > removed. This could be done for non-leaf pages as well and would result
in
> > only reorganizing a page that is already going to be written thereby not
> > adding much to the overall work.
> >
> > I don't think that internal pages that contain pointers to values in
nodes
> > further down the tree that are no longer in the leaf nodes because of
this
> > partial expired entry elimination will cause a problem since searches
and
> > scans will still work fine.
> >
> > Does VACUUM do something that could not be handled in this realtime
manner?
> >
> > - Curtis
> >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 4: Don't 'kill -9' the postmaster
> >
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> + If your life is a hard drive, | 13 Roberts Road
> + Christ can be your backup. | Newtown Square, Pennsylvania
19073
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Merlin Moncure 2002-11-27 14:04:49 Re: PostGres and WIN32, a plea!
Previous Message Achilleus Mantzios 2002-11-27 12:23:12 FreeBSD, Linux: select, select count(*) performance

Browse pgsql-performance by date

  From Date Subject
Next Message Jim Beckstrom 2002-11-27 14:43:01 Re: [HACKERS] Realtime VACUUM, was: performance of insert/delete/update
Previous Message Tom Lane 2002-11-27 03:13:51 Re: [HACKERS] Realtime VACUUM, was: performance of insert/delete/update