Re: [HACKERS] Realtime VACUUM, was: performance of insert/delete/update

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Curtis Faith <curtis(at)galtair(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>, PgSQL Performance ML <pgsql-performance(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Realtime VACUUM, was: performance of insert/delete/update
Date: 2002-11-26 19:09:38
Message-ID: 200211261909.gAQJ9ce02496@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance


Good ideas. I think the master solution is to hook the statistics
daemon information into an automatic vacuum that could _know_ which
tables need attention.

---------------------------------------------------------------------------

Curtis Faith wrote:
> tom lane wrote:
> > Sure, it's just shuffling the housekeeping work from one place to
> > another. The thing that I like about Postgres' approach is that we
> > put the housekeeping in a background task (VACUUM) rather than in the
> > critical path of foreground transaction commit.
>
> Thinking with my marketing hat on, MVCC would be a much bigger win if VACUUM
> was not required (or was done automagically). The need for periodic VACUUM
> just gives ammunition to the PostgreSQL opponents who can claim we are
> deferring work but that it amounts to the same thing.
>
> A fully automatic background VACUUM will significantly reduce but will not
> eliminate this perceived weakness.
>
> However, it always seemed to me there should be some way to reuse the space
> more dynamically and quickly than a background VACUUM thereby reducing the
> percentage of tuples that are expired in heavy update cases. If only a very
> tiny number of tuples on the disk are expired this will reduce the aggregate
> performance/space penalty of MVCC into insignificance for the majority of
> uses.
>
> Couldn't we reuse tuple and index space as soon as there are no transactions
> that depend on the old tuple or index values. I have imagined that this was
> always part of the long-term master plan.
>
> Couldn't we keep a list of dead tuples in shared memory and look in the list
> first when deciding where to place new values for inserts or updates so we
> don't have to rely on VACUUM (even a background one)? If there are expired
> tuple slots in the list these would be used before allocating a new slot from
> the tuple heap.
>
> The only issue is determining the lowest transaction ID for in-process
> transactions which seems relatively easy to do (if it's not already done
> somewhere).
>
> In the normal shutdown and startup case, a tuple VACUUM could be performed
> automatically. This would normally be very fast since there would not be many
> tuples in the list.
>
> Index slots would be handled differently since these cannot be substituted
> one for another. However, these could be recovered as part of every index
> page update. Pages would be scanned before being written and any expired
> slots that had transaction ID's lower than the lowest active slot would be
> removed. This could be done for non-leaf pages as well and would result in
> only reorganizing a page that is already going to be written thereby not
> adding much to the overall work.
>
> I don't think that internal pages that contain pointers to values in nodes
> further down the tree that are no longer in the leaf nodes because of this
> partial expired entry elimination will cause a problem since searches and
> scans will still work fine.
>
> Does VACUUM do something that could not be handled in this realtime manner?
>
> - Curtis
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Treat 2002-11-26 19:25:46 Re: performance of insert/delete/update
Previous Message Bruce Momjian 2002-11-26 18:58:46 Re: Antw: Re: Native Win32 sources

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Treat 2002-11-26 19:25:46 Re: performance of insert/delete/update
Previous Message scott.marlowe 2002-11-26 18:46:27 Re: performance of insert/delete/update