Re: [HACKERS] Realtime VACUUM, was: performance of insert/delete/update

From: Jim Beckstrom <jrbeckstrom(at)sbcglobal(dot)net>
To: Nicolai Tufar <ntufar(at)apb(dot)com(dot)tr>
Cc: pgsql-hackers(at)postgresql(dot)org, PgSQL Performance ML <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Realtime VACUUM, was: performance of insert/delete/update
Date: 2002-11-27 14:43:01
Message-ID: 3DE4D9F5.2070203@sbcglobal.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Just for the humor of it, as well as to confirm Nick's perspective,
years ago on our inhouse developed Burroughs mainframe dbms, we had a
process called "garbage collect".

Nicolai Tufar wrote:

>I always wandered if VACUUM is the right name for the porcess. Now, when
>PostgreSQL
>is actively challenging in Enterprise space, it might be a good idea to give
>it a more
>enterprise-like name. Try to think how it is looking for an outside person
>to see
>us, database professionals hold lenghty discussions about the ways we
>vacuum a database. Why should you need to vacuum a database? Is it
>dirty? In my personal opinion, something like "space reclaiming daemon",
>"free-list organizer", "tuple recyle job" or "segment coalesce process"
>would
>sound more business-like .
>
>Regards,
>Nick
>
>
>----- Original Message -----
>From: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>
>To: "Curtis Faith" <curtis(at)galtair(dot)com>
>Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>; "Ron Johnson" <ron(dot)l(dot)johnson(at)cox(dot)net>;
>"PgSQL Performance ML" <pgsql-performance(at)postgresql(dot)org>;
><pgsql-hackers(at)postgresql(dot)org>
>Sent: Tuesday, November 26, 2002 9:09 PM
>Subject: Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of
>insert/delete/update
>
>
>
>
>>Good ideas. I think the master solution is to hook the statistics
>>daemon information into an automatic vacuum that could _know_ which
>>tables need attention.
>>
>>--------------------------------------------------------------------------
>>
>>
>-
>
>
>>Curtis Faith wrote:
>>
>>
>>>tom lane wrote:
>>>
>>>
>>>>Sure, it's just shuffling the housekeeping work from one place to
>>>>another. The thing that I like about Postgres' approach is that we
>>>>put the housekeeping in a background task (VACUUM) rather than in the
>>>>critical path of foreground transaction commit.
>>>>
>>>>
>>>Thinking with my marketing hat on, MVCC would be a much bigger win if
>>>
>>>
>VACUUM
>
>
>>>was not required (or was done automagically). The need for periodic
>>>
>>>
>VACUUM
>
>
>>>just gives ammunition to the PostgreSQL opponents who can claim we are
>>>deferring work but that it amounts to the same thing.
>>>
>>>A fully automatic background VACUUM will significantly reduce but will
>>>
>>>
>not
>
>
>>>eliminate this perceived weakness.
>>>
>>>However, it always seemed to me there should be some way to reuse the
>>>
>>>
>space
>
>
>>>more dynamically and quickly than a background VACUUM thereby reducing
>>>
>>>
>the
>
>
>>>percentage of tuples that are expired in heavy update cases. If only a
>>>
>>>
>very
>
>
>>>tiny number of tuples on the disk are expired this will reduce the
>>>
>>>
>aggregate
>
>
>>>performance/space penalty of MVCC into insignificance for the majority
>>>
>>>
>of
>
>
>>>uses.
>>>
>>>Couldn't we reuse tuple and index space as soon as there are no
>>>
>>>
>transactions
>
>
>>>that depend on the old tuple or index values. I have imagined that this
>>>
>>>
>was
>
>
>>>always part of the long-term master plan.
>>>
>>>Couldn't we keep a list of dead tuples in shared memory and look in the
>>>
>>>
>list
>
>
>>>first when deciding where to place new values for inserts or updates so
>>>
>>>
>we
>
>
>>>don't have to rely on VACUUM (even a background one)? If there are
>>>
>>>
>expired
>
>
>>>tuple slots in the list these would be used before allocating a new slot
>>>
>>>
>from
>
>
>>>the tuple heap.
>>>
>>>The only issue is determining the lowest transaction ID for in-process
>>>transactions which seems relatively easy to do (if it's not already done
>>>somewhere).
>>>
>>>In the normal shutdown and startup case, a tuple VACUUM could be
>>>
>>>
>performed
>
>
>>>automatically. This would normally be very fast since there would not be
>>>
>>>
>many
>
>
>>>tuples in the list.
>>>
>>>Index slots would be handled differently since these cannot be
>>>
>>>
>substituted
>
>
>>>one for another. However, these could be recovered as part of every
>>>
>>>
>index
>
>
>>>page update. Pages would be scanned before being written and any expired
>>>slots that had transaction ID's lower than the lowest active slot would
>>>
>>>
>be
>
>
>>>removed. This could be done for non-leaf pages as well and would result
>>>
>>>
>in
>
>
>>>only reorganizing a page that is already going to be written thereby not
>>>adding much to the overall work.
>>>
>>>I don't think that internal pages that contain pointers to values in
>>>
>>>
>nodes
>
>
>>>further down the tree that are no longer in the leaf nodes because of
>>>
>>>
>this
>
>
>>>partial expired entry elimination will cause a problem since searches
>>>
>>>
>and
>
>
>>>scans will still work fine.
>>>
>>>Does VACUUM do something that could not be handled in this realtime
>>>
>>>
>manner?
>
>
>>>- Curtis
>>>
>>>
>>>
>>>---------------------------(end of broadcast)---------------------------
>>>TIP 4: Don't 'kill -9' the postmaster
>>>
>>>
>>>
>>--
>> Bruce Momjian | http://candle.pha.pa.us
>> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
>> + If your life is a hard drive, | 13 Roberts Road
>> + Christ can be your backup. | Newtown Square, Pennsylvania
>>
>>
>19073
>
>
>>---------------------------(end of broadcast)---------------------------
>>TIP 2: you can get off all lists at once with the unregister command
>> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>>
>>
>>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo(at)postgresql(dot)org)
>
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Page 2002-11-27 15:09:59 Re: [HACKERS] Realtime VACUUM, was: performance of insert/delete/update
Previous Message Merlin Moncure 2002-11-27 14:04:49 Re: PostGres and WIN32, a plea!

Browse pgsql-performance by date

  From Date Subject
Next Message Thrasher 2002-11-27 14:52:20 Child process procedures
Previous Message Nicolai Tufar 2002-11-27 14:02:22 Re: [HACKERS] Realtime VACUUM, was: performance of insert/delete/update