Re: [HACKERS] row reuse while UPDATE and vacuum analyze problem

From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Bernard Frankpitt <frankpit(at)pop(dot)dn(dot)net>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] row reuse while UPDATE and vacuum analyze problem
Date: 1999-07-28 16:11:03
Message-ID: 3.0.5.32.19990729021103.00b32c50@mail.rhyme.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At 15:54 28/07/99 +0000, Bernard Frankpitt wrote:
>A partial solution to row-reuse is to allow writing commands (commands
>with write locks on a buffer) to purge expendable tuples as they find
>them. When a transaction starts it could look at the snapshots of
>concurrent transactions and compute a tide-mark equal to the oldest TID
>in all the concurrent snapshots. Any deleted tuple in a heap relation
>with a committed tmax value that is smaller than the tide-mark is
>invisible to any of the concurrent or future transactions, and is
>expendable. The write command that discovers the expendable transaction
>can then delete it, and reclaim the tuple space.

Is there any way that the space can be freed as soon as it is no longer
needed? I'm not sure how the MVCC stuff works, but I assume that when a R/O
TX starts, a lock is taken out on the tables (and/or rows) that are read.
If a R/W TX updates a record, then a new version is written with a new TID.
Can the writer also mark the old version of the row for deletion, so that
when the last reader commits, the row is deleted? I have no idea if this
would be more or less efficient that making the writers do it, but it would
have the advantage of distributing the load across more TXs.

>Unfortunately this solves only half the problem. Deleting the tuple
>means that other scans no longer have to read and then discard it, but
>the space that the tuple uses is not really reclaimed because the way
>that insertions work is that they always add to the end of the heap. If
>all the tuples in a
>table are of fixed size then one solution would be to keep a list of
>empty tuple slots in the heap, and insert new tuples in these slots
>first. This would allow inserts to keep the table well packed.

This makes sense, but I would guess that fixed length tuples would not be
common.

>In the case of tables with variable length tuples the problem seems
>harder. Since the actual tuples in a table are always accessed through
>ItemIds, it is possible for a process with a write lock on a buffer to
>rearrange the tuples in the page to remove free space without affecting
>concurrent processes' views of the data within the buffer. After
>freeing the tuple, and compacting the space in the page, the process
>would have to update the free space list
>by removing any previous pointer to space on the page, and then
>re-inserting a pointer to the new space on the page. The free space
>list becomes quite a bit more complicated in this case, as it has to
>keep track of the sizes of the free space segments, and it needs to be
>indexed by the block in which the free space resides, and the size of
>the space available. This would seem to indicate the need for both a
>tree-structure and a hash structure associated with the free space list.

I'm not sure I follow this, I also don't know anything about the internals
of the data storage code, but...

Using my favorite database as a model, an alternative might be to (1)
Record the free space on a page-by-page (or any kind of chunk-by-chunk)
basis, (2) Don't bother doing any rearrangement when a record is deleted,
just mark the record as free, and update the bytes-free count for the page,
(3) When you want to store a record, look for any page that has enough
bytes free, then do any space rearrangement necessary to store the record.

AFAICT, most of the above steps (with the exception of page-fullness) must
already be performed.

This will of course fail totally if records are allowed to freely cross
page boundaries.

Please forgive me if this is way off the mark...and (if you have the
patience) explain what I've missed.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.C.N. 008 659 498) | /(@) ______---_
Tel: +61-03-5367 7422 | _________ \
Fax: +61-03-5367 7430 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp5.ai.mit.edu:11371 |/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message D'Arcy J.M. Cain 1999-07-28 16:16:09 Re: [HACKERS] Checking if a system is ELF
Previous Message Bruce Momjian 1999-07-28 16:07:21 Re: Selectivity of "=" (Re: [HACKERS] Index not used on simple se lect)