Re: Frequent Update Project: Design Overview of HOT Updates

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Frequent Update Project: Design Overview of HOT Updates
Date: 2006-11-10 12:45:21
Message-ID: 87ac2zpj1a.fsf@enterprisedb.com
Lists: pgsql-hackers

"Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at> writes:

> 1. It doubles the IO (original page + hot page), if the new row would
> have fit into the original page.

That's an awfully big IF there. Even if you use a fillfactor of 50%, in which
case you're paying a 100% performance penalty *all* the time, not just when
dealing with a table that's been bloated by multiple versions, you still have
no guarantee the extra versions will fit on the same page.
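
To make the fillfactor arithmetic concrete, here is a rough sketch of how many
pages a sequential scan has to read at different fillfactors. The function
name, the row count, and the figure of 100 rows per full page are made-up
illustration values, not anything taken from the PostgreSQL source.

```python
import math

def heap_pages(live_rows: int, rows_per_full_page: int, fillfactor: float) -> int:
    """Pages needed when each page is only filled up to `fillfactor`.

    Hypothetical helper for illustration; it ignores page headers and
    variable row widths.
    """
    rows_per_page = max(1, math.floor(rows_per_full_page * fillfactor))
    return math.ceil(live_rows / rows_per_page)

# Assumed numbers: one million live rows, 100 rows fit on a full page.
full = heap_pages(1_000_000, 100, 1.0)
half = heap_pages(1_000_000, 100, 0.5)

# Extra pages a full scan reads, relative to the densely packed table.
penalty = (half - full) / full
print(full, half, f"{penalty:.0%}")
```

The penalty is paid on every scan, whether or not any row has ever been
updated, and a row that is updated often enough can still use up the reserved
half of its page.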

> 4. although at first it might seem so I see no advantage for vacuum with
> overflow

The main problem with vacuum now is that it must scan the entire table (and
the entire index) even if only a few records are garbage. If we isolate the
garbage in a separate area then vacuum doesn't have to scan unrelated tuples.

I'm not sure this really solves that problem, because there are still DELETEs
to consider, but it does remove one factor that exacerbates it unnecessarily.
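
As a toy model of the vacuum point, the sketch below compares the pages vacuum
has to read when garbage is spread through the heap with the pages it reads
when garbage sits in a separate area. The function and the page counts are
hypothetical, and the model deliberately ignores index scans and garbage left
behind by DELETEs.

```python
def pages_vacuum_reads(heap_pages: int, garbage_pages: int, garbage_isolated: bool) -> int:
    """Heap pages vacuum must read under each layout (illustrative only)."""
    if garbage_isolated:
        # Only the separate garbage area is scanned.
        return garbage_pages
    # The whole heap is scanned, however little of it is garbage.
    return heap_pages

# Assumed numbers: a 10,000-page heap where 50 pages' worth of tuples are dead.
whole_table = pages_vacuum_reads(10_000, 50, garbage_isolated=False)
isolated = pages_vacuum_reads(10_000, 50, garbage_isolated=True)
print(whole_table, isolated)
```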

I think the vision is that the overflow table would never be very large
because it can be vacuumed very aggressively. It has only tuples that are busy
and will need vacuuming as soon as a transaction ends, unlike the main table,
which is mostly tuples that don't need vacuuming.

> 5. the size reduction of heap is imho moot because you trade it for a
> growing overflow
> (size reduction only comes from reusing dead tuples and not
> adding index tuples --> SITC)

I think you're comparing the wrong thing. Size isn't a problem in itself; size
is a problem because it causes extra i/o. So a heap that's double the necessary
size takes twice as long as necessary to scan. The fact that the overflow
tables are taking up space isn't interesting if they don't have to be scanned.
Hitting the overflow tables should be quite rare; it only comes into play when
looking at concurrently updated tuples. It certainly happens, but most tuples
in the table will be committed and not being concurrently updated by anyone
else.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
