Re: Frequent Update Project: Design Overview of HOTUpdates

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Frequent Update Project: Design Overview of HOTUpdates
Date: 2006-11-10 20:39:56
Message-ID: 1163191197.3634.918.camel@silverbirch.site
Lists: pgsql-hackers

On Fri, 2006-11-10 at 16:46 +0100, Zeugswetter Andreas ADI SD wrote:

> > I'm not sure this really solves that problem because there
> > are still DELETEs to consider but it does remove one factor
> > that exacerbates it unnecessarily.
>
> Yea, so you still need to vacuum the large table regularly.

HOT covers the use case of heavy updating, which in many common cases
occurs on tables with few inserts/deletes. HOT would significantly
reduce the need to vacuum, since deletes and wraparound issues would be
the only remaining reasons to do so.
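To make the target workload concrete, here is a minimal sketch (the
table and values are hypothetical, not from this thread): rows are
written once and then updated over and over, with essentially no
INSERT/DELETE traffic.

    -- Hypothetical example of the workload HOT is aimed at:
    -- rows are created once, then updated many times.
    CREATE TABLE account_balance (
        account_id  integer PRIMARY KEY,
        balance     numeric NOT NULL
    );

    -- A steady stream of statements like this is the common case.
    -- Today every such UPDATE leaves behind a dead tuple that only
    -- VACUUM can reclaim; under HOT the superseded version would go
    -- to the overflow relation and be reclaimed much more cheaply.
    UPDATE account_balance
       SET balance = balance + 100.00
     WHERE account_id = 42;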

[I have some ideas for how to optimize tables with heavy INSERT/DELETE
activity, but that case is much less prevalent than heavy UPDATEs.]

> > I think the vision is that the overflow table would never be
> > very large because it can be vacuumed very aggressively. It
> > has only tuples that are busy and will need vacuuming as soon
> > as a transaction ends. Unlike the main table which is mostly
> > tuples that don't need vacuuming.
>
> Ok, but then you have to provide an extra vacuum that does only that
> (and it randomly touches heap pages, doing only partial work there).

Sure, HOT needs a specially optimised VACUUM.
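To be concrete about what "specially optimised" means, a sketch: the
command below is hypothetical syntax, purely illustrative, showing a
vacuum pass that reads only the overflow relation and follows tuple
chains back into the heap (which is exactly the random heap access
Andreas points out).

    -- Hypothetical, illustrative syntax only; no such command exists.
    -- Scan just the overflow relation, and for each dead version
    -- follow its tuple chain back to unlink it from its heap page
    -- (random heap I/O, partial work per page, as noted above).
    VACUUM OVERFLOW account_balance;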

> > So a heap that's double its necessary size takes twice as long as
> > necessary to scan. The fact that the overflow tables are taking up
> > space isn't interesting if they don't have to be scanned.
>
> The overflow does have to be read for each seq scan. And it was stated
> that it would be accessed with random access (follow tuple chain).
> But maybe we can read the overflow the same as if it were an
> additional segment file?

Not without taking a write-avoiding lock on the table, unfortunately.
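For reference, the kind of lock required is one that lets readers
proceed but keeps all writers out while the scan treats the overflow as
just another segment. In current PostgreSQL terms that is roughly a
SHARE table lock (table name hypothetical):

    -- SHARE mode permits concurrent readers but conflicts with
    -- ROW EXCLUSIVE, so INSERT/UPDATE/DELETE are blocked until
    -- the transaction ends.
    BEGIN;
    LOCK TABLE account_balance IN SHARE MODE;
    SELECT count(*) FROM account_balance;  -- scan with writers held off
    COMMIT;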

> > Hitting the overflow tables should be quite rare, it only
> > comes into play when looking at concurrently updated tuples.
> > It certainly happens but most tuples in the table will be
> > committed and not being concurrently updated by anyone else.
>
> The first update moves the row to the overflow; only the second
> update after that might be able to pull it back. So on average at
> least 66% of the rows updated since the last vacuum would be in the
> overflow.
>
> The problem with needing very frequent vacuums is, that you might not be
> able to do any work because of long transactions.

HOT doesn't need more frequent VACUUMs; it makes them efficient enough
that we can afford to run them whenever they are needed to avoid I/O.
Space usage in the overflow relation is at its worst for an enormous
table receiving low-volume random updates, but note that it is *never*
worse than current space usage. In the best case, which is actually
fairly common in practice (a small number of rows in a large table
being updated by a steady stream of concurrent updates), the overflow
relation needs to hold only a few hundred tuples, so regular vacuuming
will be both easy and effective.
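One way to see whether a table fits this profile is the existing
statistics views: if updates dwarf inserts and deletes, the table is a
HOT candidate (table name hypothetical):

    -- An update-heavy profile: n_tup_upd far exceeds
    -- n_tup_ins and n_tup_del.
    SELECT n_tup_ins, n_tup_upd, n_tup_del
      FROM pg_stat_user_tables
     WHERE relname = 'account_balance';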

As an aside, note that HOT shows its greatest benefit in real-world
situations rather than in benchmarks such as TPC, where the I/Os are
deliberately randomised to test the scalability of the RDBMS. But even
there, HOT works better.

The long-running transaction issue remains unsolved in this proposal,
but I have some ideas for later.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
