| From: | "Simon Riggs" <simon(at)2ndquadrant(dot)com> | 
|---|---|
| To: | "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at> | 
| Cc: | "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Frequent Update Project: Design Overview of HOTUpdates | 
| Date: | 2006-11-10 20:39:56 | 
| Message-ID: | 1163191197.3634.918.camel@silverbirch.site | 
| Lists: | pgsql-hackers | 
On Fri, 2006-11-10 at 16:46 +0100, Zeugswetter Andreas ADI SD wrote:
> > I'm not sure this really solves that problem because there 
> > are still DELETEs to consider but it does remove one factor 
> > that exacerbates it unnecessarily.
> 
> Yea, so you still need to vacuum the large table regularly.
HOT covers the use-case of heavy updating, which in many common cases
occurs on tables with few inserts/deletes. HOT would significantly
reduce the need to vacuum since deletes and wraparound issues would be
the only remaining reasons to do this.
[I have some ideas for how to optimize tables with heavy INSERT/DELETE
activity, but that case is much less prevalent than heavy UPDATEs.]
> > I think the vision is that the overflow table would never be 
> > very large because it can be vacuumed very aggressively. It 
> > has only tuples that are busy and will need vacuuming as soon 
> > as a transaction ends. Unlike the main table which is mostly 
> > tuples that don't need vacuuming. 
> 
> Ok, but you have to provide an extra vacuum that does only that then
> (and it randomly touches heap pages, and only does partial work there).
Sure, HOT needs a specially optimised VACUUM.
> > So a heap that's double the necessary size takes twice as 
> > long as necessary to scan. The fact that the overflow tables 
> > are taking up space isn't interesting if they don't have to 
> > be scanned.
> 
> The overflow does have to be read for each seq scan. And it was stated
> that it would
> be accessed with random access (follow tuple chain).
> But maybe we can read the overflow same as if it were an additional
> segment file ?
Not without taking a write-avoiding lock on the table, unfortunately.
> > Hitting the overflow tables should be quite rare, it only 
> > comes into play when looking at concurrently updated tuples. 
> > It certainly happens but most tuples in the table will be 
> > committed and not being concurrently updated by anyone else.
> 
> The first update moves the row to overflow, only the 2nd next might be
> able to pull it back.
> So on average you would have at least 66% of all updated rows after last
> vacuum in the overflow.
> 
> The problem with needing very frequent vacuums is, that you might not be
> able to do any work because of long transactions.
HOT doesn't need more frequent VACUUMs; it is just more efficient, and so
can allow them when they are needed to avoid I/O. Space usage in the overflow
relation is at its worst in the case of an enormous table with low
volume random updates, but note that it is *never* worse than current
space usage. In the best case, which is actually fairly common in
practice, a small number of rows of a large table are being updated by a
steady stream of concurrent updates. There we find the overflow relation
needs only a few hundred tuples, so regular vacuuming will be both easy and
effective.
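To illustrate the best case, here is a toy Python model. It is not part of
the HOT proposal and only sketches the argument under stated assumptions:
every update adds one tuple version to the overflow relation, a vacuum
keeps only the newest version of each row, and no long-running transaction
holds back cleanup. The function name and its parameters are hypothetical.

```python
def peak_overflow_tuples(hot_rows, total_updates, vacuum_every):
    """Toy model: peak number of tuple versions held in the overflow relation.

    Assumptions (not taken from the HOT design itself):
      - each update adds exactly one tuple version to the overflow relation;
      - updates hit the hot rows in round-robin order;
      - a vacuum runs after every `vacuum_every` updates and keeps only the
        newest version of each row that has been updated;
      - no long-running transaction prevents the vacuum from cleaning up.
    All arguments are expected to be positive integers.
    """
    versions = {}   # row id -> number of versions currently in overflow
    size = 0        # total versions currently in overflow
    peak = 0
    for i in range(total_updates):
        row = i % hot_rows
        versions[row] = versions.get(row, 0) + 1
        size += 1
        peak = max(peak, size)
        if (i + 1) % vacuum_every == 0:
            # Vacuum: discard everything except the newest version per row.
            for r in versions:
                versions[r] = 1
            size = len(versions)
    return peak


if __name__ == "__main__":
    # 100 hot rows, vacuum after every 200 updates.
    print(peak_overflow_tuples(hot_rows=100, total_updates=10_000, vacuum_every=200))
```

Under these assumptions the peak is bounded by hot_rows + vacuum_every and
does not depend on the total size of the table, which is the sense in which
a steady stream of updates to a few rows keeps the overflow relation small.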
As an aside, note that HOT works best in real-world situations, not
benchmarks such as TPC where the I/Os are deliberately randomised to
test the scalability of the RDBMS. But even then, HOT works better.
The long-running transaction issue remains unsolved in this proposal,
but I have some ideas for later.
-- 
  Simon Riggs             
  EnterpriseDB   http://www.enterprisedb.com