Re: [HACKERS] HOT WIP Patch - version 2

From: Hannu Krosing <hannu(at)skype(dot)net>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] HOT WIP Patch - version 2
Date: 2007-02-20 07:48:56
Message-ID: 1171957736.3596.11.camel@localhost.localdomain
Lists: pgsql-hackers pgsql-patches

On one fine day, Tue, 2007-02-20 at 12:08, Pavan Deolasee wrote:
>
> Reposting - looks like the message did not get through in the first
> attempt. My apologies if multiple copies are received.
>
>
> This is the next version of the HOT WIP patch. Since the last patch
> I sent out, I have implemented the HOT-update chain pruning mechanism.
>
> When following a HOT-update chain from an index fetch, if we notice that
> the root tuple is dead and HOT-updated, we try to prune the chain to
> the smallest possible length. To do that, the share lock is upgraded to
> an exclusive lock and the tuple chain is followed until we find a
> live/recently-dead tuple. At that point, the root t_ctid is made to
> point to that tuple. In order to preserve the xmax/xmin chain, the xmax
> of the root tuple is also updated to the xmin of the found tuple. Since
> this xmax is also < RecentGlobalXmin and belongs to a committed
> transaction, the visibility of the root tuple remains the same.

What do you do if there are no live tuples on the page? Will this
un-HOTify the root and free all other tuples in the HOT chain?

>
> The intermediate heap-only tuples are removed from the HOT-update
> chain. The HOT-updated status of these tuples is cleared and their
> respective t_ctid are made to point to themselves. These tuples are not
> reachable now and are ready for vacuuming.

Does this mean that they are now indistinguishable from ordinary
tuples?

Maybe they could be freed right away instead of changing the HOT-updated
status and ctid?
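
To make the pruning step in the quoted text concrete, here is a minimal
sketch in Python. It is not the patch's code: the HeapTuple class, the
"dead" flag (standing in for a real visibility check against
RecentGlobalXmin) and prune_hot_chain() are all hypothetical names, and
the page is modelled as a plain dict. The all-dead case asked about
above is left open; the sketch simply stops at the last tuple.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HeapTuple:
    offset: int        # line pointer number within the page
    xmin: int          # inserting transaction id
    xmax: int          # updating/deleting transaction id, 0 if none
    ctid: int          # offset of the next version; own offset at chain end
    hot_updated: bool  # a newer heap-only version exists on this page
    dead: bool         # hypothetical stand-in for "dead to everyone"


def prune_hot_chain(page: Dict[int, HeapTuple], root_off: int) -> List[int]:
    """Shorten the chain rooted at root_off; return the offsets unlinked."""
    root = page[root_off]
    if not (root.dead and root.hot_updated):
        return []
    unlinked = []
    cur = page[root.ctid]
    # Skip dead intermediate versions until a tuple that is not dead,
    # or the last tuple of the chain, is reached.
    while cur.dead and cur.hot_updated:
        unlinked.append(cur.offset)
        cur = page[cur.ctid]
    # Redirect the root and keep the xmax/xmin chain consistent.
    root.ctid = cur.offset
    root.xmax = cur.xmin
    # Unlinked tuples lose their HOT-updated status and point to themselves.
    for off in unlinked:
        page[off].hot_updated = False
        page[off].ctid = off
    return unlinked
```

In the real patch all of this would happen under an exclusive buffer
lock and be written as one WAL record; the sketch ignores both.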

> This entire action is logged in a single WAL record.
>
> During vacuuming, we keep track of number of root tuples vacuumed.
> If this count is zero, then the index cleanup step is skipped. This
> would avoid unnecessary index scans whenever possible.
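
The skip rule described in the quoted paragraph can be sketched as
follows. This is only an illustration under assumptions: vacuum_table()
and its arguments are hypothetical, and each removed tuple is reduced
to an (offset, is_heap_only) pair. The idea is that heap-only tuples
have no index entries, so only removed root tuples make an index pass
necessary.

```python
from typing import Callable, Iterable, Tuple


def vacuum_table(removed: Iterable[Tuple[int, bool]],
                 cleanup_indexes: Callable[[], None]) -> int:
    """Count removed root tuples; run index cleanup only if there are any."""
    root_tuples_vacuumed = 0
    for _offset, is_heap_only in removed:
        if not is_heap_only:
            root_tuples_vacuumed += 1
    if root_tuples_vacuumed > 0:
        cleanup_indexes()
    return root_tuples_vacuumed
```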
>
> This patch should apply cleanly on current CVS head and pass all
> regression tests. I am still looking for review comments from the first
> WIP patch. If anyone has already looked through it and is interested in
> the incremental changes, please let me know. I can post that.
>
> What's Next?
> ------------
>
> ISTM that the basic HOT-updates and the ability to prune the HOT-update
> chain should help us reduce index bloat, limit the overhead of ctid
> following in index fetches, and efficiently vacuum heap-only tuples. IMO
> the next important but rather less troublesome thing to tackle is to
> reuse space within a block without a complete vacuum of the table. This
> would help us do many more HOT-updates and thus further reduce
> index/heap bloat.
>
> I am thinking of reusing the DEAD heap-only tuples which get removed
> from the HOT-update chain as part of the pruning operation. Since these
> tuples, once removed from the chain, are neither reachable nor have any
> index references, they could readily be used for storing newer versions
> of the same or other rows in the block. How about setting LP_DELETE on
> these tuples as part of the prune operation? LP_DELETE is unused for
> heap tuples, if I am not mistaken. Other information like length and
> offset is maintained as it is.

Seems like a good idea.

> When we run out of space for update-within-the-block, we traverse
> all the line pointers looking for LP_DELETEd items. If any of these
> items has space large enough to store the new tuple, that item is
> reused. Does anyone see any issue with doing this? Also, any
> suggestions about doing it in a better way?

IIRC the size is determined by the next tuple pointer, so you can store
new data without changing tuple pointer only if they are exactly the
same size.
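
For illustration, the proposed scan could look like the first-fit
search below. This is a sketch under assumptions, not the patch: the
LinePointer class and find_reusable_item() are hypothetical, the
"deleted" field stands in for the LP_DELETE flag, and it assumes the
line pointer keeps a usable length, as the quoted text says it would.
If the length cannot be relied on, as I suspect above, the ">=" test
would have to become an exact-size match.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LinePointer:
    off: int       # byte offset of the item within the page
    length: int    # bytes reserved for the item
    deleted: bool  # stands in for the LP_DELETE flag


def find_reusable_item(line_pointers: List[LinePointer],
                       needed: int) -> Optional[int]:
    """Return the index of the first deleted item with enough room, if any."""
    for idx, lp in enumerate(line_pointers):
        if lp.deleted and lp.length >= needed:
            return idx
    return None
```

First-fit keeps the scan to a single pass over the line pointers; a
best-fit variant would waste less space per reuse at the cost of always
scanning the whole array.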

> If the page gets really fragmented, we can try to grab a VACUUM-strength
> lock on the page and de-fragment it. The lock is tried conditionally to
> avoid any deadlocks. This is done in the heap_update() code path, so it
> would add some overhead, but may still prove better than putting the
> tuple in a different block and having corresponding index insert(s).
> Also, since we are more concerned about the large tables, the chances of
> being able to upgrade the exclusive lock to a vacuum-strength lock are
> high. Comments?

I'm not sure about the "we are more concerned about the large tables"
part. I see it more as a device for high-update tables. These may not
always be the same as "large", so there should be some fallback for the
case where you can't get the lock. Maybe just give up and move to
another page?
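
A rough sketch of that fallback, assuming a hypothetical place_update()
helper and using an ordinary Python mutex in place of the
vacuum-strength page lock (a real one would also have to account for
other backends holding pins on the buffer):

```python
import threading
from typing import Callable


def place_update(page_lock: threading.Lock,
                 defragment: Callable[[], None],
                 use_other_page: Callable[[], None]) -> str:
    """Try the lock without waiting; fall back to another page on failure."""
    if page_lock.acquire(blocking=False):
        try:
            defragment()       # compact the page, then update in place
        finally:
            page_lock.release()
        return "same-page"
    use_other_page()           # give up: non-HOT update plus index inserts
    return "other-page"
```

Because the acquire never blocks, the update path cannot deadlock on
this lock; the cost is an occasional non-HOT update on busy pages.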

> If there are no objections, I am planning to work on the first part
> while Nikhil would take up the second task of block-level retail-vacuum.
> Your comments on these issues and the patch are really appreciated.
>
> Thanks,
> Pavan
>
> --
>
> EnterpriseDB http://www.enterprisedb.com
--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com
