Re: Frequent Update Project: Design Overview of HOT Updates

From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Subject: Re: Frequent Update Project: Design Overview of HOT Updates
Date: 2006-11-10 06:23:28
Message-ID: 2e78013d0611092223n15180ccdkd0f00c20c71374e7@mail.gmail.com
Lists: pgsql-hackers

On 11/10/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> writes:
> > On 11/10/06, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> >> I believe that's the "unsolved technical issue" in the prototype,
> >> unless Pavan has solved it in the last two weeks. Pavan?
> >>
> > When an overflow tuple is copied back to the main heap, the overflow
> > tuple is marked with a special flag (HEAP_OVERFLOW_MOVEDBACK).
> > Subsequently, when a backend tries to lock the overflow version of
> > the tuple, it checks the flag and jumps to the main heap if the flag
> > is set.
>
> (1) How does it "jump to the main heap"? The links go the other
> direction.

The overflow tuple has a special header to store the back pointer to the
main heap. This increases the tuple header size by 6 bytes, but the
overhead is restricted to the overflow tuples only.

> (2) Isn't this full of race conditions?

I agree, there could be race conditions, but IMO we can handle those.
When we follow the tuple chain, we hold a SHARE lock on the main heap
buffer. Also, when the root tuple is vacuumable and needs to be
overwritten, we acquire and hold an EXCLUSIVE lock on the main heap
buffer.

This reduces the race conditions to a great extent.

> (3) I thought you already used up the one remaining t_infomask bit.

Yes. The last bit in t_infomask is used up to mark the presence of the
overflow tuple header. But I believe there are a few more bits that can
be reused. There are also three bits available in the t_ctid field (since
ip_posid needs at most 13 bits). One of them is used to identify whether
a given tid points to the main heap or the overflow heap. This helps when
tids are passed around in the code.

Since the back pointer from the overflow tuple always points to the main
heap, the same bit can be used to mark copied-back tuples (we are doing it
in a slightly different way in the current prototype though).
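
As a rough illustration of the spare-bit idea, the sketch below packs a
13-bit offset and one flag into a 16-bit ip_posid-style field. The macro
and function names are hypothetical, and picking the top bit for the
flag is an assumption; the prototype may well use a different bit.

```c
#include <assert.h>
#include <stdint.h>

/* The offset occupies the low 13 bits, leaving 3 spare bits of 16. */
#define SKETCH_POSID_OFFSET_BITS   13
#define SKETCH_POSID_OFFSET_MASK   ((uint16_t) ((1u << SKETCH_POSID_OFFSET_BITS) - 1))

/* Hypothetical choice: top bit set means "tid points to the overflow heap". */
#define SKETCH_POSID_OVERFLOW_FLAG ((uint16_t) (1u << 15))

/* Combine an offset and the heap selector into one 16-bit value. */
static uint16_t
sketch_pack_posid(uint16_t offset, int in_overflow)
{
    assert(offset <= SKETCH_POSID_OFFSET_MASK);
    return (uint16_t) (offset | (in_overflow ? SKETCH_POSID_OVERFLOW_FLAG : 0));
}

/* Recover the plain offset, ignoring the spare bits. */
static uint16_t
sketch_posid_offset(uint16_t posid)
{
    return (uint16_t) (posid & SKETCH_POSID_OFFSET_MASK);
}

/* Return 1 if the tid refers to the overflow heap, 0 for the main heap. */
static int
sketch_posid_in_overflow(uint16_t posid)
{
    return (posid & SKETCH_POSID_OVERFLOW_FLAG) != 0;
}
```

Code that receives a tid would mask with sketch_posid_offset() before
using it as a line pointer index, and test sketch_posid_in_overflow() to
decide which heap to read. On a back pointer the flag carries no heap
information, since the target is always the main heap, which is what
frees it up to mean "copied back".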

Regards,
Pavan
