Re: UNDO and in-place update

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UNDO and in-place update
Date: 2017-01-10 13:58:12
Message-ID: CAA4eK1L1roLdryjf_0dVD1GOR-O5zRsuY0MgPG9y9fiomYWbEA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 9, 2017 at 11:47 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Jan 9, 2017 at 7:50 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> One idea could be that we have some fixed number of
>> slots (I think we can make it variable as well, but for simplicity,
>> let's consider it as fixed) in the page header which will store the
>> offset to the transaction id inside a TPD entry of the page. Consider
>> a TPD entry of page contains four transactions, so we will just store
>> enough information in heap page header to reach the transaction id for
>> these four transactions. I think each such page header slot could be
>> three or four bits long depending upon how many concurrent
>> transactions we want to support on a page after which a new
>> transaction has to wait (I think in most workloads supporting
>> simultaneous eight transactions on a page should be sufficient).
>> Then we can have an additional byte (or less than byte) in the tuple
>> header to store lock info which is nothing but an offset to the slot
>> in the page header. We might find some other locking technique as
>> well, but I think keeping it same as current has benefit.
>
> Yes, something like this can be done. You don't really need any new
> page-level header data, because you can get the XIDs from the TPD
> entry (or from the page itself if there's only one). But you could
> expand the single "is-modified" bit that I've proposed adding to each
> tuple to multiple bits. 0 means not recently modified. 1 means
> modified by the first or only transaction that has recently modified
> the page. 2 means modified by the second transaction that has
> recently modified the page. Etc.
>

Makes sense.
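
To make the encoding concrete, here is a rough sketch of how a multi-bit
modifier index in the tuple header could be resolved to an XID through
the TPD entry. Everything named here (TpdEntry, MODIFIER_BITS, the
3-bit width, the one-byte flags field) is an assumption for
illustration only and is not existing PostgreSQL code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the low 3 bits of a one-byte tuple flags field hold the
 * modifier index; the remaining bits are free for other uses. */
#define MODIFIER_BITS  3
#define MODIFIER_MASK  ((1u << MODIFIER_BITS) - 1)   /* 0x07 */
#define MODIFIER_NONE  0    /* 0 means "not recently modified" */

typedef uint32_t Xid;       /* stand-in for TransactionId; 0 = invalid */

/* Hypothetical per-page TPD entry: the XIDs that recently modified the page. */
typedef struct TpdEntry
{
    uint8_t nxids;                  /* number of valid entries in xids[] */
    Xid     xids[MODIFIER_MASK];    /* at most 7 with a 3-bit index */
} TpdEntry;

/* Extract the modifier index from the tuple flags. */
static inline uint8_t
tuple_get_modifier(uint8_t flags)
{
    return (uint8_t) (flags & MODIFIER_MASK);
}

/* Return flags with the modifier index replaced, other bits untouched. */
static inline uint8_t
tuple_set_modifier(uint8_t flags, uint8_t idx)
{
    assert(idx <= MODIFIER_MASK);
    return (uint8_t) ((flags & ~MODIFIER_MASK) | idx);
}

/* Map the index to an XID: index k refers to the k-th transaction in the
 * TPD entry (1-based). Returns 0 if the tuple was not recently modified
 * or the index is out of range. */
static Xid
tuple_modifier_xid(const TpdEntry *tpd, uint8_t flags)
{
    uint8_t idx = tuple_get_modifier(flags);

    if (idx == MODIFIER_NONE || idx > tpd->nxids)
        return 0;
    return tpd->xids[idx - 1];
}
```

With 3 bits this allows seven concurrent transactions per page plus the
"not modified" value; a fourth bit would raise that to fifteen.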

> What I was thinking about doing instead is storing an array in the TPD
> containing the same information. There would be one byte or one half
> a byte or whatever per TID and it would contain the index of the XID
> in the TPD that had most recently modified or locked that TID. Your
> solution might be better, though, at least for cases where the number
> of tuples that have modified the page is small.
>

I think we also need to handle multiple backends trying to reserve a
slot in this array concurrently, which can be a point of contention.
Another point is that during pruning, if TIDs change due to row
movement, we need to keep this array in sync.
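
As a rough illustration of the array being discussed, the sketch below
packs one half-byte per TID and shows the extra step needed to keep it
in sync when a tuple moves to a different offset. The names
(TpdSlotMap, slotmap_move), the 256-offset cap, and the 0-based offsets
are assumptions made for the example, and locking is left out entirely;
a real version would have to run under whatever lock protects the page.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_OFFSETS 256     /* hypothetical cap on line pointers per page */

/* Hypothetical per-TID map stored in the TPD: one 4-bit XID index per
 * offset, two per byte. Index 0 means "not recently modified or locked". */
typedef struct TpdSlotMap
{
    uint8_t nibbles[MAX_OFFSETS / 2];
} TpdSlotMap;

/* Read the XID index recorded for a (0-based) offset. */
static uint8_t
slotmap_get(const TpdSlotMap *m, unsigned off)
{
    uint8_t b;

    assert(off < MAX_OFFSETS);
    b = m->nibbles[off / 2];
    return (off & 1) ? (uint8_t) (b >> 4) : (uint8_t) (b & 0x0F);
}

/* Record which XID index last modified or locked the given offset. */
static void
slotmap_set(TpdSlotMap *m, unsigned off, uint8_t idx)
{
    uint8_t *b;

    assert(off < MAX_OFFSETS && idx <= 0x0F);
    b = &m->nibbles[off / 2];
    if (off & 1)
        *b = (uint8_t) ((*b & 0x0F) | (idx << 4));   /* odd: high nibble */
    else
        *b = (uint8_t) ((*b & 0xF0) | idx);          /* even: low nibble */
}

/* If pruning moves a tuple from old_off to new_off, carry its entry
 * along and clear the old position so the map stays in sync. */
static void
slotmap_move(TpdSlotMap *m, unsigned old_off, unsigned new_off)
{
    if (old_off == new_off)
        return;
    slotmap_set(m, new_off, slotmap_get(m, old_off));
    slotmap_set(m, old_off, 0);
}
```

The point of the sketch is only that every operation that changes a TID
needs a matching update to this map, which is extra work compared to
keeping the index in the tuple header where it moves with the tuple.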

> However, I'm not
> totally sure. I think it's important to keep the tuple headers VERY
> small, like 3 bytes. Or 2 bytes. Or maybe even variable size but
> only 1 byte in common cases. So I expect bit space in those places to
> be fairly scarce and precious.
>

I agree that we should choose the format carefully so as to strike a
balance between performance and space savings.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
