Re: 64 bit transaction id

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Павел Ерёмин <shnoor111gmail(at)yandex(dot)ru>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 64 bit transaction id
Date: 2019-11-02 23:20:43
Message-ID: 20191102232043.ia6gu57uriucdtxd@development
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Nov 02, 2019 at 11:35:09PM +0300, Павел Ерёмин wrote:
> The proposed option is not much different from what it is now.
> We are not trying to save some space - we will reuse the existing one. We
> just work in 64 bit transaction counters. Correct me if I'm wrong - the
> address of the next version of the line is stored in the 6 byte field
> t_cid in the tuple header - which is not attached to the current page in
> any way - and can be stored anywhere in the table. Nothing changes.

I think you mean t_ctid, not t_cid (which is a 4-byte CommandId, not any
sort of item pointer).

I think this comment from htup_details.h explains the issue:

* ... Beware however that VACUUM might
* erase the pointed-to (newer) tuple before erasing the pointing (older)
* tuple. Hence, when following a t_ctid link, it is necessary to check
* to see if the referenced slot is empty or contains an unrelated tuple.
* Check that the referenced tuple has XMIN equal to the referencing tuple's
* XMAX to verify that it is actually the descendant version and not an
* unrelated tuple stored into a slot recently freed by VACUUM. If either
* check fails, one may assume that there is no live descendant version.

Now, imagine you have a tuple that gets updated repeatedly (say, 3x) and
each version gets to a different page. Say, pages #1, #2, #3. And then
VACUUM happens on some of the "middle" page (this may happen when trying
to fit new row onto a page to allow HOT, but it might happen even during
regular VACUUM).

So we started with 3 tuples on pages #1, #2, #3, but now we have this

#1 - tuple exists, points to tuple on page #2
#2 - tuple no longer exists, cleaned up by vacuum
#3 - tuple exists

The scheme you proposed requires existence of all the tuples in the
chain to determine visibility. When tuple #2 no longer exists, it's
impossible to decide whether tuple on page #1 is visible or not.

This also significantly increases the amount of random I/O, pretty much
by factor of 2, because whenever you look at a row, you also have to
look at the "next version" which may be on another page. That's pretty
bad, bot for I/O and cache hit ratio. I don't think that's a reasonable
trade-off (at least compared to simply making the XIDs 64bit).

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vik Fearing 2019-11-02 23:22:54 Re: Allow cluster_name in log_line_prefix
Previous Message Dagfinn Ilmari Mannsåker 2019-11-02 23:14:47 [PATCH] contrib/seg: Fix PG_GETARG_SEG_P definition