Re: 64 bit transaction id

From: Павел Ерёмин <shnoor111gmail(at)yandex(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 64 bit transaction id
Date: 2019-11-04 13:39:44
Message-ID: 58099761572874784@myt5-bc0f9d8e5f27.qloud-c.yandex.net
Lists: pgsql-hackers

And yet: if I try to implement a similar mechanism, and it is successful, will my patch be considered?

regards

03.11.2019, 22:15, "Tomas Vondra" <tomas(dot)vondra(at)2ndquadrant(dot)com>:
> On Sun, Nov 03, 2019 at 02:17:15PM +0300, Павел Ерёмин wrote:
>> I completely agree with all of the above. Therefore, the proposed
>> mechanism may entail larger improvements (and not only VACUUM).
>
> I think the best thing you can do is try implementing this ...
>
> I'm afraid the "improvements" essentially mean making various important
> parts of the system much more complicated and expensive. There's a
> trade-off between saving 8B per row and the additional overhead (during
> vacuum etc.), and it does not seem like a winning strategy. What started
> as "we can simply look at the next row version" is clearly way more
> complicated and expensive.
>
> The trouble here is that it adds a dependency between pages in the data
> file. That for example means that during cleanup of a page it may be
> necessary to modify another page that would otherwise have been
> read-only in that checkpoint interval. That's essentially write
> amplification, and it may significantly increase the amount of WAL due
> to generating FPWs for the other page.
>
>> I can offer the following solution.
>> For VACUUM, create a hash table.
>> While scanning the table, when VACUUM sees that a version (tuple1) has
>> t_ctid filled, referring to the address of tuple2, it creates a
>> structure into which it writes the address of tuple1, tuple1.xid, the
>> length of tuple1 (and whatever other information is needed), and puts
>> this structure into the hash table, keyed by the address of tuple2.
>> When VACUUM reaches tuple2, it looks up the address of tuple2 in the
>> hash table; if it finds an entry, it evaluates the link between the two
>> versions and decides on cleanup.
>
> We know VACUUM is already pretty expensive, so making it even more
> expensive seems pretty awful. And the proposed solution seems damn
> expensive. We already do something similar for indexes - we track
> pointers for removed rows, so that we can remove them from indexes. And
> it's damn expensive, because we don't know where in the index the
> tuples are - so we have to scan the whole indexes.
>
> This would mean we have to do the same thing for the table, because we
> don't know where in the table the older versions of those rows are.
> That seems mighty expensive.
>
> Not to mention that this does nothing for page-level vacuum, which we
> do when trying to fit another row on a page (e.g. for HOT). That has to
> be absolutely cheap; we are certainly not going to do lookups of other
> pages or look for older versions of the row there.
>
> Being able to make visibility decisions based on the tuple alone (or
> possibly page-level + tuple information) has a lot of value, and I
> don't think we want to make this more complicated.
>
> regards
>
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
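To make the hash-table idea above concrete, here is a minimal sketch using the backend's existing dynahash API. The PredecessorEntry structure and the function names are hypothetical, invented purely for illustration; nothing like this exists in the tree:

/*
 * Hypothetical sketch of the per-VACUUM hash table described above.
 * PredecessorEntry and these function names are made up for
 * illustration only.
 */
#include "postgres.h"

#include "storage/itemptr.h"
#include "utils/hsearch.h"

/* One entry per forward t_ctid link seen while scanning the heap. */
typedef struct PredecessorEntry
{
    ItemPointerData successor_tid;   /* hash key: address of tuple2 */
    ItemPointerData predecessor_tid; /* address of tuple1 */
    TransactionId   predecessor_xid; /* xid recorded for tuple1 */
    Size            predecessor_len; /* length of tuple1 */
} PredecessorEntry;

static HTAB *
predecessor_table_create(void)
{
    HASHCTL ctl;

    memset(&ctl, 0, sizeof(ctl));
    ctl.keysize = sizeof(ItemPointerData);
    ctl.entrysize = sizeof(PredecessorEntry);

    /* HASH_BLOBS: the TID key is hashed and compared bytewise */
    return hash_create("vacuum predecessor map", 1024, &ctl,
                       HASH_ELEM | HASH_BLOBS);
}

/* Scan phase: tuple1 points at tuple2, so remember it keyed by tuple2. */
static void
predecessor_table_remember(HTAB *map,
                           ItemPointer tuple1, ItemPointer tuple2,
                           TransactionId xid, Size len)
{
    PredecessorEntry *entry;
    bool              found;

    /* dynahash copies the key (successor_tid) into the entry itself */
    entry = (PredecessorEntry *) hash_search(map, tuple2,
                                             HASH_ENTER, &found);
    entry->predecessor_tid = *tuple1;
    entry->predecessor_xid = xid;
    entry->predecessor_len = len;
}

/* On reaching a tuple: did anything point at it?  NULL if not. */
static PredecessorEntry *
predecessor_table_lookup(HTAB *map, ItemPointer tuple2)
{
    return (PredecessorEntry *) hash_search(map, tuple2,
                                            HASH_FIND, NULL);
}

Even granting these assumptions, the map only helps a full-table VACUUM pass that visits every tuple anyway; as noted above, opportunistic page-level pruning could not afford such lookups.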
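To illustrate the closing point about tuple-only visibility: in the current design a visibility check needs nothing beyond the tuple header and the snapshot. The sketch below is heavily simplified (the real logic also consults hint bits, commit status, in-progress xids, and the snapshot's xip array), but it shows that no other page is ever touched, which is exactly the property inter-page version links would break:

/*
 * Deliberately oversimplified visibility test, for illustration only;
 * this is NOT the real heapam visibility code.
 */
#include "postgres.h"

#include "access/htup_details.h"
#include "access/transam.h"
#include "utils/snapshot.h"

static bool
tuple_possibly_visible(HeapTupleHeader tup, Snapshot snap)
{
    TransactionId xmin = HeapTupleHeaderGetRawXmin(tup);
    TransactionId xmax = HeapTupleHeaderGetRawXmax(tup);

    /* Inserter not known done before the snapshot: not visible here. */
    if (!TransactionIdPrecedes(xmin, snap->xmin))
        return false;

    /* Deleter already done before the snapshot: version is dead to us. */
    if (TransactionIdIsValid(xmax) &&
        TransactionIdPrecedes(xmax, snap->xmin))
        return false;

    return true;    /* everything decided from tup and snap alone */
}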

