Re: Persist MVCC forever - retain history

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Mitar <mmitar(at)gmail(dot)com>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Persist MVCC forever - retain history
Date: 2020-07-03 02:51:51
Message-ID: D809DE2C-FA2A-4C31-AA54-1B0F5E974BC3@enterprisedb.com
Lists: pgsql-hackers

> On Jul 2, 2020, at 5:58 PM, Mitar <mmitar(at)gmail(dot)com> wrote:
>
>> Plus, wrap-around and freezing aren’t just nice-to-have features.
>
> Oh, I forgot about that. ctid is still just 32 bits? So then for such
> table with permanent MVCC this would have to be increased, to like 64
> bits or something. Then one would not have to do wrap-around
> protection, no?

I think what you propose is a huge undertaking, and would likely result in a fork of postgres not compatible with the public sources. I do not recommend the project. But in answer to your question....

Yes, the transaction IDs stored in the tuple header are 32 bits. (ctid itself is a six-byte physical row locator; it's the xmin/xmax transaction IDs that are subject to wraparound.) Take a look in access/htup_details.h. You'll notice that HeapTupleHeaderData has a union:

union
{
    HeapTupleFields  t_heap;   /* fields used for on-disk heap tuples */
    DatumTupleFields t_datum;  /* fields used for in-memory composite datums */
} t_choice;

If you check, HeapTupleFields and DatumTupleFields are the same size, each holding three 32-bit values, though the values mean different things. You may need to expand the types TransactionId, CommandId, and Oid to 64 bits, and likewise varlena headers and typmods. You may find it is harder to expand only a subset of those, given the way these fields overlay in these unions. There will be a lot of busy work going through the code to adjust everything else to match; just updating printf-style formatting in error messages may take a long time.
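For reference, the two structs are defined in access/htup_details.h roughly like this (comments abridged):

typedef struct HeapTupleFields
{
    TransactionId t_xmin;      /* inserting xact ID */
    TransactionId t_xmax;      /* deleting or locking xact ID */

    union
    {
        CommandId     t_cid;   /* inserting or deleting command ID */
        TransactionId t_xvac;  /* old-style VACUUM FULL xact ID */
    } t_field3;
} HeapTupleFields;

typedef struct DatumTupleFields
{
    int32 datum_len_;          /* varlena header word */
    int32 datum_typmod;        /* -1, or identifier of a record type */
    Oid   datum_typeid;        /* composite type OID, or RECORDOID */
} DatumTupleFields;

All six fields are 4 bytes, so both arms of the union occupy 12 bytes.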

If you do choose to expand only some of the types, say just TransactionId and CommandId, you'll have to deal with the size mismatch between HeapTupleFields and DatumTupleFields.
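As a purely hypothetical illustration of that mismatch, suppose only the transactional fields were widened:

/* Hypothetical widened types -- not in the PostgreSQL sources. */
typedef uint64 TransactionId64;
typedef uint64 CommandId64;

typedef struct HeapTupleFields64
{
    TransactionId64 t_xmin;    /* 8 bytes */
    TransactionId64 t_xmax;    /* 8 bytes */
    CommandId64     t_cid;     /* 8 bytes => 24 bytes total */
} HeapTupleFields64;

DatumTupleFields would still be 3 x 4 = 12 bytes, so the union's size would now be dictated entirely by HeapTupleFields64: every heap tuple header grows by 12 bytes, and any code that relies on the two arms overlaying each other (in-memory composite datums) has to be audited or padded.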

Aborted transactions leave dead rows in your tables, and you may want to deal with that for performance reasons. Even if you don't intend to remove deleted rows, since you are keeping them around for time travel, you might still want vacuum to remove dead rows from transactions that never committed.
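A minimal sketch of how such a rule might look, reusing existing helpers from access/htup_details.h and access/transam.h (TimeTravelTupleIsRemovable is a hypothetical name, and a real version would also have to handle still-in-progress transactions):

#include "access/htup_details.h"
#include "access/transam.h"

/*
 * Keep committed deletions for time travel, but treat tuples whose
 * inserting transaction aborted as removable garbage.
 */
static bool
TimeTravelTupleIsRemovable(HeapTupleHeader tup)
{
    TransactionId xmin = HeapTupleHeaderGetXmin(tup);

    /* Hint bit not set and clog says the inserter aborted: garbage. */
    if (!HeapTupleHeaderXminCommitted(tup) &&
        TransactionIdDidAbort(xmin))
        return true;

    /* Everything else, including deleted-but-committed rows, is kept. */
    return false;
}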

You'll need to think about how to manage the ever-growing clog if you don't intend to truncate it periodically. Or, if you do truncate clog periodically, you'll need to deal with the fact that your tables contain TransactionIds older than anything clog still knows about.
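To put a number on it, assuming the current clog format of two status bits per transaction:

    2 bits/transaction     =  4 transactions per byte
    2^32 transactions / 4  =  2^30 bytes = 1 GiB

so an untruncated clog grows by roughly 1 GiB for every 4 billion transactions the cluster consumes.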

You may want to think about how keeping dead rows around affects index performance.

I expect these issues to be less than half of what you would need to resolve, though much of the rest is less clear to me.


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
