From: | Jan Wieck <jan(at)wi3ck(dot)info> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
Subject: | Re: Commit Timestamp and LSN Inversion issue |
Date: | 2024-11-12 14:45:07 |
Message-ID: | 3dd6f01e-918b-4bea-97dd-65cee886b9b8@wi3ck.info |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 11/11/24 23:22, Amit Kapila wrote:
> On Mon, Nov 11, 2024 at 9:05 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>
>> Alternatively, we could simply stop relying on the timestamps recorded
>> in the commit, and instead derive "monotonic" commit timestamps after
>> the fact. For example, we could track timestamps for some subset of
>> commits, and then approximate the rest using LSN.
>>
>> AFAIK the inversion can happen only for concurrent commits, and there's
>> can't be that many of those. So if we record the (LSN, timestamp) for
>> every 1MB of WAL, we approximate timestamps for commits in that 1MB by
>> linear approximation.
>>
>> Of course, there's a lot of questions and details to solve - e.g. how
>> often would it need to happen, when exactly would it happen, etc. And
>> also how would that integrate with the logical decoding - it's easy to
>> just get the timestamp from the WAL record, this would require more work
>> to actually calculate it. It's only a very rough idea.
>>
>
> I think for logical decoding it would be probably easy because it
> reads all the WAL. So, it can remember the commit time of the previous
> commit, and if any future commit has a commit timestamp lower than
> that it can fix it by incrementing it. But outside logical decoding,
> it would be tricky because we may need a separate process to fix up
> commit timestamps by using linear approximation. IIUC, the bigger
> challenge is that such a solution would require us to read the WAL on
> a continuous basis and keep fixing commit timestamps or we need to
> read the extra WAL before using or relying on commit timestamp. This
> sounds to be a somewhat complex and costlier solution though the cost
> is outside the hot-code path but still, it matters as it leads to
> extra read I/O.
>
I had originally experimented with adjusting the timestamps sent to the
subscriber in the output plugin. This works for the most part but has
two major flaws:
1) In case the system is restarted in a situation where it has some
forward clock skew and the restart LSN is right in the middle of it, the
output plugin has no knowledge of that and again emits backwards running
timestamps.
2) The origin and the subscriber now have a different timestamp
associated with the same transaction. That can lead to different
behavior in conflict resolution when a third node gets involved.
Best Regards, Jan
From | Date | Subject | |
---|---|---|---|
Next Message | Benoit Lobréau | 2024-11-12 14:56:11 | Re: Parallel workers stats in pg_stat_database |
Previous Message | David E. Wheeler | 2024-11-12 14:44:52 | Re: RFC: Additional Directory for Extensions |