Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pgsql: Avoid duplicate XIDs at recovery when building initial snapshot
Date: 2018-10-23 01:43:38
Message-ID: 20181023014338.GA1658@paquier.xyz
Lists: pgsql-committers pgsql-hackers

On Mon, Oct 22, 2018 at 07:15:38PM -0300, Alvaro Herrera wrote:
> On 2018-Oct-22, Andres Freund wrote:
>> Hm? My point is that this fix just puts a band-aid onto *one* of the
>> places that read an XLOG_RUNNING_XACTS record, which still leaves the
>> contents of the WAL record corrupted. There's not even a note at the
>> WAL record's definition or where it is logged saying that the contents
>> are not what you'd expect. I don't mean that the fix would break logical
>> decoding, but that it's possible that an equivalent of the problem
>> affecting hot standby also affects logical decoding. And even leaving
>> those two users aside, there may be further vulnerable internal users
>> or extensions parsing the WAL.
>
> Ah! I misinterpreted what you were saying. I agree we shouldn't let
> the WAL message have wrong data. Of course we shouldn't just add a
> code comment stating that the data is wrong :-)

Well, following the same line of thought, txid_current_snapshot() uses
sort_snapshot() to remove all the duplicates after fetching its data
from GetSnapshotData(). So shouldn't we also do something about removing
duplicates in that case when dummy PGXACT entries are found while
scanning the ProcArray? What I think we should do is patch not only
GetRunningTransactionData() but also GetSnapshotData(), so that neither
produces duplicates, and make both code paths share the same logic so
that sort_snapshot() is no longer needed. That would be more costly,
though...
--
Michael
