Re: Detecting File Damage & Inconsistencies

From: Craig Ringer <craig(dot)ringer(at)enterprisedb(dot)com>
To: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, cleyssondba(at)gmail(dot)com, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Detecting File Damage & Inconsistencies
Date: 2021-06-22 05:31:34
Message-ID: CAGRY4nzFuco-0od26hoUxKC_N7CzLHJeMX0tUbyqfX1LRqVA=Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, 22 Jun 2021 at 00:24, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
wrote:

> On Thu, Mar 18, 2021 at 6:20 AM Craig Ringer
> <craig(dot)ringer(at)enterprisedb(dot)com> wrote:
> >
> > On Mon, 15 Mar 2021 at 21:01, David Steele <david(at)pgmasters(dot)net> wrote:
> >>
> >> On 11/18/20 5:23 AM, Simon Riggs wrote:
> >> > On Wed, 18 Nov 2020 at 06:42, Craig Ringer
> >> > <craig(dot)ringer(at)enterprisedb(dot)com> wrote:
> >> >>
> >> >> On Fri, Nov 13, 2020 at 7:24 PM Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >> >>>
> >> >>>
> >> >>> What I'm proposing is an option to add 16 bytes onto each COMMIT
> >> >>> record
> >> >>
> >> >>
> >> >> Would it make sense to write this at the time we write a topxid
> >> >> assignment to WAL instead?
> >> >>
> >> >> Otherwise it won't be accessible to streaming-mode logical decoding.
> >> >
> >> > Do you mean extend the xl_xact_assignment record? My understanding is
> >> > that is not sent in all cases, so not sure what you mean by "instead".
> >>
> >> Craig, can you clarify?
> >
> >
> > Right. Or write a separate WAL record when the feature is enabled. But
> > it's probably sufficient to write it as an optional chunk on
> > xl_xact_assignment records. We often defer writing them so we can optimise
> > away xacts that never actually wrote anything, but IIRC we still write one
> > before we write any WAL that references the xid. That'd be fine, since we
> > don't need the info any sooner than that during decoding. I'd have to
> > double check that we write it in all cases and won't get to that too soon,
> > but I'm pretty sure we do...
>
> The commit record is optimized away if no xid is assigned, though is
> still present if we didn't write any WAL records.
>
> But if a commit record exists in the WAL stream, we want to know where
> it came from.
>
> A later patch will add PITR capability based on this information so
> attaching it directly to the commit record is fairly important, IMHO.
>

Why?

All the proposed info:

* 8-byte session start time (from MyStartTime)
* 2-byte pid (from MyProcPid)
* 4-byte user oid

are available at topxid assignment time. If we defer writing them until
commit, we lose the ability to use this information during streaming
logical decoding. That's something I believe you've wanted for other
functionality in the past, such as logical decoding based audit
functionality.
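
For concreteness, the three fields listed above could be laid out roughly
as follows. This is only a sketch: SessionOriginSketch, its field names,
the explicit "reserved" padding and the two helper functions are all
hypothetical, not existing PostgreSQL structures or APIs. The field widths
simply follow the list above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for illustration; not an actual PostgreSQL WAL struct. */
typedef struct SessionOriginSketch
{
    int64_t  session_start;  /* 8 bytes, cf. MyStartTime */
    uint32_t user_oid;       /* 4 bytes, user oid */
    uint16_t pid;            /* 2 bytes, cf. MyProcPid, width as listed above */
    uint16_t reserved;       /* explicit padding so the chunk is 16 bytes */
} SessionOriginSketch;

/* Compile-time check that the sketch matches the proposed 16 bytes. */
typedef char session_origin_size_check[
    (sizeof(SessionOriginSketch) == 16) ? 1 : -1];

/* Copy the chunk into a record buffer (assumed helper, same-host format). */
static void
origin_write(unsigned char *dst, const SessionOriginSketch *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof(*src));
}

/* Read the chunk back out of a record buffer during decoding. */
static void
origin_read(SessionOriginSketch *dst, const unsigned char *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof(*dst));
}
```

Nothing here depends on whether the chunk rides on the assignment record
or the commit record; the layout is the same either way.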

IIRC the restart_lsn horizon already ensures that we can't miss the
xl_xact_assignment at the start of a txn. We would ensure that the desired
info is available throughout decoding of the txn, including at commit
record processing time, by adding it to the toplevel ReorderBufferTXN.
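
A toy model of that idea, assuming nothing about the real reorder buffer
beyond "one entry per toplevel xid": the session info is stored when the
assignment record is decoded and looked up again for any later record of
the same txn, commit included. ToyReorderBuffer, ToyTxn, TxnOrigin and the
toy_* functions are invented names for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED_TXNS 8

/* Hypothetical per-session info carried by the assignment record. */
typedef struct TxnOrigin
{
    int64_t  session_start;
    uint32_t user_oid;
    uint16_t pid;
} TxnOrigin;

/* Toy stand-in for a toplevel transaction entry in the reorder buffer. */
typedef struct ToyTxn
{
    uint32_t  xid;
    int       in_use;
    TxnOrigin origin;
} ToyTxn;

typedef struct ToyReorderBuffer
{
    ToyTxn txns[MAX_TRACKED_TXNS];
} ToyReorderBuffer;

/* Decoder saw the assignment record: remember the origin for this xid.
 * Returns 1 on success, 0 if the toy buffer is full. */
static int
toy_on_assignment(ToyReorderBuffer *rb, uint32_t xid, TxnOrigin origin)
{
    assert(rb != NULL);
    for (size_t i = 0; i < MAX_TRACKED_TXNS; i++)
    {
        if (!rb->txns[i].in_use)
        {
            rb->txns[i].xid = xid;
            rb->txns[i].in_use = 1;
            rb->txns[i].origin = origin;
            return 1;
        }
    }
    return 0;
}

/* Any later record of the txn, including commit, can fetch the origin.
 * Returns NULL if the xid is not tracked. */
static const TxnOrigin *
toy_origin_for(const ToyReorderBuffer *rb, uint32_t xid)
{
    assert(rb != NULL);
    for (size_t i = 0; i < MAX_TRACKED_TXNS; i++)
    {
        if (rb->txns[i].in_use && rb->txns[i].xid == xid)
            return &rb->txns[i].origin;
    }
    return NULL;
}

/* Commit or abort processed: release the entry. */
static void
toy_forget(ToyReorderBuffer *rb, uint32_t xid)
{
    assert(rb != NULL);
    for (size_t i = 0; i < MAX_TRACKED_TXNS; i++)
    {
        if (rb->txns[i].in_use && rb->txns[i].xid == xid)
            rb->txns[i].in_use = 0;
    }
}
```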

The only advantage I can see to annotating the commit record instead is
that we don't have to spend a few bytes per reorder-buffered topxid to
track this info between start of decoding for the tx and processing of the
commit record. I don't think that's worth caring about. The advantages that
having it earlier would give us are much more significant.

A few examples:

* Skip reorder buffering of non-target transactions early, so we can decode
the WAL stream to find the target transactions much faster using less
memory and I/O;

* Read the database change stream and use the session info to stream info
into an intrusion detection system and/or audit engine in real time, using
txn streaming to avoid the need to create huge reorder buffers;

* Re-decode the WAL stream to identify a target txn you know was aborted,
and commit it instead, so you can recover data from aborted txns from the
WAL stream using logical decoding. (Only possible if the catalog_xmin
hasn't advanced past that point already, though.)

So yeah. I think it'd be better to log the info you want at start-of-txn
unless there's a compelling reason not to, and I don't see one yet.
