Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2019-12-12 11:41:33
Message-ID: CAA4eK1KFPT6dsOenSfHORYcYvd1PVGxs_695kED=-xLfHGUw8A@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 11, 2019 at 11:46 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Dec 2, 2019 at 3:32 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > I have rebased the patch set on the latest head.
>
> 0001 looks like a clever approach, but are you sure it doesn't hurt
> performance when many small XLOG records are being inserted? I think
> XLogRecordAssemble() can get pretty hot in some workloads.
>

I don't think we have evaluated it yet, but we should. The point to
note is that the extra work happens only when wal_level is 'logical'
(see IsSubTransactionAssignmentPending), in which case we already log
more WAL, so this might not have much impact. That said, it might be
better to have that check in XLogRecordAssemble itself for the sake of
clarity.
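
For reference, my understanding is that the check amounts to an early
bail-out along the following lines (a rough sketch of the patch's
intent, not its exact code):

    static bool
    IsSubTransactionAssignmentPending(void)
    {
        /* wal_level has to be 'logical' for the assignment to matter */
        if (!XLogLogicalInfoActive())
            return false;

        /* only subtransactions with an assigned xid need this */
        if (!IsTransactionState() || !IsSubTransaction())
            return false;

        /* ... further checks whether the assignment was already logged ... */
        return true;
    }

So for wal_level below 'logical' the extra per-record work should be
little more than this initial branch, but we should still measure it.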

>
> Regarding 0005, it seems to me that this is no good:
>
> + errmsg("improper heap_getnext call")));
>
> I think we should be using elog() rather than ereport() here, because
> this should only happen if there's a bug in a logical decoding plugin.
> At first, I thought maybe this should just be an Assert(), but since
> there are third-party logical decoding plugins available, checking
> this even in non-assert builds seems like a good idea. However, I
> think making it translatable is overkill; users should never see this,
> only developers.
>

Makes sense. I think we should change it.

>
> + if (prev_lsn != InvalidXLogRecPtr)
> + Assert(prev_lsn <= change->lsn);
>
> There is no reason to ever write an if statement that contains only an
> Assert, and it's bad style. Write Assert(prev_lsn == InvalidXLogRecPtr
> || prev_lsn <= change->lsn), or better yet, use XLogRecPtrIsInvalid.
>

Agreed.
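
Something like the following, using XLogRecPtrIsInvalid as you suggest:

    /* changes within a transaction must be applied in LSN order */
    Assert(XLogRecPtrIsInvalid(prev_lsn) || prev_lsn <= change->lsn);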

> The purpose and mechanism of the is_schema_sent flag is not clear to
> me. The word "schema" here seems to be being used to mean "snapshot,"
> which is rather confusing.
>

I have explained this flag below along with invalidations, as the two
are slightly related.

> I'm also somewhat unclear on what's happening here with invalidations.
> Perhaps that's as much a defect in my understanding as it is
> reflective of any problem with the patch, but I also don't see any
> comments either in 0002 or later patches explaining the theory of
> operation. If I've missed some, please point me in the right
> direction. Hypothetically speaking, it seems to me that if you just
> did InvalidateSystemCaches() every time the snapshot changed, you
> wouldn't need anything else (unless we're concerned with
> non-transactional invalidation messages like smgr and relmapper
> invalidations; not quite sure how those are handled). And, on the
> other hand, if we don't do InvalidateSystemCaches() every time the
> snapshot changes, then I don't understand why this works now, even
> without streaming.
>

I think the way invalidations work for logical replication is that we
normally start a new transaction before decoding each commit, which
allows us to accept the invalidations (via AtStart_Cache). However, if
there are catalog changes within the transaction being decoded, we
need to reflect those before trying to decode the WAL of the operation
that happened after that catalog change. As we do not log WAL for each
invalidation, we need to execute all of the transaction's invalidation
messages at each catalog change. We are able to do that today because
we decode a transaction's entire WAL only once we see its commit
record, which contains all the invalidation messages. So we queue them
up and execute them at each catalog change, which we identify by the
XLOG_HEAP2_NEW_CID WAL record.
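
In rough pseudo-code, the current (non-streaming) replay of a decoded
transaction looks something like this (a simplified sketch; the helper
names are illustrative, not the actual reorderbuffer functions):

    dlist_iter  iter;

    dlist_foreach(iter, &txn->changes)
    {
        ReorderBufferChange *change =
            dlist_container(ReorderBufferChange, node, iter.cur);

        if (change->action == REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID)
        {
            /*
             * Catalog change (from XLOG_HEAP2_NEW_CID): execute all the
             * invalidation messages queued from the commit record so that
             * later changes are decoded against the new catalog state.
             */
            execute_queued_invalidations(txn);      /* illustrative */
            rebuild_historic_snapshot(txn);         /* illustrative */
        }
        else
            send_change_to_output_plugin(change);   /* illustrative */
    }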

The second related concept is that before sending each change
downstream (via pgoutput), we check whether we need to send the schema
first. We decide this based on a local map entry (RelationSyncEntry)
which indicates whether the schema for the relation has already been
sent. Once the relation's schema is sent, the entry for that relation
in the map records that. During invalidation processing we also blow
this map away, so it always reflects the correct state.
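
For reference, the pgoutput side is conceptually like this (simplified;
the helper calls are paraphrased rather than the exact signatures):

    RelationSyncEntry *relentry = get_rel_sync_entry(data, relid);

    if (!relentry->schema_sent)
    {
        /* send the Relation (and any needed Type) messages first */
        send_relation_and_attrs(ctx, relation);     /* paraphrased */
        relentry->schema_sent = true;
    }

    /* ... then send the actual INSERT/UPDATE/DELETE change ... */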

Now, to decode an in-progress transaction, we need to ensure that we
have received the WAL for all the invalidations before decoding the
WAL of the action that happened immediately after that catalog change.
This is the reason we started WAL-logging individual invalidations.
With this change we no longer need to execute all the invalidations at
each catalog change; rather, we execute them as and when their WAL is
decoded.
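
Conceptually, once individual invalidations are WAL-logged, the
decoding side can apply them as it sees each such record, roughly as
below (purely illustrative; the record and helper names are
placeholders, not necessarily what the patch uses):

    case XLOG_XACT_INVALIDATIONS:                   /* placeholder name */
        /*
         * Queue the invalidations with the (sub)transaction and also
         * execute them right away, so that later changes of this
         * transaction are decoded against the updated catalogs even
         * before its commit record is seen.
         */
        queue_and_execute_invalidations(reorder, xid, msgs, nmsgs);
        break;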

The current mechanism to send schema changes won't work for streaming
transactions because, after we send a change, its subtransaction might
abort. On subtransaction abort, the downstream will simply discard the
changes, losing the schema change we had previously sent along with
them. There is no such problem currently because we process all the
aborts before sending any change. So the current idea of having a
schema_sent flag in each map entry (RelationSyncEntry) won't work for
streaming transactions. To solve this, the patch initially kept a flag
'is_schema_sent' for each top-level transaction (in ReorderBufferTXN)
so that, for streaming transactions, we can always send the schema for
each (sub)transaction, but that won't work if we access multiple
relations in the same subtransaction. To solve that, we are thinking
of keeping a list/array of top-level xids in each RelationSyncEntry.
Basically, whenever we send the schema for any transaction, we note
that in the RelationSyncEntry, and at abort/commit time we remove the
xid from the list. Then, whenever we check whether to send the schema
for an operation in a transaction, we check whether our xid is present
in that list for the particular RelationSyncEntry and act accordingly
(if the xid is present, we don't send the schema; otherwise, we send
it). I think during decoding we should not have that many open
transactions, so searching the array should be cheap enough, but we
could consider some other data structure such as a hash as well.
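
A rough sketch of that idea, just to make it concrete (the field and
function names here are hypothetical, not from the patch):

    typedef struct RelationSyncEntry
    {
        Oid     relid;              /* relation oid, hash key */
        bool    schema_sent;        /* existing flag, non-streaming case */

        /*
         * Hypothetical addition: top-level xids of streamed transactions
         * for which this relation's schema has already been sent.  An xid
         * is removed again when that transaction commits or aborts.
         */
        List   *schema_sent_xids;
    } RelationSyncEntry;

    /* Has the schema already been sent within this streamed transaction? */
    static bool
    schema_sent_in_txn(RelationSyncEntry *entry, TransactionId topxid)
    {
        ListCell   *lc;

        foreach(lc, entry->schema_sent_xids)
        {
            if ((TransactionId) lfirst_int(lc) == topxid)
                return true;
        }
        return false;
    }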

I will think some more and respond to your remaining comments/suggestions.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
