Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-07-08 10:02:07
Message-ID: CAA4eK1+HjNdvLx04e9-K_5f_3gnMbEJbCo+RdzH-RT=7RaP=7w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 8, 2020 at 9:36 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Sun, Jul 5, 2020 at 8:37 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > I have compared the changes logged at command end vs logged at commit
> > time. I have ignored the invalidation for the transaction which has
> > any aborted subtransaction in it. While testing this I found one
> > issue, the issue is that if there are some invalidation generated
> > between last command counter increment and the commit transaction then
> > those were not logged. I have fixed the issue by logging the pending
> > invalidation in RecordTransactionCommit. I will include the changes
> > in the next patch set.
> >
>
> I think it would have been better if you could have given examples for
> such cases where you need this extra logging. Anyway, below are few
> minor comments on this patch:
>
> 1.
> + /*
> + * Log any pending invalidations which are adding between the last
> + * command counter increment and the commit.
> + */
> + if (XLogLogicalInfoActive())
> + LogLogicalInvalidations();
>
> I think we can change this comment slightly and extend a bit to say
> for which kind of special cases we are adding this. "Log any pending
> invalidations which are added between the last CommandCounterIncrement
> and the commit. Normally for DDLs, we log this at each command end,
> however for certain cases where we directly update the system table
> the invalidations were not logged at command end."
>
> Something like above based on cases that are not covered by command
> end WAL logging.
>
> 2.
> + * Emit WAL for invalidations. This is currently only used for logging
> + * invalidations at the command end.
> + */
> +void
> +LogLogicalInvalidations()
>
> After this is getting used at a new place, it is better to modify the
> above comment to something like: "Emit WAL for invalidations. This is
> currently only used for logging invalidations at the command end or at
> commit time if any invalidations are pending."
>

I have done some more review and below are my comments:

Review-v31-0010-Provide-new-api-to-get-the-streaming-changes
----------------------------------------------------------------------------------------------
1.
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1240,6 +1240,14 @@ LANGUAGE INTERNAL
VOLATILE ROWS 1000 COST 1000
AS 'pg_logical_slot_get_changes';

+CREATE OR REPLACE FUNCTION pg_logical_slot_get_streaming_changes(
+ IN slot_name name, IN upto_lsn pg_lsn, IN upto_nchanges int,
VARIADIC options text[] DEFAULT '{}',
+ OUT lsn pg_lsn, OUT xid xid, OUT data text)
+RETURNS SETOF RECORD
+LANGUAGE INTERNAL
+VOLATILE ROWS 1000 COST 1000
+AS 'pg_logical_slot_get_streaming_changes';

If we are going to add a new streaming API for get_changes, don't we
need for pg_logical_slot_get_binary_changes,
pg_logical_slot_peek_changes and pg_logical_slot_peek_binary_changes
as well? I was thinking why not add a new parameter (streaming
boolean) instead of adding the new APIs. This could be an optional
parameter which if user doesn't specify will be considered as false.
We already have optional parameters for APIs like
pg_create_logical_replication_slot.

2. You forgot to update sgml/func.sgml. This will be required even if
we decide to add a new parameter instead of a new API.

3.
+ /* If called has not asked for streaming changes then disable it. */
+ ctx->streaming &= streaming;

/If called/If the caller

4.
diff --git a/.gitignore b/.gitignore
index 794e35b..6083744 100644
--- a/.gitignore
+++ b/.gitignore
@@ -42,3 +42,4 @@ lib*.pc
/Debug/
/Release/
/tmp_install/
+/build/

Why the patch contains this change?

5. If I apply the first six patches and run the regressions, it fails
primarily because streaming got enabled by default. And then when I
applied this patch, the tests passed because it disables streaming by
default. I think this should be patch 0007.

Replication Origins
------------------------------
I think we also need to conclude on origins related discussion [1].
As far as I can see, the origin_id can be sent with the first startup
message. The origin_lsn and origin_commit can be sent with the last
start of streaming commit if we want but not sure if that is of use.
If we need to send it earlier then we need to record it with other WAL
records. The point is that those are set with
pg_replication_origin_xact_setup but not sure how and when that
function is called. The other alternative is that we can ignore that
for now and once the usage is clear we can enhance it. What do you
think?

[1] - https://www.postgresql.org/message-id/CAA4eK1JwXaCezFw%2BkZwoxbLKYD0nWpC2rPgx7vUsaDAc0AZaow%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2020-07-08 11:00:31 Re: Quick doc patch
Previous Message Magnus Hagander 2020-07-08 08:12:24 Re: Typo in pgstat.c