Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Ajin Cherian <itsajin(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-07-08 14:01:19
Message-ID: CAFPTHDY+fxB5R4rN1DCUZe39h5W=Ba9so9UbL+EncLLRFy4Zgw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I was going through this thread and testing and reviewing the patches, I
think this is a great feature to have and one which customers would
appreciate. I wanted to help out, and I saw a request for a test patch for
a GUC to always enable streaming on logical replication. Here's one on top
of patchset v31, just in case you still need it. By default the GUC is
turned on, I ran the regression tests with it and didn't see any errors.

thanks,
Ajin
Fujitsu Australia

On Wed, Jul 8, 2020 at 8:02 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Wed, Jul 8, 2020 at 9:36 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> >
> > On Sun, Jul 5, 2020 at 8:37 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com>
> wrote:
> > >
> > > I have compared the changes logged at command end vs logged at commit
> > > time. I have ignored the invalidation for the transaction which has
> > > any aborted subtransaction in it. While testing this I found one
> > > issue, the issue is that if there are some invalidation generated
> > > between last command counter increment and the commit transaction then
> > > those were not logged. I have fixed the issue by logging the pending
> > > invalidation in RecordTransactionCommit. I will include the changes
> > > in the next patch set.
> > >
> >
> > I think it would have been better if you could have given examples for
> > such cases where you need this extra logging. Anyway, below are few
> > minor comments on this patch:
> >
> > 1.
> > + /*
> > + * Log any pending invalidations which are adding between the last
> > + * command counter increment and the commit.
> > + */
> > + if (XLogLogicalInfoActive())
> > + LogLogicalInvalidations();
> >
> > I think we can change this comment slightly and extend a bit to say
> > for which kind of special cases we are adding this. "Log any pending
> > invalidations which are added between the last CommandCounterIncrement
> > and the commit. Normally for DDLs, we log this at each command end,
> > however for certain cases where we directly update the system table
> > the invalidations were not logged at command end."
> >
> > Something like above based on cases that are not covered by command
> > end WAL logging.
> >
> > 2.
> > + * Emit WAL for invalidations. This is currently only used for logging
> > + * invalidations at the command end.
> > + */
> > +void
> > +LogLogicalInvalidations()
> >
> > After this is getting used at a new place, it is better to modify the
> > above comment to something like: "Emit WAL for invalidations. This is
> > currently only used for logging invalidations at the command end or at
> > commit time if any invalidations are pending."
> >
>
> I have done some more review and below are my comments:
>
> Review-v31-0010-Provide-new-api-to-get-the-streaming-changes
>
> ----------------------------------------------------------------------------------------------
> 1.
> --- a/src/backend/catalog/system_views.sql
> +++ b/src/backend/catalog/system_views.sql
> @@ -1240,6 +1240,14 @@ LANGUAGE INTERNAL
> VOLATILE ROWS 1000 COST 1000
> AS 'pg_logical_slot_get_changes';
>
> +CREATE OR REPLACE FUNCTION pg_logical_slot_get_streaming_changes(
> + IN slot_name name, IN upto_lsn pg_lsn, IN upto_nchanges int,
> VARIADIC options text[] DEFAULT '{}',
> + OUT lsn pg_lsn, OUT xid xid, OUT data text)
> +RETURNS SETOF RECORD
> +LANGUAGE INTERNAL
> +VOLATILE ROWS 1000 COST 1000
> +AS 'pg_logical_slot_get_streaming_changes';
>
> If we are going to add a new streaming API for get_changes, don't we
> need for pg_logical_slot_get_binary_changes,
> pg_logical_slot_peek_changes and pg_logical_slot_peek_binary_changes
> as well? I was thinking why not add a new parameter (streaming
> boolean) instead of adding the new APIs. This could be an optional
> parameter which if user doesn't specify will be considered as false.
> We already have optional parameters for APIs like
> pg_create_logical_replication_slot.
>
> 2. You forgot to update sgml/func.sgml. This will be required even if
> we decide to add a new parameter instead of a new API.
>
> 3.
> + /* If called has not asked for streaming changes then disable it. */
> + ctx->streaming &= streaming;
>
> /If called/If the caller
>
> 4.
> diff --git a/.gitignore b/.gitignore
> index 794e35b..6083744 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -42,3 +42,4 @@ lib*.pc
> /Debug/
> /Release/
> /tmp_install/
> +/build/
>
> Why the patch contains this change?
>
> 5. If I apply the first six patches and run the regressions, it fails
> primarily because streaming got enabled by default. And then when I
> applied this patch, the tests passed because it disables streaming by
> default. I think this should be patch 0007.
>
> Replication Origins
> ------------------------------
> I think we also need to conclude on origins related discussion [1].
> As far as I can see, the origin_id can be sent with the first startup
> message. The origin_lsn and origin_commit can be sent with the last
> start of streaming commit if we want but not sure if that is of use.
> If we need to send it earlier then we need to record it with other WAL
> records. The point is that those are set with
> pg_replication_origin_xact_setup but not sure how and when that
> function is called. The other alternative is that we can ignore that
> for now and once the usage is clear we can enhance it. What do you
> think?
>
> [1] -
> https://www.postgresql.org/message-id/CAA4eK1JwXaCezFw%2BkZwoxbLKYD0nWpC2rPgx7vUsaDAc0AZaow%40mail.gmail.com
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
>
>

Attachment Content-Type Size
v31-0015-TEST-guc-always-streaming-logical.patch application/octet-stream 2.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2020-07-08 14:10:45 Re: [PATCH v2] Allow COPY "text" to output a header and add header matching mode to COPY FROM
Previous Message Stephen Frost 2020-07-08 14:00:37 Re: Default setting for enable_hashagg_disk