Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-08-31 13:58:38
Message-ID: CAFiTN-sfu1S3OaZiiUXXmK_Kz4ZuhwqTz-ULxiknzEdz_3S5iQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 31, 2020 at 1:24 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Aug 31, 2020 at 10:49 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Sun, Aug 30, 2020 at 2:43 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> >
> > Another comment:
> >
> > +cleanup_rel_sync_cache(TransactionId xid, bool is_commit)
> > +{
> > + HASH_SEQ_STATUS hash_seq;
> > + RelationSyncEntry *entry;
> > +
> > + Assert(RelationSyncCache != NULL);
> > +
> > + hash_seq_init(&hash_seq, RelationSyncCache);
> > + while ((entry = hash_seq_search(&hash_seq)) != NULL)
> > + {
> > + if (is_commit)
> > + entry->schema_sent = true;
> >
> > How is it correct to set 'entry->schema_sent' for all the entries in
> > RelationSyncCache? Consider a case where due to invalidation in an
> > unrelated transaction we have set the flag schema_sent for a
> > particular relation 'r1' as 'false' and that transaction is executed
> > before the current streamed transaction for which we are performing
> > commit and called this function. It will set the flag for unrelated
> > entry in this case 'r1' which doesn't seem correct to me. Or, if this
> > is correct, it would be a good idea to write some comments about it.

Yeah, this is wrong. I have fixed the issue in the attached patch
and also added a new test for it.

> Few more comments:
> 1.
> +my $appname = 'tap_sub';
> +$node_subscriber->safe_psql('postgres',
> +"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr
> application_name=$appname' PUBLICATION tap_pub"
> +);
>
> In most of the tests, we are using the above statement to create a
> subscription. Don't we need (streaming = 'on') parameter while
> creating a subscription? Is there a reason for not doing so in this
> patch itself?

I have changed this.

> 2.
> 009_stream_simple.pl
> +# Insert, update and delete enough rows to exceed the 64kB limit.
> +$node_publisher->safe_psql('postgres', q{
> +BEGIN;
> +INSERT INTO test_tab SELECT i, md5(i::text) FROM generate_series(3, 5000) s(i);
> +UPDATE test_tab SET b = md5(b) WHERE mod(a,2) = 0;
> +DELETE FROM test_tab WHERE mod(a,3) = 0;
> +COMMIT;
> +});
>
> How far above the 64kB limit is this data? I just wanted to check
> that it is not borderline, in case alignment differences mean
> streaming doesn't happen on some machines. Also, how does such a test
> ensure that streaming has happened? Given the way we are checking
> results, won't they be the same for the non-streaming case as
> well?

Do you mean only for this case, or for all the tests?

> 3.
> +# Change the local values of the extra columns on the subscriber,
> +# update publisher, and check that subscriber retains the expected
> +# values
> +$node_subscriber->safe_psql('postgres', "UPDATE test_tab SET c =
> 'epoch'::timestamptz + 987654321 * interval '1s'");
> +$node_publisher->safe_psql('postgres', "UPDATE test_tab SET b = md5(a::text)");
> +
> +wait_for_caught_up($node_publisher, $appname);
> +
> +$result =
> + $node_subscriber->safe_psql('postgres', "SELECT count(*),
> count(extract(epoch from c) = 987654321), count(d = 999) FROM
> test_tab");
> +is($result, qq(3334|3334|3334), 'check extra columns contain locally
> changed data');
>
> Again, how is this test relevant to streaming mode?

I agree, it is not specific to streaming.

> 4. I have checked that in one of the previous patches, we have a test
> v53-0004-Add-TAP-test-for-streaming-vs.-DDL which contains a test case
> quite similar to what we have in
> v55-0002-Add-support-for-streaming-to-built-in-logical-re/013_stream_subxact_ddl_abort.
> If there is any difference that can cover more scenarios then can we
> consider merging them into one test?

I will have a look.

> Apart from the above, I have made a few changes in the attached patch
> which are mainly to simplify the code at one place, added/edited few
> comments, some other cosmetic changes, and renamed the test case files
> as the initials of their name were matching other tests in the similar
> directory.

The changes look fine to me, except for this:

+

+ /* the value must be on/off */
+ if (strcmp(strVal(defel->arg), "on") && strcmp(strVal(defel->arg), "off"))
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid streaming value")));
+
+ /* enable streaming if it's 'on' */
+ *enable_streaming = (strcmp(strVal(defel->arg), "on") == 0);

I mean, why does the streaming option need to be handled differently
from the surrounding code, for example the "binary" option?

Apart from that, for testing 0001 I have added a new test in the
attached contrib patch.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
v57-0001-Fix-the-SharedFileSetUnregister-API.patch application/octet-stream 1.9 KB
v57-0002-Add-support-for-streaming-to-built-in-logical-re.patch application/octet-stream 101.9 KB
v2-0001-bufile_test.patch application/octet-stream 5.9 KB
