Re: Column Filtering in Logical Replication

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Subject: Re: Column Filtering in Logical Replication
Date: 2022-03-18 14:43:41
Message-ID: 3162f7f0-86ac-e3a9-dc55-0a660cb9bab1@enterprisedb.com
Lists: pgsql-hackers

On 3/18/22 06:52, Amit Kapila wrote:
> On Fri, Mar 18, 2022 at 12:47 AM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> I pushed the second fix. Interestingly enough, wrasse failed in the
>> 013_partition test. I don't see how that could be caused by this
>> particular commit, though - see the pgsql-committers thread [1].
>>
>
> I have a theory about what's going on here. I think this is due to a
> test added in your previous commit c91f71b9dc. The newly added test
> hangs in tablesync because there was no apply worker to set the
> state to SUBREL_STATE_CATCHUP which blocked tablesync workers from
> proceeding.
>
> See below logs from pogona [1].
> 2022-03-18 01:33:15.190 CET [2551176][client
> backend][3/74:0][013_partition.pl] LOG: statement: ALTER SUBSCRIPTION
> sub2 SET PUBLICATION pub_lower_level, pub_all
> 2022-03-18 01:33:15.354 CET [2551193][logical replication
> worker][4/57:0][] LOG: logical replication apply worker for
> subscription "sub2" has started
> 2022-03-18 01:33:15.605 CET [2551176][client
> backend][:0][013_partition.pl] LOG: disconnection: session time:
> 0:00:00.415 user=bf database=postgres host=[local]
> 2022-03-18 01:33:15.607 CET [2551209][logical replication
> worker][3/76:0][] LOG: logical replication table synchronization
> worker for subscription "sub2", table "tab4_1" has started
> 2022-03-18 01:33:15.609 CET [2551211][logical replication
> worker][5/11:0][] LOG: logical replication table synchronization
> worker for subscription "sub2", table "tab3" has started
> 2022-03-18 01:33:15.617 CET [2551193][logical replication
> worker][4/62:0][] LOG: logical replication apply worker for
> subscription "sub2" will restart because of a parameter change
>
> You will notice that the apply worker is never restarted after a
> parameter change. The reason is that the particular subscription
> reached the limit of max_sync_workers_per_subscription, after which we
> don't allow the apply worker to restart. I think you might want to
> increase the values of
> max_sync_workers_per_subscription/max_logical_replication_workers to
> make it work.
>

Hmmm. So the theory is that in most runs the tables get synced faster
than new workers get started, so we never hit the limit, but on some
machines the sync workers take a bit longer and we do hit it. Seems
possible, yes. Unfortunately we don't seem to log anything when we hit
the limit, so it's hard to say for sure :-( I suggest we add a WARNING
message to logicalrep_worker_launch or something - not just because of
this test, it seems useful in general.
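
Something like this is what I have in mind (just a sketch, assuming the
per-subscription limit check in logicalrep_worker_launch still looks
roughly like this; message wording and errcode are open to bikeshedding):

    /* in logicalrep_worker_launch(), launcher.c */

    /*
     * We've hit the sync worker limit for this subscription. Currently we
     * just return silently here; emitting a WARNING would at least make
     * failures like the one in the log above visible.
     */
    if (nsyncworkers >= max_sync_workers_per_subscription)
    {
        LWLockRelease(LogicalRepWorkerLock);
        ereport(WARNING,
                (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
                 errmsg("out of logical replication sync worker slots"),
                 errhint("You might need to increase max_sync_workers_per_subscription.")));
        return;
    }

That would also double as a hint to bump the GUC, per your suggestion.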

However, how come we don't retry the sync? Surely we don't just give up
forever, that'd be a pretty annoying behavior. Presumably we just end up
sleeping for a long time before restarting the sync worker, somewhere.

>> I'd like to test & polish the main patch over the weekend, and get it
>> committed early next week. Unless someone thinks it's definitely not
>> ready for that ...
>>
>
> I think it is in good shape but apart from cleanup, there are issues
> with dependency handling which I have analyzed and reported as one of
> the comments in the email [2]. I was getting some weird behavior
> during my testing due to that. Apart from that, the patch still has DDL
> handling code in tablecmds.c which is probably not required.
> Similarly, Shi-San has reported an issue with REPLICA IDENTITY FULL in her
> email [3]. It is up to you what to do here, but it would be good if you
> could share the patch once more after fixing these issues so that we can
> re-test/review it.

Ah, thanks for reminding me - it's hard to keep track of all the issues
in threads as long as this one.

BTW do you have any opinion on the SET COLUMNS syntax? Peter Smith
proposed to get rid of it in [4], but I'm not sure that's a good idea.
Because if we ditch it, then removing the column list would look like this:

ALTER PUBLICATION pub ALTER TABLE tab;

And if we happen to add other per-table options, this would become
pretty ambiguous.

Actually, do we even want to allow resetting column lists like this? We
don't allow this for row filters, so if you want to change a row filter
you have to re-add the table, right? So maybe we should just ditch ALTER
TABLE entirely.

regards

[4]
https://www.postgresql.org/message-id/CAHut%2BPtc7Rh187eQKrxdUmUNWyfxz7OkhYAX%3DAW411Qwxya0LQ%40mail.gmail.com

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
