Re: Design of pg_stat_subscription_workers vs pgstats

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Design of pg_stat_subscription_workers vs pgstats
Date: 2022-02-19 13:48:26
Message-ID: CAD21AoCOG-BOUor3HyYk9KWxfmaosLq8ANu9Nhk0vyS7Rfd4GQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Feb 19, 2022 at 4:47 AM David G. Johnston
<david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
>
> On Fri, Feb 18, 2022 at 1:26 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>
>>
>> Here is the summary of the discussion, changes, and plan.
>>
>> 1. Move some error information such as the error message to a new
>> system catalog, pg_subscription_error. The pg_subscription_error table
>> would have the following columns:
>>
>> * sesubid : subscription Oid.
>> * serelid : relation Oid (NULL for apply worker).
>> * seerrlsn : commit-LSN of the error transaction.
>> * seerrcmd : command (INSERT, UPDATE, etc.) of the error transaction.
>> * seerrmsg : error message
>
>
> Not a fan of the "se" prefix but overall yes. We should include a timestamp.
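
For illustration, a rough sketch of how such a catalog could look, folding
in the timestamp requested above (the column types and the seerrtime name
are assumptions, not part of the proposal):

-- Hypothetical sketch of the proposed pg_subscription_error catalog.
-- Column names follow the proposal; the types and seerrtime are assumed.
CREATE TABLE pg_subscription_error (
    sesubid   oid NOT NULL,    -- subscription OID
    serelid   oid,             -- relation OID (NULL for the apply worker)
    seerrlsn  pg_lsn,          -- commit-LSN of the failed transaction
    seerrcmd  text,            -- command (INSERT, UPDATE, ...) that failed
    seerrmsg  text,            -- error message
    seerrtime timestamptz,     -- hypothetical: when the error was recorded
    UNIQUE (sesubid, serelid)  -- one entry per (subscription, relation)
);
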
>
>>
>> The tuple is inserted or updated when an apply worker or a tablesync
>> worker raises an error. If the same error occurs consecutively, the
>> update is skipped.
>
>
> I disagree with this - I would treat every new instance of an error as important and insert the new entry on conflict (sesubid, serelid), updating lsn/cmd/msg with the new values.
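
Expressed as SQL purely for illustration (the worker would update the
catalog through the usual catalog access routines rather than SQL), that
amounts to an upsert keyed on (sesubid, serelid); seerrtime is the
hypothetical timestamp column mentioned above:

-- Record the latest error for this (subscription, relation) pair,
-- overwriting any previous entry instead of skipping repeats.
INSERT INTO pg_subscription_error
       (sesubid, serelid, seerrlsn, seerrcmd, seerrmsg, seerrtime)
VALUES ($1, $2, $3, $4, $5, now())
ON CONFLICT (sesubid, serelid) DO UPDATE
   SET seerrlsn  = EXCLUDED.seerrlsn,
       seerrcmd  = EXCLUDED.seerrcmd,
       seerrmsg  = EXCLUDED.seerrmsg,
       seerrtime = EXCLUDED.seerrtime;
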
>
>> The tuple is removed in the following cases:
>>
>> * the subscription is dropped.
>> * the table is dropped.
>>
>> * the table is removed from the subscription.
>> * the worker successfully committed a non-empty transaction.
>
>
> Correct. This handles the "end of sync worker" just fine since its final action should be a successful commit of a non-empty transaction.
>>
>>
>> With this change, pg_stat_subscription_workers will be like:
>>
>> * subid
>> * subname
>> * subrelid
>> * error_count
>> * last_error_timestamp
>>
>> This view will be extended by adding transaction statistics proposed
>> on another thread[1].
>
>
> I haven't reviewed that thread, but insofar as this one goes I would just drop this altogether. The error count, if desired, can be added to pg_subscription_error, and the timestamp should be added there as noted above.
>
> If this is useful for the feature on the other thread it can be reconstituted from there.
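
If the error details live in the catalog as suggested, roughly the same
information could be obtained with a plain join instead of the view; a
hypothetical example, assuming the columns sketched earlier:

-- Show the last recorded error, if any, per subscription/relation.
SELECT s.subname,
       e.serelid::regclass AS relation,
       e.seerrcmd,
       e.seerrmsg,
       e.seerrtime
FROM pg_subscription_error e
JOIN pg_subscription s ON s.oid = e.sesubid;
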
>
>>
>> 2. Fix a bug in pg_stat_subscription_workers. As pointed out by
>> Andres, there is a bug in pg_stat_subscription_workers; it doesn't
>> drop entries for already-dropped tables. We need to fix it.
>
>
> Given the above, this becomes an N/A.
>
>>
>> 3. Show commit-LSN of the error transaction in errcontext. Currently,
>> we show XID and commit timestamp in the errcontext. But given that we
>> use LSN in pg_subscription_error, it's better to show commit-LSN as
>> well (or instead of error-XID).
>
>
> Agreed, I think: what "errcontext" is this referring to?

The CONTEXT part of a log message. On an error, the apply worker and the
tablesync worker write the details of the change being applied in the
CONTEXT part, as follows:

ERROR: duplicate key value violates unique constraint "test_pkey"
DETAIL: Key (id)=(1) already exists.
CONTEXT: processing remote data during "INSERT" for replication
target relation "public.test" in transaction 716 with commit timestamp
2021-09-29 15:52:45.165754+00

>>
>>
>> 5. Record skipped data in a system catalog, say
>> pg_subscription_conflict_history, so that there is a chance to analyze
>> it and recover.
>
>
>> We can discuss the
>> details in a new thread.
>
> Agreed - the "before skipping" consideration seems considerably more helpful, but it wouldn't need to be persistent; it could just be a view. A permanent record would probably best be kept in the logs - though if we get the pre-skip functionality the user can view the data directly and save the results wherever they wish, or we can provide a function to spool the information to the log. I don't see persistent in-database storage being that desirable here.
>
> If we only do something after the transaction has been skipped it may be useful to add an option to the skipping command to auto-disable the subscription after the transaction has been skipped and before any subsequent transactions are applied. This will give the user a chance to process the post-skipped information before restoring sync and having the system begin changing underneath them again.
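
As a hypothetical illustration of that flow, using the
ALTER SUBSCRIPTION ... SKIP syntax being discussed on the skip-transaction
thread (the auto-disable behavior does not exist and is shown only to make
the idea concrete; the LSN is an example value):

-- Hypothetical: skip the failed transaction by its commit LSN, with some
-- option that leaves the subscription disabled once the skip is done
-- (no such option exists today).
ALTER SUBSCRIPTION mysub SKIP (lsn = '0/14BFA88');
-- Once the user has reviewed the skipped data, resume replication.
ALTER SUBSCRIPTION mysub ENABLE;
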
>
>>
>>
>> 4 and 5 might be better introduced together, but since the user is
>> able to check what changes will be skipped on the publisher side, I
>> think we can do 5 for v16.
>
>
> How would one go about doing that (checking what changes will be skipped on the publisher side)?

We can copy the subscription's replication slot while changing the output
plugin by using pg_copy_logical_replication_slot(). Then we can check the
changes by using pg_logical_slot_peek_changes() with the copied slot.
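
Concretely, something along these lines should work on the publisher (slot
and plugin names are examples; the copy is created as temporary so it is
dropped automatically at the end of the session):

-- Copy the subscription's slot, switching the output plugin to
-- test_decoding so the pending changes can be read as text.
SELECT pg_copy_logical_replication_slot('mysub_slot', 'mysub_slot_peek',
                                        true, 'test_decoding');

-- Peek at the changes without consuming them from the copied slot.
SELECT lsn, xid, data
FROM pg_logical_slot_peek_changes('mysub_slot_peek', NULL, NULL);
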

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
