Re: Design of pg_stat_subscription_workers vs pgstats

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Design of pg_stat_subscription_workers vs pgstats
Date: 2022-02-18 19:46:51
Message-ID: CAKFQuwZk-Trf78QW7-_q9Pr2EGBhtg=wSA2hRiOW-K845Zv2cQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 18, 2022 at 1:26 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:

>
> Here is the summary of the discussion, changes, and plan.
>
> 1. Move some error information such as the error message to a new
> system catalog, pg_subscription_error. The pg_subscription_error table
> would have the following columns:
>
> * sesubid : subscription Oid.
> * serelid : relation Oid (NULL for apply worker).
> * seerrlsn : commit-LSN or the error transaction.
> * seerrcmd : command (INSERT, UPDATE, etc.) of the error transaction.
> * seerrmsg : error message
>

Not a fan of the "se" prefix but overall yes. We should include a timestamp.

> The tuple is inserted or updated when an apply worker or a tablesync
> worker raises an error. If the same error occurs in a row, the update
> is skipped.

I disagree with this - I would treat every new instance of an error to be
important and insert on conflict (sesubid, serelid) the new entry, updating
lsn/cmd/msg with the new values.

The tuple is removed in the following cases:
>
> * the subscription is dropped.
> * the table is dropped.

* the table is removed from the subscription.
> * the worker successfully committed a non-empty transaction.
>

Correct. This handles the "end of sync worker" just fine since its final
action should be a successful commit of a non-empty transaction.

>
> With this change, pg_stat_subscription_workers will be like:
>
> * subid
> * subname
> * subrelid
> * error_count
> * last_error_timestamp
>
> This view will be extended by adding transaction statistics proposed
> on another thread[1].
>

I haven't reviewed that thread but in-so-far as this one goes I would just
drop this altogether. The error count, if desired, can be added to
pg_subscription_error, and the timestamp should be added there as noted
above.

If this is useful for the feature on the other thread it can be
reconstituted from there.

> 2. Fix a bug in pg_stat_subscription_workers. As pointed out by
> Andres, there is a bug in pg_stat_subscription_workers; it doesn't
> drop entries for already-dropped tables. We need to fix it.
>

Given the above, this becomes an N/A.

> 3. Show commit-LSN of the error transaction in errcontext. Currently,
> we show XID and commit timestamp in the errcontext. But given that we
> use LSN in pg_subscriptoin_error, it's better to show commit-LSN as
> well (or instead of error-XID).
>

Agreed, I think: what "errcontext" is this referring to?

>
> 5. Record skipped data to the system catalog, say
> pg_subscription_conflict_history so that there is a chance to analyze
> and recover.

We can discuss the
> details in a new thread.
>
Agreed - the "before skipping" consideration seems considerably more
helpful; but wouldn't need to be persistent, it could just be a view. A
permanent record probably would best be recorded in the logs - though if we
get the pre-skip functionality the user can view directly and save the
results wherever they wish or we can provide a function to spool the
information to the log. I don't see persistent in-database storage being
that desirable here.

If we only do something after the transaction has been skipped it may be
useful to add an option to the skipping command to auto-disable the
subscription after the transaction has been skipped and before any
subsequent transactions are applied. This will give the user a chance to
process the post-skipped information before restoring sync and having the
system begin changing underneath them again.

>
> 4 and 5 might be better introduced together but I think since the user
> is able to check what changes will be skipped on the publisher side we
> can do 5 for v16.

How would one go about doing that (checking what changes will be skipped on
the publisher side)?

David J.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-02-18 19:56:13 Re: buildfarm warnings
Previous Message Joe Conway 2022-02-18 19:46:39 Re: Time to drop plpython2?