From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shawn McCoy <shawn(dot)the(dot)mccoy(at)gmail(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>, "drewwcallahan(at)gmail(dot)com" <drewwcallahan(at)gmail(dot)com>, "scott(at)meads(dot)us" <scott(at)meads(dot)us> |
Subject: | RE: Disabled logical replication origin session causes primary key errors |
Date: | 2025-04-23 01:41:20 |
Message-ID: | OSCPR01MB149668436272E7ADE05202A1AF5BA2@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Lists: | pgsql-bugs |
Dear Sawada-san,
Thanks for the comments!
> +# The bug was that the replication origin wasn’t updated when
> +# apply_error_callback() was called with elevel >= ERROR, and the apply
> worker
> +# continued running afterward.
>
> I think it would be better to mention the fact that the problem
> happened when an error was caught for instance by a plpgsql function.
> How about rewriting it as follows?
>
> # The bug was that when an ERROR was caught, for instance by a
> PL/pgSQL function,
> # the apply worker reset the replication origin but continued processing
> # subsequent changes. This behavior resulted in a failure to update
> the replication
> # origin during further apply operations.
I had tried to describe the internal cause of the bug, whereas your version
describes the observed behavior. +1, replaced.
> ---
> +# Define an after-trigger function for the table insert. It can be fired even
> +# by the apply worker and always raises an exception. This situation allows
> +# worker continue after apply_error_callback() is called with elevel = ERROR.
> +$node_subscriber->safe_psql(
> + 'postgres', q{
> +CREATE FUNCTION handle_exception_trigger()
> +RETURNS TRIGGER AS $$
> +BEGIN
> + BEGIN
> + -- Raise an exception
> + RAISE EXCEPTION 'This is a test exception';
> + EXCEPTION
> + WHEN OTHERS THEN
> + RETURN NEW;
> + END;
> +
> + RETURN NEW;
> +END;
> +$$ LANGUAGE plpgsql;
> +});
> +
> +$node_subscriber->safe_psql(
> + 'postgres', q{
> +CREATE TRIGGER silent_exception_trigger
> +AFTER INSERT OR UPDATE ON t1
> +FOR EACH ROW
> +EXECUTE FUNCTION handle_exception_trigger();
> +
> +ALTER TABLE t1 ENABLE ALWAYS TRIGGER silent_exception_trigger;
> +});
>
> How about rewriting the comment as follows?
>
> # Create an AFTER INSERT trigger on the table that raises and subsequently
> # handles an exception. Subsequent insertions will trigger this exception,
> # causing the apply worker to invoke its error callback with an ERROR. However,
> # since the error is caught within the trigger, the apply worker will continue
> # processing changes.
Fixed.
> And can we execute these queries in one safe_psql() call?
Yes, that is possible. I had intentionally separated them to make the test
clearer, but I don't have a strong opinion. Fixed.
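For illustration, the merged form could look roughly like the sketch below. It is not the exact hunk from the attached patch. The standalone node setup, the table definition, and the final local INSERT check are assumptions added only so that the script runs on its own inside a PostgreSQL source tree. The function and trigger definitions are the ones quoted above.

```perl
use strict;
use warnings;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;

# Assumption: a fresh standalone node; the real test reuses its subscriber.
my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
$node_subscriber->init;
$node_subscriber->start;

# Assumption: a minimal table definition, only for this sketch.
$node_subscriber->safe_psql('postgres', 'CREATE TABLE t1 (a int)');

# Function, trigger, and ENABLE ALWAYS in a single safe_psql() call.
$node_subscriber->safe_psql(
	'postgres', q{
CREATE FUNCTION handle_exception_trigger()
RETURNS TRIGGER AS $$
BEGIN
	BEGIN
		-- Raise an exception
		RAISE EXCEPTION 'This is a test exception';
	EXCEPTION
		WHEN OTHERS THEN
			RETURN NEW;
	END;

	RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER silent_exception_trigger
AFTER INSERT OR UPDATE ON t1
FOR EACH ROW
EXECUTE FUNCTION handle_exception_trigger();

ALTER TABLE t1 ENABLE ALWAYS TRIGGER silent_exception_trigger;
});

# The exception is caught inside the trigger, so the INSERT itself succeeds.
$node_subscriber->safe_psql('postgres', 'INSERT INTO t1 VALUES (1)');
my $count =
  $node_subscriber->safe_psql('postgres', 'SELECT count(*) FROM t1');
is($count, '1', 'insert succeeds although the trigger raised an exception');

$node_subscriber->stop('fast');
done_testing();
```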
> ---
> +# Obtain current remote_lsn value to check its advancement later
> +my $remote_lsn = $node_subscriber->safe_psql('postgres',
> + "SELECT remote_lsn FROM
> pg_catalog.pg_replication_origin_status os, pg_catalog.pg_subscription
> s WHERE os.external_id = 'pg_' || s.oid AND s.subname = 'regress_sub'"
> +);
>
> It seems to make sense to me to get the remote_lsn value just before
> executing INSERT after creating the trigger.
Moved.
> Is it a conventional way to always use schema-qualified catalogs names
> in regression tests? Looking at other tests in src/test/subscription,
> there are only three cases:
>
> % git grep pg_catalog src/test/subscription/t
> src/test/subscription/t/001_rep_changes.pl: FROM
> pg_catalog.pg_stat_io
> src/test/subscription/t/020_messages.pl: "SELECT COUNT(*) FROM
> pg_catalog.pg_replication_slots WHERE slot_name = 'tap_sub' AND
> active='f'",
> src/test/subscription/t/029_on_error.pl: "SELECT subenabled = false
> FROM pg_catalog.pg_subscription WHERE subname = 'sub'"
>
> ISTM that we don't necessarily need to make the catalog name schema-qualified.
I had referred to parts of Cluster.pm and 040_standby_failover_slots_sync.pl,
which use the "pg_catalog" prefix. On further consideration, the instances are
created within the test and we always connect to those instances, so we can
assume they are safe. It is OK to remove the prefix, so I removed it.
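As a rough sketch of the check with the unqualified catalog names (not the exact code in the attached patch), the remote_lsn comparison could be written as below. The node setup, the table definition, and the publication name regress_pub are assumptions made so that the script is self-contained. The exception trigger is left out for brevity, and both nodes are stopped at the end.

```perl
use strict;
use warnings;
use PostgreSQL::Test::Cluster;
use PostgreSQL::Test::Utils;
use Test::More;

# Assumption: minimal publisher/subscriber pair created just for this sketch.
my $node_publisher = PostgreSQL::Test::Cluster->new('publisher');
$node_publisher->init(allows_streaming => 'logical');
$node_publisher->start;

my $node_subscriber = PostgreSQL::Test::Cluster->new('subscriber');
$node_subscriber->init;
$node_subscriber->start;

$node_publisher->safe_psql('postgres',
	'CREATE TABLE t1 (a int); CREATE PUBLICATION regress_pub FOR TABLE t1');
$node_subscriber->safe_psql('postgres', 'CREATE TABLE t1 (a int)');

my $publisher_connstr = $node_publisher->connstr . ' dbname=postgres';
$node_subscriber->safe_psql('postgres',
	"CREATE SUBSCRIPTION regress_sub CONNECTION '$publisher_connstr' PUBLICATION regress_pub"
);
$node_subscriber->wait_for_subscription_sync($node_publisher, 'regress_sub');

# Catalog names are not schema-qualified; the test owns these instances.
my $origin_from = q{FROM pg_replication_origin_status os, pg_subscription s
	WHERE os.external_id = 'pg_' || s.oid AND s.subname = 'regress_sub'};

# Obtain the current remote_lsn just before the INSERT.
my $remote_lsn =
  $node_subscriber->safe_psql('postgres', "SELECT remote_lsn $origin_from");

# Replicate one change and wait until the subscriber has applied it.
$node_publisher->safe_psql('postgres', 'INSERT INTO t1 VALUES (1)');
$node_publisher->wait_for_catchup('regress_sub');

# The origin must have moved past the previously recorded position.
my $advanced = $node_subscriber->safe_psql('postgres',
	"SELECT remote_lsn > '$remote_lsn' $origin_from");
is($advanced, 't', 'remote_lsn has advanced');

$node_subscriber->stop('fast');
$node_publisher->stop('fast');
done_testing();
```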
> ---
> We might want to stop both the publisher and the subscriber at the end
> of the tests.
Oops, added.
The attached patches pass the tests in my environment, and pgperltidy reports
no issues.
Best regards,
Hayato Kuroda
FUJITSU LIMITED
Attachment | Content-Type | Size |
---|---|---|
v6-PG16-PG17-0001-Fix-oversight-3f28b2f.patch | application/octet-stream | 5.4 KB |
v6-HEAD-0001-Fix-oversight-3f28b2f.patch | application/octet-stream | 6.6 KB |