| From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Xuneng Zhou <xunengzhou(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)kurilemu(dot)de>, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, jian he <jian(dot)universality(at)gmail(dot)com>, Tomas Vondra <tomas(at)vondra(dot)me>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru> |
| Subject: | Re: Implement waiting for wal lsn replay: reloaded |
| Date: | 2026-01-07 08:06:23 |
| Message-ID: | CAPpHfdtw3_6JtR4S5E1J6RUirqZaJJRr1f63AKNKAjvVJMzffA@mail.gmail.com |
| Lists: | pgsql-hackers |
On Wed, Jan 7, 2026, 02:32 Andres Freund <andres(at)anarazel(dot)de> wrote:
> Hi,
>
> On 2026-01-06 18:42:59 +1300, Thomas Munro wrote:
> > Could this be causing the recent flapping failures on CI/macOS in
> > recovery/031_recovery_conflict? I didn't have time to dig personally
> > but f30848cb looks relevant:
> >
> > Waiting for replication conn standby's replay_lsn to pass 0/03467F58 on
> > primary
> > error running SQL: 'psql:<stdin>:1: ERROR: canceling statement due to
> > conflict with recovery
> > DETAIL: User was or might have been using tablespace that must be
> > dropped.'
> > while running 'psql --no-psqlrc --no-align --tuples-only --quiet
> > --dbname port=25195
> > host=/var/folders/g9/7rkt8rt1241bwwhd3_s8ndp40000gn/T/LqcCJnsueI
> > dbname='postgres' --file - --variable ON_ERROR_STOP=1' with sql 'WAIT
> > FOR LSN '0/03467F58' WITH (MODE 'standby_replay', timeout '180s',
> > no_throw);' at
> > /Users/admin/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm
> > line 2300.
> >
> > https://cirrus-ci.com/task/5771274900733952
> >
> > The master branch in time-descending order, macOS tasks only:
> >
> > task_id | substring | status
> > ------------------+-----------+-----------
> > 6460882231754752 | c970bdc0 | FAILED
> > 5771274900733952 | 6ca8506e | FAILED
> > 6217757068361728 | 63ed3bc7 | FAILED
> > 5980650261446656 | ae283736 | FAILED
> > 6585898394976256 | 5f13999a | COMPLETED
> > 4527474786172928 | 7f9acc9b | COMPLETED
> > 4826100842364928 | e8d4e94a | COMPLETED
> > 4540563027918848 | b9ee5f2d | FAILED
> > 6358528648019968 | c5af141c | FAILED
> > 5998005284765696 | e212a0f8 | COMPLETED
> > 6488580526178304 | b85d5dc0 | FAILED
> > 5034091344560128 | 7dc95cc3 | ABORTED
> > 5688692477526016 | bb048e31 | COMPLETED
> > 5481187977723904 | d351063e | COMPLETED
> > 5101831568752640 | f30848cb | COMPLETED <-- the change
> > 6395317408497664 | 3f33b63d | COMPLETED
> > 6741325208354816 | 877ae5db | COMPLETED
> > 4594007789010944 | de746e0d | COMPLETED
> > 6497208998035456 | 461b8cc9 | COMPLETED
>
> The failure rates of this are very high - the majority of the CI runs on the
> postgres/postgres repos failed since the change went in. Which then also means
> cfbot has a very high spurious failure rate. I think we need to revert this
> change until the problem has been verified as fixed.
>
This is fair. I will revert the commit causing the failures in the next few
hours.
------
Regards,
Alexander Korotkov